2022 Mar 25;16:857653.
doi: 10.3389/fncom.2022.857653. eCollection 2022.

Maximal Dependence Capturing as a Principle of Sensory Processing


Rishabh Raj et al. Front Comput Neurosci.

Abstract

Sensory inputs conveying information about the environment are often noisy and incomplete, yet the brain can achieve remarkable consistency in recognizing objects. Presumably, transforming the varying input patterns into invariant object representations is pivotal for this cognitive robustness. In the classic hierarchical representation framework, early stages of sensory processing utilize independent components of environmental stimuli to ensure efficient information transmission. Representations in subsequent stages are based on increasingly complex receptive fields along a hierarchical network. This framework accurately captures the input structures; however, it is challenging to achieve invariance in representing different appearances of objects. Here we assess theoretical and experimental inconsistencies of the current framework. In its place, we propose that individual neurons encode objects by following the principle of maximal dependence capturing (MDC), which compels each neuron to capture the structural components that contain maximal information about specific objects. We implement the proposition in a computational framework incorporating dimension expansion and sparse coding, which achieves consistent representations of object identities under occlusion, corruption, or high noise conditions. The framework neither requires learning the corrupted forms nor comprises deep network layers. Moreover, it explains various receptive field properties of neurons. Thus, MDC provides a unifying principle for sensory processing.
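The abstract's combination of dimension expansion and sparse coding can be illustrated with a generic lasso-based encoder. This is a minimal sketch, not the authors' implementation: the ISTA solver, the dimensions, and the penalty weight `lam` are all illustrative assumptions.

```python
import numpy as np

def ista(x, D, lam=0.05, n_iter=500):
    """Generic sparse coding by iterative soft thresholding (ISTA):
    minimize 0.5*||x - D a||^2 + lam*||a||_1 over the code a."""
    L = np.linalg.norm(D, 2) ** 2              # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - x) / L          # gradient step on the quadratic term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 200))             # dimension expansion: 64 pixels -> 200 units
D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary elements
a_true = np.zeros(200)
a_true[[3, 40, 111]] = [1.0, -0.7, 0.5]        # input composed of 3 dictionary elements
x = D @ a_true
a = ista(x, D)
print(np.count_nonzero(np.abs(a) > 1e-3))      # only a few units are active
```

Because the code is sparse, most units stay silent for any given input; in the MDC picture, robustness to corruption then comes from re-running the same encoder on the corrupted pixels rather than from training on corrupted examples.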

Keywords: computational modeling; grandmother cell; invariant representation; object recognition (OR); redundancy capturing; redundancy reduction; sparse coding; sparse recovery (SR).


Conflict of interest statement

RR, DD, and CRY declare the existence of a financial competing interest in the form of a patent application based on this work. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Illustration of the classic and the maximal dependence capturing frameworks for object representation. (A) Classic framework. Inputs are decomposed into independent features that are reassembled hierarchically into more complex combinations to represent separate objects. Two randomly shaped objects are depicted; their representations at later stages are color-coded. Gray patches depict the receptive field properties of representation neurons. (B) The problem of missing parts. (i) An occluded object (the occlusion depicted as a gray oval) must be associated with its un-occluded form to be identified. (ii,iii) Classic models require the occluded feature, or feature combinations, to be learned as the same at every stage of processing; every corrupted form must also be learned to reach robustness. (C) In the MDC framework, coding units capture structural relationships among the features and encode them as a whole. (D) In the MDC model, a missing feature (grayed-out area) does not affect the encoding, because redundancy allows the same units to represent the features and the objects even when parts are missing. There is no need to learn from all corrupted forms.
FIGURE 2
Dependence capturing by the MDC framework. (A) Illustration of encoding in the MDC framework. Symbols from world languages are converted to 256-pixel (16 × 16) images (x) that are transformed into the activity of a set of 800 representation units (a) to encode symbol identities. An example of the process is shown using a character encoded by a single representation unit, illustrating the ability to encode complex structural features. (B) Structures of the dictionary elements (receptive fields) learned from 1,000 symbols. Highlighted elements, displayed at a larger size, are the most active when representing the inputs shown in Figure 3. Note the similarity among some of the elements. (C) Information contents of sample dictionary elements, normalized to the maximum observed information content. While very simple structures are least informative about any object, more comprehensive structures are highly informative. (D) Distribution of the normalized information contents of the dictionary elements. Most of the structures are highly informative about specific objects, indicating that the mathematical framework captures features that share maximal dependence with the inputs.
FIGURE 3
Invariant object representations in the MDC framework. (A) Representation of inputs by the activity of the dictionary elements. Only the most active ones are shown. The height of the bars indicates activity evoked by the input patterns. Note that the inputs only activate the dictionary elements that are most similar to them, despite the similarity among the elements. Such coding results in very sparse input representations. (B) Correlation among representation units is minimal, and the pairwise correlation matrix of the representation units is the identity matrix. (C) Representations of original (i) and corrupted inputs under noisy (ii), pixel loss (iii), and occlusion (iv) conditions. Representations of the corrupted signals (Repre) are similar to those of the originals. Reconstructed images (Recon) from the output units resemble the original symbols. (D) Example of two highly similar symbols being distinctly and robustly represented. The original input signals (i), when corrupted by noise (ii) or occlusion (iii), are transformed into output activities similar to those of the originals. Reconstructed images recover the original signals. (E) Z-scored similarity (specificity) between representations of corrupted and original signals as a function of the total number of pixels in the input layer [randomly selected as in panel (Ciii)]. Scores calculated with different pixel numbers (blue dots) are fit to a sigmoidal curve (red line).
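The Z-scored similarity ("specificity") of panel (E) can be sketched as follows, under the assumption that it compares a corrupted input's code against the codes of all stored originals; the paper's exact definition may differ, and the codes here are random stand-ins.

```python
import numpy as np

def specificity(code_corrupt, codes_all, target):
    """Z-score of the cosine similarity between a corrupted input's code
    and its own original, relative to its similarity to all stored codes."""
    sims = codes_all @ code_corrupt / (
        np.linalg.norm(codes_all, axis=1) * np.linalg.norm(code_corrupt))
    return (sims[target] - sims.mean()) / sims.std()

rng = np.random.default_rng(1)
codes = rng.standard_normal((100, 800))               # stand-in codes of 100 originals
corrupt = codes[7] + 0.3 * rng.standard_normal(800)   # noisy version of symbol 7
print(specificity(corrupt, codes, target=7))          # large z-score: identity preserved
```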
FIGURE 4
Robust representation of human faces. (A) Examples of face images in a library of 2,000 (1,000 male and 1,000 female) used to train a two-layer model. (B) Examples of dictionary elements learned from the face library. Note that they incorporate complex combinations of facial features but are not necessarily part or whole of any specific face. (C) Representation and recovery of faces with different alterations. A face (i) was altered to wear a pair of sunglasses (ii), a mustache (iii), or both (iv). The altered faces' representations were nearly identical to the original, even though these examples were not in the training set. Images reconstructed from the representations based on the dictionary were similar to the original images. (D) Representation and recovery of occluded faces. Four different occlusions of a face were generated: top (ii), bottom (iii), left (iv), and right (v). Representations of the occluded faces were highly similar to the original one (i). Reconstructions also matched the original face. (E) Face identity was not preserved in representations based on PCA performed on the 2,000 training faces. Representations of original (i) and occluded faces (ii–v) were obtained in the principal space (the first 25 components are shown to highlight the differences). Representations of occluded faces are different from the original. Reconstructed images match the corrupted rather than the original faces. (F,G) Specificity (Z-scored similarity) of face representations (F) and cosine similarity between reconstructed and original images (G) were calculated for the MDC and PCA models. 50 faces were chosen randomly from the training set to create the four occluded versions.
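Panel (E)'s point, that a linear PCA code passes occlusion straight through to the coefficients, can be reproduced with a toy example. Random vectors stand in for face images (the real face library is not available here); 25 components are retained as in the figure.

```python
import numpy as np

rng = np.random.default_rng(2)
faces = rng.standard_normal((2000, 256))      # stand-in for flattened face images
mean = faces.mean(axis=0)
# PCA via SVD of the centered data; rows of Vt are the principal components
_, _, Vt = np.linalg.svd(faces - mean, full_matrices=False)
P = Vt[:25]                                   # first 25 components, as in panel (E)

x = faces[0]
x_occ = x.copy()
x_occ[:128] = 0.0                             # occlude half of the pixels

c, c_occ = P @ (x - mean), P @ (x_occ - mean)
# the projection is linear, so the occlusion shifts the coefficients directly
print(np.linalg.norm(c - c_occ) / np.linalg.norm(c))   # large relative change
```

A sparse, redundant code can reassign the missing structure to the same active units, which is the contrast the figure draws with MDC.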
FIGURE 5
Complexity of tuning properties is determined by the number of objects and the dimension of the representation. (A) The structures of dictionary elements for the symbols in Figure 2 with a 300-unit output layer. Compared with the 800-unit layer shown in Figure 2B, there are more localized features. (B) Representation sparsity increases with increased dimensions. (C) K-L divergence between pixel distributions in the input signal and in the dictionary elements as a function of the number of symbols to be encoded. (D) Coding redundancies in input and output (representation) while encoding increasing numbers of symbols. (E,F) Dictionary elements of faces with a 500-dimension (E) or 1,000-dimension (F) encoder set. (G) Responses elicited by faces are sparser with increased dimensions.
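The sparsity trend in panels (B,G) needs a concrete metric to be measurable. The paper's exact measure is not given here; the Treves-Rolls population-sparseness index below is one common choice and is used purely as an assumption.

```python
import numpy as np

def population_sparseness(a):
    """Treves-Rolls sparseness of a response vector: 1 for a one-hot
    (maximally sparse) code, 0 for a dense uniform code."""
    a = np.abs(a)
    n = a.size
    s = (a.sum() / n) ** 2 / (np.square(a).sum() / n)
    return (1 - s) / (1 - 1 / n)

one_hot = np.zeros(800)
one_hot[3] = 1.0                      # a single active representation unit
dense = np.ones(800)                  # all units equally active
print(population_sparseness(one_hot), population_sparseness(dense))
```

Applying such an index to codes from output layers of increasing dimension would quantify the increase in sparsity the figure reports.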
FIGURE 6
The emergence of simple and complex receptive fields from training with natural images in MDC. (A) Illustration of image patches derived from natural scenes. (B) Examples of structures of individual dictionary elements. Both simple (cyan) and complex (magenta) elements are observed (i) and can be distinguished using Fourier transforms of the structures (ii). (C) As the number of training images increases, encoders' tuning properties become more localized, and the percentage of simple RFs increases.

