Neural Comput. 2012 Sep;24(9):2384-421.
doi: 10.1162/NECO_a_00330. Epub 2012 Jun 26.

Characterizing responses of translation-invariant neurons to natural stimuli: maximally informative invariant dimensions

Michael Eickenberg et al. Neural Comput. 2012 Sep.

Abstract

The human visual system is capable of recognizing complex objects even when their appearances change drastically under various viewing conditions. Especially in the higher cortical areas, the sensory neurons reflect such functional capacity in their selectivity for complex visual features and invariance to certain object transformations, such as image translation. Due to the strong nonlinearities necessary to achieve both the selectivity and invariance, characterizing and predicting the response patterns of these neurons represents a formidable computational challenge. A related problem is that such neurons are poorly driven by randomized inputs, such as white noise, and respond strongly only to stimuli with complex high-order correlations, such as natural stimuli. Here we describe a novel two-step optimization technique that can characterize both the shape selectivity and the range and coarseness of position invariance from neural responses to natural stimuli. One step in the optimization is finding the template as the maximally informative dimension given the estimated spatial location where the response could have been triggered within each image. The estimates of the locations that triggered the response are updated in the next step. Under the assumption of a monotonic relationship between the firing rate and stimulus projections on the template at a given position, the most likely location is the one that has the largest projection on the estimate of the template. The algorithm shows quick convergence during optimization, and the estimation results are reliable even in the regime of small signal-to-noise ratios. When we apply the algorithm to responses of complex cells in the primary visual cortex (V1) to natural movies, we find that responses of the majority of cells were significantly better described by translation-invariant models based on one template compared with position-specific models with several relevant features.
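The location-update step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy 4 × 4 image, and the 2 × 2 template are assumptions chosen for the example, and the template-update step (finding the maximally informative dimension) is not shown.

```python
import numpy as np

def projections_on_grid(image, template, offsets):
    """Project the template onto the image patch at each candidate offset."""
    h, w = template.shape
    return np.array([
        np.sum(image[r:r + h, c:c + w] * template) for r, c in offsets
    ])

def update_locations(images, template, offsets):
    """For each image, return the grid index with the largest projection.

    Under a monotonic relationship between firing rate and projection,
    this index is taken as the most likely location that triggered the response.
    """
    return np.array([
        int(np.argmax(projections_on_grid(img, template, offsets)))
        for img in images
    ])

# Toy example (hypothetical data): a 2 x 2 template on a 3 x 3 translation grid.
template = np.array([[1.0, -1.0], [1.0, -1.0]])
offsets = [(r, c) for r in range(3) for c in range(3)]
image = np.zeros((4, 4))
image[1:3, 2:4] = 5.0 * template  # feature embedded at offset (1, 2), grid index 5
locations = update_locations([image], template, offsets)
```

In a full alternating scheme, the patches selected by `locations` would then be used to re-estimate the template, and the two steps would repeat until convergence.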

Figures

Figure 1
(A) Model of neural response based on one translation invariant stimulus feature. The spike probability represents a logical OR combination of responses from hidden, position-specific units that are selective for the same stimulus feature centered at different retinotopic coordinates. (B) An example of a discrete 3 × 3 grid approximation that can be used to model invariance of neural responses to image translation. The shaded square denotes the spatial extent of the preferred image feature; nine possible ways of centering the preferred template within the overall stimulus are shown.
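The logical OR combination in panel (A) can be illustrated with a short sketch. It assumes, for illustration only, that the hidden position-specific units fire independently given the stimulus; the function name is hypothetical.

```python
import numpy as np

def or_spike_probability(unit_probs):
    """Spike probability when a spike occurs if at least one hidden unit responds.

    Assumes the hidden units are conditionally independent (an assumption of
    this sketch, not a statement from the caption).
    """
    unit_probs = np.asarray(unit_probs, dtype=float)
    return 1.0 - np.prod(1.0 - unit_probs)

# Nine hidden units, one per position of a 3 x 3 grid; only one is strongly driven.
p_spike = or_spike_probability([0.01] * 8 + [0.9])
```

With a single strongly driven unit, the combined probability is close to that unit's own probability, which is why the response is nearly insensitive to where on the grid the feature appears.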
Figure 2. Statistical description of neural responses along relevant and irrelevant dimensions
In the framework of the position-specific model, some images elicit spikes (black) and others do not (gray). Here each of the images s⃗ is represented as a point in a two-dimensional space, although it is actually a point in a high-dimensional (d-dimensional) space (each axis may correspond to the luminance of a pixel). Because the vertical dimension (x2) does not affect the spike probability, the probability distribution of stimuli along that dimension P(x2) is similar to the distribution of stimuli given a spike P(x2|spike). On the other hand, the horizontal dimension x1 can account for the spiking behavior, because the spikes are observed whenever the stimulus component x1 exceeds a certain value.
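The comparison between P(x) and P(x|spike) can be turned into a number by computing the divergence between the two distributions along a dimension. The sketch below uses a simple histogram estimate on synthetic Gaussian stimuli; the bin count, the threshold of 1.0, and all names are assumptions made for illustration.

```python
import numpy as np

def info_along_dimension(x, spikes, n_bins=15):
    """Histogram estimate of sum_x P(x|spike) * log2[P(x|spike) / P(x)], in bits."""
    edges = np.histogram_bin_edges(x, bins=n_bins)
    p_all, _ = np.histogram(x, bins=edges)
    p_spk, _ = np.histogram(x[spikes], bins=edges)
    p_all = p_all / p_all.sum()
    p_spk = p_spk / p_spk.sum()
    mask = p_spk > 0  # bins without spikes contribute zero
    return float(np.sum(p_spk[mask] * np.log2(p_spk[mask] / p_all[mask])))

rng = np.random.default_rng(0)
stimuli = rng.standard_normal((20000, 2))  # columns play the roles of x1 and x2
spikes = stimuli[:, 0] > 1.0               # spikes depend only on x1
info_x1 = info_along_dimension(stimuli[:, 0], spikes)
info_x2 = info_along_dimension(stimuli[:, 1], spikes)
```

The relevant dimension x1 carries substantial information, while the estimate for x2 stays near zero apart from a small positive bias due to finite sampling.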
Figure 3. Two approaches for characterizing translation invariant feature selectivity
(A) In the direct approach, we seek a template whose spatial extent is smaller than the overall stimulus that covers the response region of a neuron. The spike probability is examined by translating the candidate template to different locations of the translation grid (shown here for a 3 × 3 grid). (B) In the Fourier approach, in order to account for the translation invariance, the template is shifted to different locations of the translation grid assuming periodic boundary conditions. Compared to the direct approach, the Fourier approach can typically handle finer translation grids (due to memory restrictions in the direct approach), but it yields coarser estimates of the template because of the need to leave larger margins when using periodic boundary conditions.
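The periodic shifts used in the Fourier approach can be evaluated for all positions at once with a circular cross-correlation. The sketch below assumes NumPy and a hypothetical 8 × 8 image with a zero-padded 3 × 3 feature; it illustrates the general technique and is not taken from the paper.

```python
import numpy as np

def periodic_projections(image, template):
    """Projection of the template onto the image for every circular shift.

    Entry [dy, dx] equals np.sum(image * np.roll(template, (dy, dx), axis=(0, 1))).
    """
    spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(template))
    return np.fft.ifft2(spectrum).real

rng = np.random.default_rng(1)
image = rng.standard_normal((8, 8))
template = np.zeros((8, 8))
template[:3, :3] = rng.standard_normal((3, 3))  # small feature, zero-padded to image size

proj = periodic_projections(image, template)
# Cross-check one shift against the explicit (direct) computation.
direct = np.sum(image * np.roll(template, (2, 5), axis=(0, 1)))
```

Because the shifts wrap around the image borders, a template placed near an edge reappears on the opposite side, which is the reason the caption gives for leaving larger margins in this approach.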
Figure 4. Feature selectivity of translation invariant neurons cannot be characterized without taking this invariance into account
(A) The relevant feature of a model neuron with translation invariant responses. The centers of the 3 × 3 translation grid are marked with crosses. (B) The nonlinear gain function of the translation invariant model cell evaluated at the location producing a maximal projection with the model template (θ = 2.5, σ = 1.0, stimulus repeated 20 times). (C) The estimated template without taking into account translation invariance. (D) Comparison of the nonlinear gain functions with respect to the estimated filter (solid line) with the nonlinear gain function with respect to the model template at the central location of the translation grid (dashed line). Both functions are computed without translation invariance. The observed increase in the nonlinear gain function for negative projection values is due to the overlap between the templates centered at neighboring positions of the translation grid.
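One way to read the parameters θ and σ in panel (B) is as a threshold and a noise level. The sketch below assumes a model in which a spike occurs when the projection plus Gaussian noise of standard deviation σ exceeds θ; this form is an assumption made for illustration, and the caption itself does not spell out the model.

```python
import math

def threshold_gain(projection, theta=2.5, sigma=1.0):
    """P(spike | projection) for a noisy threshold: projection + noise > theta.

    The noise is assumed Gaussian with standard deviation sigma, which gives a
    cumulative-normal (sigmoidal) gain function.
    """
    z = (projection - theta) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# Gain evaluated at a few projection values (in the same units as theta).
gains = [threshold_gain(x) for x in (0.0, 2.5, 5.0)]
```

Under this assumption the gain is exactly one half at the threshold and rises monotonically with the projection.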
Figure 5. Estimation of translation invariant models
(A) The relevant template of the model neuron overlaid with the 3 × 3 translation grid (whose points are marked by crosses). (B) Comparison between the nonlinear gain function of the model cell (solid line) and the translation invariant estimation (dashed line). In contrast to the case of estimation without translation invariance, cf. Fig. 4, this estimation does reproduce the correct, sigmoidal form of the nonlinear gain function. (C) The Fourier method estimation using the 3 × 3 translation grid (same grid as in the model) yields a dot product of c = 0.897 ± 0.008 and a fraction of information explained Iexpl = 0.963 ± 0.006 (1 is the maximum). (D) Analogous estimation using the direct method yielded c = 0.899 ± 0.011 and Iexpl = 0.969 ± 0.008. Assuming a mismatched 5 × 5 translation grid (compared to the model) still leads to reasonable estimation results using either the Fourier method (E), c = 0.790 ± 0.004 and Iexpl = 0.826 ± 0.002, or the direct method (F), c = 0.78 ± 0.02 and Iexpl = 0.826 ± 0.003. (G) Comparison of model spike probability (black line, gray area shows standard errors of the mean) and the predicted spike probability (blue) using the template and model from panel D. Predictions were made for a novel set of frames not used in estimating the model.
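The dot product c quoted in panels (C) to (F) can be computed as below, assuming both templates are normalized to unit length before the product is taken (an assumption of this sketch; the function name is hypothetical).

```python
import numpy as np

def template_overlap(estimated, model):
    """Dot product between two templates after normalizing each to unit length."""
    a = np.ravel(estimated).astype(float)
    b = np.ravel(model).astype(float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 16 x 16 templates: the estimate is the model plus noise.
rng = np.random.default_rng(2)
model = rng.standard_normal((16, 16))
estimated = model + 0.5 * rng.standard_normal((16, 16))
c = template_overlap(estimated, model)
```

With this normalization, c = 1 corresponds to a perfect match and c = 0 to orthogonal templates.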
Figure 6. Example of algorithm convergence
(A) Convergence in terms of information explained on the test data set by the candidate template. (B) Convergence in terms of projection of the candidate template onto the model template. In both panels, the four different lines correspond to four different jackknife analyses of the same model neuron. In the case of information, different final values are due to differences in the overall information per spike in a particular test data set. By both measures, the algorithm converges in all cases within 100 iterations, fewer than the dimensionality d = 256 of the template space. Insets in (B) show the estimated templates after 1, 5, 10, and 15 line optimizations.
Figure 7. Recovering the coarseness of translation invariance
The percent of information explained is plotted as a function of the translation grid size assumed during estimation. (A) Model cells with a 3 × 3 translation grid, σ = 1.0, θ = 2.5, 2.75, 3.0 analyzed from 20 repeats of the whole stimulus sequence (16,384 frames). The best predictive power is obtained when the same grid is used during estimation. The differences between the peak value and the values for the 1 × 1 and 5 × 5 grids are significant (p < 10⁻⁴, t-test). (B) Model cells with a 5 × 5 translation grid and θ = 3.0, 3.25, 3.75 (other parameters are the same as in (A)). The use of coarser translation grids results in significantly worse performance (p < 10⁻⁴); however, a finer translation grid results in the same performance (p = 0.16). (C) Model cells with a 17 × 17 translation grid, θ = 4.0, 4.25, 4.50. This is the case of perfect translation invariance with a grid spacing of 1 pixel. We find that the performance of the estimation algorithm continues to improve from 5 × 5 to 9 × 9 grids (p < 10⁻⁴). In all cases, therefore, the algorithm could disambiguate coarser translation grids from the true ones.
Figure 8. Projection between estimated and model dimensions as a function of the number of spikes Nspikes
Improvement in performance with increasing number of spikes is shown for 12 model cells. All of the model cells had the same relevant template as in Fig. 5A and a 3 × 3 translation grid, but different noise levels and thresholds: θ = 2.5 (A), θ = 2.75 (B), θ = 3.0 (C), and θ = 3.5 (D). Within each panel, model cells have σ = 0.5 (light gray, ○), 0.75 (dark gray, ▽), and 1.0 (black, □). The solid and dashed lines represent results of quadratic and linear regressions. Stimulus dimensionality D = 1024, corresponding to frames with 32 × 32 pixels. Results were obtained using the Fourier approach. Good performance is obtained for all model cells even in the severely undersampled regime with D > Nspikes. As expected, the improvements with increasing number of spikes are more pronounced for neurons with larger noise levels.
Figure 9. Population analysis of predictive power of position-specific and position-invariant models for V1 complex cells
Fraction of information explained by models with one (A), two (B), and three (C) features. Correlation coefficients between measured and predicted firing rates are compared for models with one (D), two (E), and three (F) translation invariant features with correlation coefficients obtained with the three-feature position-specific model. Models with the same number of features can be compared according to percent information values (A–C), whereas models with different numbers of features can be compared according to correlation coefficients (D–F). Across the population, position-invariant models with one or two features outperformed their position-specific counterparts. Furthermore, significant improvements were observed for some of the neurons considered individually (points marked with empty circles, P < 0.05, t-test), where even the models with a single translation invariant template outperformed the models with three position-specific features (D).
Figure 10. Example V1 complex cell that was better described with a position-invariant model
(A) Three relevant spatiotemporal features for a position-specific LN model are shown. Each feature is shown in a separate row and represents a spatiotemporal profile covering three time lags from −132 to −33 msec before the spike arrival time. Results are shown as averages over four jackknife estimates of each feature. The color scale denotes signal-to-noise ratio relative to the variance across the jackknife estimates, which was corrected for overlapping data in the jackknife estimates (Efron and Tibshirani, 1998). (B) Three relevant spatiotemporal templates of a position-invariant LN model; notation is as in (A). Firing rate predictions were made using these models for a novel, repeated data set. Predictions using the position-specific models (C) and position-invariant models (D) are shown using red, blue, and green lines for models based on one, two, and three features, respectively. The measured firing rate (black line) is shown together with its standard error of the mean (gray shading). Neuron 883-2.
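A signal-to-noise map of the kind described in panel (A) could be computed along the following lines. The sketch uses the standard jackknife standard error, whose (n − 1)/n factor accounts for the overlap between leave-one-out subsets; whether the authors applied exactly this correction is an assumption, and the toy numbers are hypothetical.

```python
import numpy as np

def jackknife_snr(estimates):
    """Mean over jackknife estimates divided by the jackknife standard error.

    `estimates` has shape (n, ...), one leave-one-subset-out estimate per row.
    The standard error is sqrt((n - 1) / n * sum((est - mean)**2)), where the
    (n - 1) / n factor compensates for the data shared between estimates.
    """
    estimates = np.asarray(estimates, dtype=float)
    n = estimates.shape[0]
    mean = estimates.mean(axis=0)
    se = np.sqrt((n - 1) / n * np.sum((estimates - mean) ** 2, axis=0))
    return mean / se

# Four jackknife estimates of a two-pixel "feature" (toy values).
toy = np.array([[1.0, 0.0], [1.2, 0.1], [0.8, -0.1], [1.0, 0.0]])
snr = jackknife_snr(toy)
```

The first pixel has a consistent nonzero value across estimates and therefore a high ratio, while the second fluctuates around zero and has a ratio near zero.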
Figure 11. Example V1 complex cell that was better described with a position-specific model
Notations are as in Figure 10. Neuron 772-2.

