Review

Efficient processing of natural scenes in visual cortex

Tiberiu Tesileanu et al. Front Cell Neurosci. 2022 Dec 5;16:1006703. doi: 10.3389/fncel.2022.1006703. eCollection 2022.

Abstract

Neural circuits in the periphery of the visual, auditory, and olfactory systems are believed to use limited resources efficiently to represent sensory information by adapting to the statistical structure of the natural environment. This "efficient coding" principle has been used to explain many aspects of early visual circuits including the distribution of photoreceptors, the mosaic geometry and center-surround structure of retinal receptive fields, the excess OFF pathways relative to ON pathways, saccade statistics, and the structure of simple cell receptive fields in V1. We know less about the extent to which such adaptations may occur in deeper areas of cortex beyond V1. We thus review recent developments showing that the perception of visual textures, which depends on processing in V2 and beyond in mammals, is adapted in rats and humans to the multi-point statistics of luminance in natural scenes. These results suggest that central circuits in the visual brain are adapted for seeing key aspects of natural scenes. We conclude by discussing how adaptation to natural temporal statistics may aid in learning and representing visual objects, and propose two challenges for the future: (1) explaining the distribution of shape sensitivity in the ventral visual stream from the statistics of object shape in natural images, and (2) explaining cell types of the vertebrate retina in terms of feature detectors that are adapted to the spatio-temporal structures of natural stimuli. We also discuss how new methods based on machine learning may complement the normative, principles-based approach to theoretical neuroscience.

Keywords: efficient coding hypothesis; natural scene analysis; sensory system; texture analysis; visual cortex (VC).


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Efficient coding relating natural-image statistics to psychophysics. (A) The forced-choice task from Victor and Conte (1991). After a cue, the subject is shown either an unstructured texture (half of the time) or a correlated texture, and is asked to distinguish the structured from the unstructured stimuli. (B) Average four-point correlations calculated for each of two kinds of glider (see C) over a database of natural images. Group 1 correlations have an average that is statistically positive, while Group 2 correlations average close to zero even for large patches. Adapted from Tkačik et al. (2010). (C) Psychophysics results for the same texture groups as in (B). The x-axis shows the strength of the correlations. Group 1 textures have low discrimination thresholds, while Group 2 textures are hard to distinguish from unstructured noise even at the highest correlation levels. Adapted from Victor and Conte (1991) and used with permission from Elsevier. (D–F) Two regimes of efficient coding, depending on input and output noise. Adapted from Hermundstad et al. (2014). (D) The model that is optimized. (E) Results of the optimization as a function of input noise. Note that higher gain leads to higher sensitivity, assuming a fixed threshold on the amplified signal. Signal variability is the variance of the input signal. The parameter Λ measures the balance between input and output noise: small Λ is the regime where input noise dominates, while large Λ is the regime where output noise dominates. (F) (Left) The “variance predicts salience,” sampling-limited regime that was found to be relevant for texture perception in Hermundstad et al. (2014). (Right) The more familiar whitening regime of efficient coding.
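The four-point statistic in panel (B) can be made concrete with a minimal sketch. Assuming binary images whose pixels are mapped to ±1, the correlation under a 2×2 glider is the mean product of the four pixels in every 2×2 block; this is an illustration only, not the analysis pipeline of Tkačik et al. (2010):

```python
import numpy as np

def four_point_correlation(img):
    """Average four-point correlation under a 2x2 glider.

    img: 2D array with values in {0, 1}. Pixels are mapped to
    {-1, +1}; the statistic is the mean, over all 2x2 blocks,
    of the product of the four pixel values in the block.
    """
    s = 2 * np.asarray(img, dtype=int) - 1      # {0,1} -> {-1,+1}
    prod = s[:-1, :-1] * s[:-1, 1:] * s[1:, :-1] * s[1:, 1:]
    return prod.mean()

rng = np.random.default_rng(0)

# Unstructured ("white") binary noise: correlation near zero
noise = rng.integers(0, 2, size=(256, 256))

# A checkerboard has even parity in every 2x2 block: correlation exactly 1
checker = np.indices((256, 256)).sum(axis=0) % 2
```

For white noise `four_point_correlation(noise)` is close to zero, while for the checkerboard it equals 1; natural images, as in panel (B), yield small but systematically positive values for Group 1 gliders.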
Figure 2
Efficient coding predicts detailed psychophysical thresholds for a variety of binary and ternary textures. (A–C) Results for binary textures, adapted from Hermundstad et al. (2014). (A) The four-alternative forced-choice (4AFC) design of the experiment: after a cue, subjects are shown a background of structured (unstructured) texture with a strip of unstructured (structured) texture in one of the four cardinal positions. The subjects need to identify the location of the strip. (B) Predicted (in various shades of blue) and measured (in shades of red and purple) perceptual sensitivities for various two-, three-, and four-point correlations. Sample patches are shown under the x-axis. The different shades correspond to different preprocessing choices (for the natural-image analysis) and different subjects (for the psychophysics). (C) Predicted (blue) and measured (shades of red and purple) isodiscrimination contours for textures that combine two of the 10 axes used in (B). (D,E) Results for ternary textures, adapted from Teşileanu et al. (2020). (D) Predicted (blue) and measured (red) discrimination thresholds for textures in different “simple” planes of the grayscale texture space with three gray levels. See Teşileanu et al. (2020) for a detailed description of the texture space. (E) Predicted (blue) and measured (red) discrimination thresholds for textures in different “mixed” planes. See Teşileanu et al. (2020) for a detailed description of the texture space.
Figure 3
Photorealistic textures from the Portilla-Simoncelli texture model. The synthetic images were generated using the Matlab code from https://github.com/LabForComputationalVision/textureSynth. (A–D) The original samples were taken either from the same repository (for “reptile skin” and “nuts”), or from other freely available images on the internet (for “wood” and “Milky Way”).
Figure 4
Rat sensitivity to visual multipoint correlations verifies the prediction from efficient coding theory, matching that found in humans. (A) Example stimuli used in the psychophysics task. Rats performed a two-alternative forced choice task where they had to report if a given stimulus was an instance of structured noise (a correlated pattern with one-, two-, three-, or four-point structure, generated using the gliders on the right), or unstructured “white” noise. (B) Operant structure of the psychophysics task. (C) Example psychometric curve from one of the rats trained to distinguish 2-point correlated textures from white noise. Black dots: experimental data. Blue line: ideal observer model fit. (D) Comparison between rat and human sensitivity to structured textures (diamonds and squares, respectively), and degree of variability of the corresponding statistics in natural images (dots). Adapted from Caramellino et al. (2021).
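Panel (C) fits an ideal observer model to a psychometric curve. As an illustrative sketch only (the response data below are hypothetical, and the cumulative-Gaussian-with-lapse parameterization is a generic choice, not the specific ideal-observer model of Caramellino et al., 2021), such a curve can be fit as:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit

def psychometric(x, threshold, spread, lapse):
    """Cumulative-Gaussian psychometric function with a lapse rate.

    Maps correlation strength x to the probability of a correct
    report; performance rises from chance (0.5) toward 1 - lapse.
    """
    return 0.5 + (0.5 - lapse) * norm.cdf(x, loc=threshold, scale=spread)

# Hypothetical data: fraction correct at each correlation strength
strengths = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
frac_correct = np.array([0.50, 0.55, 0.70, 0.88, 0.95, 0.98])

params, _ = curve_fit(
    psychometric, strengths, frac_correct,
    p0=[0.4, 0.2, 0.02],
    bounds=([0.0, 0.01, 0.0], [1.0, 1.0, 0.1]),
)
threshold, spread, lapse = params
```

Sensitivity, as plotted in panel (D), is then typically taken as the inverse of the fitted discrimination threshold.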
Figure 5
Effect of short-term adaptation on the response timescale of high- vs. low-level feature detectors (cartoon). (A) Dynamic visual stimulus (movie frames). Orange dot, yellow shape: idealized receptive fields of a low-level feature detector neuron (“edge detector,” orange) and a high-level feature detector (“rat head detector,” yellow). (B) Single-trial responses of the two example neurons when adaptation is absent (green trace) and when adaptation is strong (blue trace). Note how adaptation shortens the timescale of the response. The activity traces in (B) were obtained by simulating a simple neural encoding model, described in detail in Piasini et al. (2021). Reproduced from Piasini et al. (2021).
Figure 6
Measuring response and intrinsic timescales along the rat analog of the ventral visual stream. (A) Example frames from one of the nine movies used as visual stimuli. (B) Functional identification of rat cortical areas. Top: example slice of rat visual cortex, obtained from one of the rats in which recordings were performed. Red fluorescence indicates the insertion path of the multielectrode silicon probe, schematized in white. Bottom: firing intensity maps showing the RFs of the units recorded at selected recording sites along the probe (indicated by the numbers under the RF maps). The reversal in the progression of the retinotopy between sites 16 and 17 marks the boundary between areas LI and LL (shown by a dashed line in the top panel). (C) Response timescales (y-axis) measured across the cortical hierarchy for stimuli with different timescales (x-axis). Markers indicate empirical estimates; lines indicate linear regressions with a common slope across areas and varying intercepts. The gray line indicates a regression in which all extrastriate areas (LM, LI, LL) are pooled together and compared to V1. (D) Same as (C), for intrinsic timescales. Adapted from Piasini et al. (2021).

References

    1. Angeloni C. F., Młynarski W., Piasini E., Williams A. M., Wood K. C., Garami L., et al. (2021). Cortical efficient coding dynamics shape behavioral performance. bioRxiv [Preprint]. doi: 10.1101/2021.08.11.455845
    2. Anselmi F., Poggio T. (2014). Representation Learning in Sensory Cortex: A Theory. Technical Report 26, CBMM.
    3. Atick J. J., Redlich A. N. (1990). Towards a theory of early visual processing. Neural Comput. 2, 308–320. doi: 10.1162/neco.1990.2.3.308
    4. Attwell D., Laughlin S. B. (2001). An energy budget for signaling in the grey matter of the brain. J. Cereb. Blood Flow Metab. 21, 1133–1145. doi: 10.1097/00004647-200110000-00001
    5. Balas B., Nakano L., Rosenholtz R. (2009). A summary-statistic representation in peripheral vision explains visual crowding. J. Vis. 9, 1–18. doi: 10.1167/9.12.13