Bayesian model of dynamic image stabilization in the visual system

Yoram Burak et al. Proc Natl Acad Sci U S A. 2010 Nov 9;107(45):19525-30. doi: 10.1073/pnas.1006076107. Epub 2010 Oct 11.

Abstract

Humans can resolve the fine details of visual stimuli although the image projected on the retina is constantly drifting relative to the photoreceptor array. Here we demonstrate that the brain must take this drift into account when performing high acuity visual tasks. Further, we propose a decoding strategy for interpreting the spikes emitted by the retina, which takes into account the ambiguity caused by retinal noise and the unknown trajectory of the projected image on the retina. A main difficulty, addressed in our proposal, is the exponentially large number of possible stimuli, which renders the ideal Bayesian solution to the problem computationally intractable. In contrast, the strategy that we propose suggests a realistic implementation in the visual cortex. The implementation involves two populations of cells, one that tracks the position of the image and another that represents a stabilized estimate of the image itself. Spikes from the retina are dynamically routed to the two populations and are interpreted in a probabilistic manner. We consider the architecture of neural circuitry that could implement this strategy and its performance under measured statistics of human fixational eye motion. A salient prediction is that in high acuity tasks, fixed features within the visual scene are beneficial because they provide information about the drifting position of the image. Therefore, complete elimination of peripheral features in the visual scene should degrade performance on high acuity tasks involving very small stimuli.
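The decoding strategy described above factorizes the posterior over the unknown image ("what" cells) and its drifting retinal position ("where" cells), with retinal spikes routed between the two populations. The following is a minimal one-dimensional sketch of such a factorized decoder; the variable names, the discrete random-walk drift, and the simplified update rules are our illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 16                       # image width in pixels (1-D toy version)
LAM0, LAM1 = 10.0, 100.0     # firing rates (Hz) for dark / bright pixels
DT = 0.002                   # 2 ms time step
D = 100.0                    # drift diffusion constant, pixel^2/s (toy units)
T = 0.5                      # simulated fixation, seconds

image = rng.integers(0, 2, N)    # unknown binary stimulus
pos = 0                          # true (hidden) image shift

# Decoder state: "where" belief over shifts, "what" per-pixel on-probability
shifts = np.arange(-4, 5)
p_where = np.ones(len(shifts)) / len(shifts)
m_what = np.full(N, 0.5)

for step in range(int(T / DT)):
    # Hidden image drift: discrete random walk approximating diffusion
    if rng.random() < D * DT:
        pos = int(np.clip(pos + rng.choice([-1, 1]), shifts[0], shifts[-1]))

    # Diffuse the position belief to mirror the drift prior
    p_pad = np.pad(p_where, 1, mode="edge")
    p_where = (1 - D * DT) * p_where + D * DT * 0.5 * (p_pad[:-2] + p_pad[2:])
    p_where /= p_where.sum()

    # Retinal spikes: receptor r sees image pixel (r - pos)
    for r in range(N):
        j = r - pos
        pix = image[j] if 0 <= j < N else 0
        rate = LAM1 if pix else LAM0
        if rng.random() < rate * DT:               # Poisson spike this bin
            # Likelihood of this spike under each shift hypothesis
            like = np.empty(len(shifts))
            post_on = np.empty(len(shifts))
            for k, s in enumerate(shifts):
                jj = r - s
                m = m_what[jj] if 0 <= jj < N else 0.0
                like[k] = m * LAM1 + (1 - m) * LAM0
                post_on[k] = m * LAM1 / like[k]    # P(pixel on | spike, shift)
            # "Where" update: reweight shift hypotheses by spike likelihood
            p_where *= like
            p_where /= p_where.sum()
            # "What" update: route the spike to pixels, weighted by p_where
            for k, s in enumerate(shifts):
                jj = r - s
                if 0 <= jj < N:
                    m_what[jj] += p_where[k] * (post_on[k] - m_what[jj])

accuracy = np.mean((m_what > 0.5) == image.astype(bool))
```

The gating structure mirrors Fig. 1D: each spike simultaneously sharpens the position estimate and is probabilistically assigned to image pixels under every position hypothesis.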


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
(A) The letters E and F on the 20/20 line of the Snellen eye chart test, projected on an image of the foveal cone mosaic (photoreceptor image modified from ref. 39). The 1-arcmin features that distinguish the letters extend over only a few cones. Also shown is a sample fixational eye movement trajectory for a standing subject (courtesy of ref. 12), sampled every 2 ms for a duration of 500 ms and then smoothed with a 4-ms boxcar filter. Red dots mark the spike times from a neuron firing at 100 Hz. (B) Diagram of model for spike generation; see text for details. (C) Spikes generated by our model retina, presented with a letter E spanning 5 arcmin for 40 ms (with instantaneous RGC response), (Left) with no image drift and (Right) with image drift following statistics of human fixational eye motion. (D) Architecture of a neural implementation of the factorized decoder. (Upper) Each RGC projects to multiple what and where cells. (Lower) The projections are reciprocally gated between the two populations.
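The spike-generation model of Fig. 1 B and C combines a diffusive fixational trajectory with Poisson retinal ganglion cell firing. A hedged sketch of both ingredients, using the sampling interval of the trajectory in panel A and the 40-ms presentation of panel C; the square stimulus region and receptive-field test are our simplifications, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 100.0                  # diffusion constant of fixational drift, arcmin^2/s
DT = 0.002                 # 2 ms sampling, as in the trajectory of Fig. 1A
T = 0.04                   # 40 ms presentation, as in Fig. 1C
LAM0, LAM1 = 10.0, 100.0   # background / stimulus-driven rates, Hz

# 2-D Brownian drift: each axis gets independent Gaussian steps with
# variance 2*D*DT, so mean squared displacement grows as 4*D*t
n = int(T / DT)
steps = rng.normal(0.0, np.sqrt(2 * D * DT), size=(n, 2))
traj = np.cumsum(steps, axis=0)    # arcmin, relative to fixation onset

# Poisson spikes from one RGC that fires at LAM1 whenever the drifting
# stimulus (a toy 5-arcmin square) covers its receptive field, else LAM0
def covered(xy):
    return abs(xy[0]) < 2.5 and abs(xy[1]) < 2.5

spikes = [t * DT for t, xy in enumerate(traj)
          if rng.random() < (LAM1 if covered(xy) else LAM0) * DT]
```

With instantaneous response this per-bin Bernoulli draw approximates a Poisson process, reproducing the contrast between the static and drifting rasters of Fig. 1C.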
Fig. 2.
(A) Example of image reconstruction by the factorized decoder. (Upper) From left to right: the stimulus; snapshot of activity in the where cell population at t = 10 ms; and tracking of horizontal and vertical image position over time, with probability (grayscale) compared with actual trajectory (red). Parameters: 30 × 30 pixels, 0.5 arcmin/pixel, λ0,1 = 10/100 Hz, D = 100 arcmin2/s. (Lower) Several snapshots of activity in the what cell population. (B) Fraction of correctly estimated pixels as a function of time, averaged over 100 randomly selected images each containing 50 × 50 pixels and spanning 25 × 25 arcmin. Spikes generated with image motion are presented to the factorized and static decoders (solid traces). Performance of static decoder is shown also for a static image (dashed trace).
Fig. 3.
(A) Performance as a function of D, averaged over 1,000 presentations of random images. The convergence time (at which 90% of pixels are estimated correctly) increases with D (Left) and the accuracy (fraction of correctly estimated pixels at t = 300 ms) decreases with D (Right). Results are shown for images containing 40 × 40 pixels (20 × 20 arcmin). Increasing the firing rate improves performance (λ0,1 = 10/100 Hz, solid traces; λ0,1 = 20/200 Hz, dashed traces). (B) Performance improves with image size. Solid traces show performance for several image sizes, indicated in the Inset in units of arcminutes. Dashed trace shows reconstruction of 5 × 5 arcmin images consisting of 1 × 1 arcmin pixels. In all other traces resolution is 0.5 × 0.5 arcmin. Vertical dashed lines designate the value of D that corresponds to measured statistics of human fixational eye motion (–13).
Fig. 4.
Performance for spike trains generated with a temporal filter in RGC response. (A) Convergence time when the trajectory is known to the decoder. In contrast to the case of instantaneous response, performance depends on the diffusion statistics. Traces show the convergence time (for 90% accuracy), as a function of D for a factorized decoder that takes into account the filter (SI Appendix, Section III). Parameters: 20 × 20 pixel images, 0.5 arcmin/pixel (dashed trace) and 1 arcmin/pixel (solid trace). For known trajectory, image size has little effect (SI Appendix). Vertical dashed line: D = 100 arcmin2/s. (Inset) The temporal filter f(τ). (B) Performance of the naive factorized decoder when spikes are generated with a temporal filter (unknown trajectory). Traces show fraction of correctly estimated pixels as a function of time, averaged over 1,000 presentations of random images of sizes 40 × 40 arcmin, with D = 100 arcmin2/s. Solid and dashed traces: 1 × 1 arcmin and 0.5 × 0.5 arcmin pixels, respectively. The nonmonotonic dependence at short times is related to the structure of the temporal filter and can be eliminated using a modified version of the update rules (SI Appendix, Section III, and Fig. S3). (Inset) Accuracy at t = 300 ms measured for several image sizes, with 1 × 1 arcmin pixels (average over 1,000 presentations). (C) Performance on a discrimination task between 26 patterns representing the letters A–Z, averaged over 400 trials (see main text for all other parameters). Factorized decoder, black trace; static decoder, red trace; piecewise static decoder (Discussion and SI Appendix), gray trace. (D) Architecture of a neural implementation of the factorized decoder for binocular vision (Discussion).
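Fig. 4 concerns spikes generated with a temporal filter f(τ) in the RGC response rather than an instantaneous rate. A sketch of how such filtered rates can be produced is below; the biphasic kernel shape and time constants are our illustrative assumptions, not the paper's measured filter.

```python
import numpy as np

DT = 0.002                       # 2 ms bins
tau = np.arange(0.0, 0.1, DT)    # 100 ms kernel support
# Illustrative biphasic temporal filter (NOT the paper's f(tau))
f = np.exp(-tau / 0.01) - 0.5 * np.exp(-tau / 0.02)
f /= np.abs(f).sum()

# Pixel intensity seen by one RGC over time: 1 while the stimulus covers it
drive = np.zeros(200)
drive[20:80] = 1.0

# Filtered drive, mapped into the firing-rate range used elsewhere
LAM0, LAM1 = 10.0, 100.0
filtered = np.convolve(drive, f)[: len(drive)]
rate = LAM0 + (LAM1 - LAM0) * np.clip(filtered, 0.0, 1.0)
```

Because each spike now reflects the stimulus over a window of past positions, the decoder must account for the filter (panel A) or suffer the transient errors seen in panel B.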


References

    1. Britten KH, Shadlen MN, Newsome WT, Movshon JA. The analysis of visual motion: A comparison of neuronal and psychophysical performance. J Neurosci. 1992;12:4745–4765.
    2. Shadlen MN, Newsome WT. Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol. 2001;86:1916–1936.
    3. Rao RPN. Bayesian inference and attentional modulation in the visual cortex. Neuroreport. 2005;16:1843–1848.
    4. Deneve S, Latham PE, Pouget A. Efficient computation and cue integration with noisy population codes. Nat Neurosci. 2001;4:826–831.
    5. Huys QJM, Zemel RS, Natarajan R, Dayan P. Fast population coding. Neural Comput. 2007;19:404–441.
