Bayesian model of dynamic image stabilization in the visual system

Yoram Burak et al. Proc Natl Acad Sci U S A. 2010 Nov 9;107(45):19525-30. doi: 10.1073/pnas.1006076107. Epub 2010 Oct 11.

Abstract

Humans can resolve the fine details of visual stimuli although the image projected on the retina is constantly drifting relative to the photoreceptor array. Here we demonstrate that the brain must take this drift into account when performing high acuity visual tasks. Further, we propose a decoding strategy for interpreting the spikes emitted by the retina, which takes into account the ambiguity caused by retinal noise and the unknown trajectory of the projected image on the retina. A main difficulty, addressed in our proposal, is the exponentially large number of possible stimuli, which renders the ideal Bayesian solution to the problem computationally intractable. In contrast, the strategy that we propose suggests a realistic implementation in the visual cortex. The implementation involves two populations of cells, one that tracks the position of the image and another that represents a stabilized estimate of the image itself. Spikes from the retina are dynamically routed to the two populations and are interpreted in a probabilistic manner. We consider the architecture of neural circuitry that could implement this strategy and its performance under measured statistics of human fixational eye motion. A salient prediction is that in high acuity tasks, fixed features within the visual scene are beneficial because they provide information about the drifting position of the image. Therefore, complete elimination of peripheral features in the visual scene should degrade performance on high acuity tasks involving very small stimuli.
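The decoding strategy described above factorizes the posterior over the unknown image ("what" cells) and its drifting retinal position ("where" cells), with retinal spikes routed between the two populations. The following is a minimal one-dimensional sketch of such a factorized decoder; the variable names, the discrete random-walk drift, and the simplified update rules are our illustrative assumptions, not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 16                       # image width in pixels (1-D toy version)
LAM0, LAM1 = 10.0, 100.0     # firing rates (Hz) for dark / bright pixels
DT = 0.002                   # 2 ms time step
D = 100.0                    # drift diffusion constant, pixel^2/s (toy units)
T = 0.5                      # simulated fixation, seconds

image = rng.integers(0, 2, N)    # unknown binary stimulus
pos = 0                          # true (hidden) image shift

# Decoder state: "where" belief over shifts, "what" per-pixel on-probability
shifts = np.arange(-4, 5)
p_where = np.ones(len(shifts)) / len(shifts)
m_what = np.full(N, 0.5)

for step in range(int(T / DT)):
    # Hidden image drift: discrete random walk approximating diffusion
    if rng.random() < D * DT:
        pos = int(np.clip(pos + rng.choice([-1, 1]), shifts[0], shifts[-1]))

    # Diffuse the position belief to mirror the drift prior
    p_pad = np.pad(p_where, 1, mode="edge")
    p_where = (1 - D * DT) * p_where + D * DT * 0.5 * (p_pad[:-2] + p_pad[2:])
    p_where /= p_where.sum()

    # Retinal spikes: receptor r sees image pixel (r - pos)
    for r in range(N):
        j = r - pos
        pix = image[j] if 0 <= j < N else 0
        rate = LAM1 if pix else LAM0
        if rng.random() < rate * DT:               # Poisson spike this bin
            # Likelihood of this spike under each shift hypothesis
            like = np.empty(len(shifts))
            post_on = np.empty(len(shifts))
            for k, s in enumerate(shifts):
                jj = r - s
                m = m_what[jj] if 0 <= jj < N else 0.0
                like[k] = m * LAM1 + (1 - m) * LAM0
                post_on[k] = m * LAM1 / like[k]    # P(pixel on | spike, shift)
            # "Where" update: reweight shift hypotheses by spike likelihood
            p_where *= like
            p_where /= p_where.sum()
            # "What" update: route the spike to pixels, weighted by p_where
            for k, s in enumerate(shifts):
                jj = r - s
                if 0 <= jj < N:
                    m_what[jj] += p_where[k] * (post_on[k] - m_what[jj])

accuracy = np.mean((m_what > 0.5) == image.astype(bool))
```

The gating structure mirrors Fig. 1D: each spike simultaneously sharpens the position estimate and is probabilistically assigned to image pixels under every position hypothesis.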


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
(A) The letters E and F on the 20/20 line of the Snellen eye chart test, projected on an image of the foveal cone mosaic (photoreceptor image modified from ref. 39). The 1-arcmin features that distinguish the letters extend over only a few cones. Also shown is a sample fixational eye movement trajectory for a standing subject (courtesy of ref. 12), sampled every 2 ms for a duration of 500 ms and then smoothed with a 4-ms boxcar filter. Red dots mark the spike times from a neuron firing at 100 Hz. (B) Diagram of model for spike generation; see text for details. (C) Spikes generated by our model retina, presented with a letter E spanning 5 arcmin for 40 ms (with instantaneous RGC response), (Left) with no image drift and (Right) with image drift following statistics of human fixational eye motion. (D) Architecture of a neural implementation of the factorized decoder. (Upper) Each RGC projects to multiple what and where cells. (Lower) The projections are reciprocally gated between the two populations.
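The spike-generation model of Fig. 1 B and C combines a diffusive fixational trajectory with Poisson retinal ganglion cell firing. A hedged sketch of both ingredients, using the sampling interval of the trajectory in panel A and the 40-ms presentation of panel C; the square stimulus region and receptive-field test are our simplifications, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 100.0                  # diffusion constant of fixational drift, arcmin^2/s
DT = 0.002                 # 2 ms sampling, as in the trajectory of Fig. 1A
T = 0.04                   # 40 ms presentation, as in Fig. 1C
LAM0, LAM1 = 10.0, 100.0   # background / stimulus-driven rates, Hz

# 2-D Brownian drift: each axis gets independent Gaussian steps with
# variance 2*D*DT, so mean squared displacement grows as 4*D*t
n = int(T / DT)
steps = rng.normal(0.0, np.sqrt(2 * D * DT), size=(n, 2))
traj = np.cumsum(steps, axis=0)    # arcmin, relative to fixation onset

# Poisson spikes from one RGC that fires at LAM1 whenever the drifting
# stimulus (a toy 5-arcmin square) covers its receptive field, else LAM0
def covered(xy):
    return abs(xy[0]) < 2.5 and abs(xy[1]) < 2.5

spikes = [t * DT for t, xy in enumerate(traj)
          if rng.random() < (LAM1 if covered(xy) else LAM0) * DT]
```

With instantaneous response this per-bin Bernoulli draw approximates a Poisson process, reproducing the contrast between the static and drifting rasters of Fig. 1C.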
Fig. 2.
(A) Example of image reconstruction by the factorized decoder. (Upper) From left to right: the stimulus; snapshot of activity in the where cell population at t = 10 ms; and tracking of horizontal and vertical image position over time, with probability (grayscale) compared with actual trajectory (red). Parameters: 30 × 30 pixels, 0.5 arcmin/pixel, λ0,1 = 10/100 Hz, D = 100 arcmin2/s. (Lower) Several snapshots of activity in the what cell population. (B) Fraction of correctly estimated pixels as a function of time, averaged over 100 randomly selected images each containing 50 × 50 pixels and spanning 25 × 25 arcmin. Spikes generated with image motion are presented to the factorized and static decoders (solid traces). Performance of static decoder is shown also for a static image (dashed trace).
Fig. 3.
(A) Performance as a function of D, averaged over 1,000 presentations of random images. The convergence time (at which 90% of pixels are estimated correctly) increases with D (Left) and the accuracy (fraction of correctly estimated pixels at t = 300 ms) decreases with D (Right). Results are shown for images containing 40 × 40 pixels (20 × 20 arcmin). Increasing the firing rate improves performance (λ0,1 = 10/100 Hz, solid traces; λ0,1 = 20/200 Hz, dashed traces). (B) Performance improves with image size. Solid traces show performance for several image sizes, indicated in the Inset in units of arcminutes. Dashed trace shows reconstruction of 5 × 5 arcmin images consisting of 1 × 1 arcmin pixels. In all other traces resolution is 0.5 × 0.5 arcmin. Vertical dashed lines designate the value of D that corresponds to measured statistics of human fixational eye motion (–13).
Fig. 4.
Performance for spike trains generated with a temporal filter in RGC response. (A) Convergence time when the trajectory is known to the decoder. In contrast to the case of instantaneous response, performance depends on the diffusion statistics. Traces show the convergence time (for 90% accuracy), as a function of D for a factorized decoder that takes into account the filter (SI Appendix, Section III). Parameters: 20 × 20 pixel images, 0.5 arcmin/pixel (dashed trace) and 1 arcmin/pixel (solid trace). For known trajectory, image size has little effect (SI Appendix). Vertical dashed line: D = 100 arcmin2/s. (Inset) The temporal filter f(τ). (B) Performance of the naive factorized decoder when spikes are generated with a temporal filter (unknown trajectory). Traces show fraction of correctly estimated pixels as a function of time, averaged over 1,000 presentations of random images of sizes 40 × 40 arcmin, with D = 100 arcmin2/s. Solid and dashed traces: 1 × 1 arcmin and 0.5 × 0.5 arcmin pixels, respectively. The nonmonotonic dependence at short times is related to the structure of the temporal filter and can be eliminated using a modified version of the update rules (SI Appendix, Section III, and Fig. S3). (Inset) Accuracy at t = 300 ms measured for several image sizes, with 1 × 1 arcmin pixels (average over 1,000 presentations). (C) Performance on a discrimination task between 26 patterns representing the letters A–Z, averaged over 400 trials (see main text for all other parameters). Factorized decoder, black trace; static decoder, red trace; piecewise static decoder (Discussion and SI Appendix), gray trace. (D) Architecture of a neural implementation of the factorized decoder for binocular vision (Discussion).
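Fig. 4 concerns spikes generated with a temporal filter f(τ) in the RGC response rather than an instantaneous rate. A sketch of how such filtered rates can be produced is below; the biphasic kernel shape and time constants are our illustrative assumptions, not the paper's measured filter.

```python
import numpy as np

DT = 0.002                       # 2 ms bins
tau = np.arange(0.0, 0.1, DT)    # 100 ms kernel support
# Illustrative biphasic temporal filter (NOT the paper's f(tau))
f = np.exp(-tau / 0.01) - 0.5 * np.exp(-tau / 0.02)
f /= np.abs(f).sum()

# Pixel intensity seen by one RGC over time: 1 while the stimulus covers it
drive = np.zeros(200)
drive[20:80] = 1.0

# Filtered drive, mapped into the firing-rate range used elsewhere
LAM0, LAM1 = 10.0, 100.0
filtered = np.convolve(drive, f)[: len(drive)]
rate = LAM0 + (LAM1 - LAM0) * np.clip(filtered, 0.0, 1.0)
```

Because each spike now reflects the stimulus over a window of past positions, the decoder must account for the filter (panel A) or suffer the transient errors seen in panel B.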


References

    1. Britten KH, Shadlen MN, Newsome WT, Movshon JA. The analysis of visual motion: A comparison of neuronal and psychophysical performance. J Neurosci. 1992;12:4745–4765.
    2. Shadlen MN, Newsome WT. Neural basis of a perceptual decision in the parietal cortex (area LIP) of the rhesus monkey. J Neurophysiol. 2001;86:1916–1936.
    3. Rao RPN. Bayesian inference and attentional modulation in the visual cortex. Neuroreport. 2005;16:1843–1848.
    4. Deneve S, Latham PE, Pouget A. Efficient computation and cue integration with noisy population codes. Nat Neurosci. 2001;4:826–831.
    5. Huys QJM, Zemel RS, Natarajan R, Dayan P. Fast population coding. Neural Comput. 2007;19:404–441.
