Review
Annu Rev Vis Sci. 2024 Sep;10(1):23-46.
doi: 10.1146/annurev-vision-101623-025432. Epub 2024 Sep 19.

Optimization in Visual Motion Estimation

Damon A Clark et al. Annu Rev Vis Sci. 2024 Sep.

Abstract

Sighted animals use visual signals to discern directional motion in their environment. Motion is not directly detected by visual neurons, and it must instead be computed from light signals that vary over space and time. This makes visual motion estimation a near-universal neural computation, and decades of research have revealed much about the algorithms and mechanisms that generate directional signals. The idea that sensory systems are optimized for performance in natural environments has deeply impacted this research. In this article, we review the many ways that optimization has been used to quantitatively model visual motion estimation and reveal its underlying principles. We emphasize that no single optimization theory has dominated the literature. Instead, researchers have adeptly incorporated different computational demands and biological constraints that are pertinent to the specific brain system and animal model under study. The successes and failures of the resulting optimization models have thereby provided insights into how computational demands and biological constraints together shape neural computation.

Keywords: Bayes optimality; efficient coding; motion estimation; optimization; task optimization.

Conflict of interest statement

DISCLOSURE STATEMENT

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.

Figures

Figure 1
An optimized model acts as a concrete point of comparison for understanding the performance and features of the system. (a) The performance of an optimized model may be compared to the measured system’s performance (ΔP). The parameters of the optimized model may also be compared to those fit to the measured system (Δθ). (b) One may also compare the measured system to the optimized model in terms of functional properties beyond the optimized one.
Figure 2
Motion detection is an inference problem. (a) A natural scene. (b) An intensity trace across the highlighted slice in panel a. Circles denote locations at which intensity is detected by an eye and correspond to the locations of time traces in panel c. (c) Intensity traces (bottom) created by an image moving with time-varying velocity (top). The visual system processes the intensity traces to infer the velocity. (d) A spatiotemporal intensity pattern created by the scene moving rightward at a constant speed. Velocity estimation is equivalent to estimating the slope of this pattern. (e) Self-motion creates optic flow across the retina. When an animal rotates about a vertical axis, flow is in the azimuthal direction at all elevations (top). When an animal translates through the world, the flow direction and speed depend on the angle with respect to the direction of movement, as well as the distance to objects (bottom). Panels a–c adapted with permission from Mano et al. (2021).
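Panel (d) frames velocity estimation as estimating the slope of a spatiotemporal pattern. One way to make this concrete is the gradient (brightness-constancy) approach. It is assumed here for illustration and is not presented as the review's method. For a rigidly translating pattern I(x, t) = f(x − vt), the partial derivatives satisfy I_t = −v·I_x, so v can be recovered by least squares. The pattern function, the step size h, and the sampling grid below are hypothetical choices.

```python
import math


def moving_pattern(x, t, v):
    """Hypothetical 1D intensity pattern translating at velocity v: I(x, t) = f(x - v*t)."""
    u = x - v * t
    return math.sin(u) + 0.5 * math.sin(2.3 * u)


def estimate_velocity(v_true, h=0.01, spacing=0.05, n=200):
    """Least-squares velocity estimate from brightness constancy, I_t + v * I_x = 0.

    Derivatives are central differences with step h, taken at n positions
    spaced `spacing` apart, all at time t = 0.
    """
    num = 0.0
    den = 0.0
    for i in range(n):
        x = i * spacing
        # Spatial and temporal derivatives of the intensity at (x, 0).
        i_x = (moving_pattern(x + h, 0.0, v_true) - moving_pattern(x - h, 0.0, v_true)) / (2 * h)
        i_t = (moving_pattern(x, h, v_true) - moving_pattern(x, -h, v_true)) / (2 * h)
        num += i_x * i_t
        den += i_x * i_x
    # Minimizing sum((i_t + v * i_x)^2) over v gives v = -sum(i_x * i_t) / sum(i_x^2).
    return -num / den
```

With these settings, `estimate_velocity(1.5)` returns a value close to 1.5, and `estimate_velocity(0.0)` returns 0. This noiseless toy case sidesteps the inference problem the caption describes. With noisy inputs, the estimate would itself be uncertain.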
Figure 3
Statistical properties of natural scenes. (a) Power tends to be highest at low frequencies and fall off at high frequencies. Data taken from Ruderman & Bialek (1993). (b) Natural scenes have a positively skewed intensity distribution, so that light patches are far brighter than average, while dark patches are only a little darker than average. Data taken from Brady & Field (2000). (c,d) The power spectrum leads to (c) spatial correlations in natural scenes that become (d) spatiotemporal correlations when the scene moves. Panels c and d adapted from Fitzgerald & Clark (2015) (CC BY 4.0).
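Panel (b) describes a positively skewed intensity distribution. The sketch below applies the standard (population) skewness statistic to a made-up two-valued sample, not to the Brady & Field (2000) data. It shows how many slightly dark patches plus a few very bright ones produce a positive value.

```python
def skewness(samples):
    """Population skewness: mean cubed deviation divided by the cubed standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    third = sum((s - mean) ** 3 for s in samples) / n
    return third / var ** 1.5


# Made-up toy sample: 90 patches slightly darker than average, 10 far brighter.
toy_intensities = [0.8] * 90 + [3.0] * 10
```

In this toy sample the mean is 1.02. The bright patches therefore sit 1.98 above the mean, while the dark patches sit only 0.22 below it. `skewness(toy_intensities)` evaluates to 8/3, up to floating-point rounding. A symmetric sample such as `[1.0, 2.0, 3.0]` gives 0.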
Figure 4
Models for motion detection occupy a continuum related to Marr & Poggio’s (1976) levels of analysis. The computational level of understanding reflects what a circuit does to promote the animal’s survival. In flies, motion detection stabilizes orientation and walking speed during navigation, among other functions. The algorithmic level reflects a mathematical summary of the computation, in this case, a correlator model, which explains fly rotational behavior very well in many, but not all, circumstances (Hassenstein & Reichardt 1956). This algorithm can be split into processing steps, which yields insight into the computation and leads to models that are progressively closer to what may be implemented in the circuit. In the figure, a linear–nonlinear model (Leong et al. 2016) and a split ON–OFF set of computations (Fitzgerald & Clark 2015, Salazar-Gatzimas et al. 2018) can be equivalent to the correlator model under some limits. Finally, the biological mechanism reflects the actual biophysical and circuit processes that implement the higher-level descriptions. In the figure, specific input neurons change conductances in a direction-selective T4 cell, a model that reduces to a correlator model with small inputs (Zavatone-Veth et al. 2020). Vm represents the T4 membrane voltage; Mi9, Mi1, and Mi4 are classes of neurons providing input to T4 at different retinotopic offsets. Images of processing steps taken from Fitzgerald & Clark (2015) (CC BY 4.0).
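The caption notes that a split ON–OFF set of computations can be equivalent to the correlator under some limits. One elementary ingredient of such an equivalence is an exact algebraic identity. Write each contrast signal as s = s⁺ − s⁻, where s⁺ is its positive part and s⁻ is the magnitude of its negative part. Then ab = (a⁺b⁺ + a⁻b⁻) − (a⁺b⁻ + a⁻b⁺). The sketch below checks this identity for single samples. It is an illustration under that assumption, not the model of Fitzgerald & Clark (2015) or Salazar-Gatzimas et al. (2018), and the equivalence in those models holds only under the limits the caption mentions.

```python
def on_part(s):
    """ON component: the positive part of a contrast signal."""
    return max(s, 0.0)


def off_part(s):
    """OFF component: magnitude of the negative part, so s == on_part(s) - off_part(s)."""
    return max(-s, 0.0)


def correlator_product(a, b):
    """The multiplicative step of a correlator acting on two contrast samples."""
    return a * b


def split_on_off_product(a, b):
    """The same product rebuilt from four sign-specific pathways.

    ON-ON and OFF-OFF pairings add; the mixed ON-OFF pairings subtract.
    """
    same_sign = on_part(a) * on_part(b) + off_part(a) * off_part(b)
    mixed_sign = on_part(a) * off_part(b) + off_part(a) * on_part(b)
    return same_sign - mixed_sign
```

For any pair of inputs, only one of the four pathway products is nonzero. Separate sign-specific pathways can therefore each carry one piece of the correlator's multiplication.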
Figure 5
Models for motion estimation. (a) A correlator model of motion estimation, also known as a Hassenstein–Reichardt correlator (Hassenstein & Reichardt 1956). Intensity or contrast signals from neighboring points in space are multiplied after one signal is delayed in time. This operation amplifies signals when the delayed and nondelayed signals coincide at the multiplicative step. The output of the model is the difference between two mirror-symmetric multipliers. (b) A motion energy model (Adelson & Bergen 1985). An oriented spatiotemporal filter amplifies signals in a preferred direction compared to the null direction, after which the filtered signal is squared. The linear operation alone does not create a direction-selective signal, since both preferred and null-direction signals have the same mean. (c) A biophysical model for motion estimation (Mo & Koch 2003, Zavatone-Veth et al. 2020) can be expanded into a Volterra series that approximates its operations at different polynomial orders of the input (Poggio & Reichardt 1973, Potters & Bialek 1994). The first three non-direction-selective terms each contain a nonlinearity, N. The lowest-order directional terms multiply pairs of inputs: These terms are approximated by correlator and motion energy models. The last term is an example third-order term, which multiplies three signals from two points in space. Other third- and higher-order terms are not shown, and we omit signs and scale factors for simplicity. Vm represents membrane voltage in the model.
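Panels (a) and (b) describe two classic algorithms. The sketch below implements each in a minimal form, assuming a drifting sinusoidal grating as input. The correlator uses first-order low-pass filters as its delay stage, which is a common modeling choice assumed here. The motion energy model uses a quadrature pair of Gabor filters oriented in space and time. All parameter values (receptor spacing `dx`, time constant `tau`, filter width `sigma`, grid `step`) are illustrative assumptions, not values from the review.

```python
import math


def drifting_grating(x, t, v, k=2 * math.pi):
    """Sinusoidal contrast at position x and time t; v > 0 means rightward motion."""
    return math.cos(k * (x - v * t))


def hrc_output(v, dx=0.1, tau=0.05, dt=0.001, duration=5.0, settle=1.0):
    """Time-averaged output of a Hassenstein-Reichardt correlator (panel a).

    Two inputs sit dx apart. A first-order low-pass filter with time constant
    tau serves as the delay. Samples before `settle` are discarded so the
    filters' start-up transient does not bias the average.
    """
    alpha = dt / tau
    lp1 = lp2 = 0.0
    total, count = 0.0, 0
    for i in range(int(round(duration / dt))):
        t = i * dt
        s1 = drifting_grating(0.0, t, v)
        s2 = drifting_grating(dx, t, v)
        lp1 += alpha * (s1 - lp1)  # delayed (low-passed) copy of input 1
        lp2 += alpha * (s2 - lp2)  # delayed (low-passed) copy of input 2
        if t >= settle:
            # Difference of two mirror-symmetric multipliers.
            total += lp1 * s2 - s1 * lp2
            count += 1
    return total / count


def motion_energy(v, speed=1.0, k=2 * math.pi, sigma=0.3, step=0.02):
    """Opponent motion energy (panel b) evaluated at the origin of x-t space.

    Each direction uses a quadrature (cosine/sine) pair of Gabor filters
    oriented in x-t. The pair's outputs are squared and summed, and leftward
    energy is subtracted from rightward energy.
    """
    omega = k * speed
    n = int(round(1.0 / step))
    energy = {}
    for direction in (1, -1):
        even = odd = 0.0
        for i in range(-n, n + 1):
            x = i * step
            for j in range(-n, n + 1):
                t = j * step
                envelope = math.exp(-(x * x + t * t) / (2 * sigma ** 2))
                phase = k * x - direction * omega * t
                stimulus = drifting_grating(x, t, v, k)
                even += envelope * math.cos(phase) * stimulus
                odd += envelope * math.sin(phase) * stimulus
        energy[direction] = even ** 2 + odd ** 2
    return energy[1] - energy[-1]
```

Both `hrc_output` and `motion_energy` are positive for a rightward grating (v = 1) and negative for a leftward one (v = −1). This sign flip comes from the opponent subtraction of mirror-symmetric stages described in the caption.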
Figure 6
The many meanings of optimal. (a) The visual system computes intermediate representations, $c$, of the input $x$: $x \to c \to \hat{y}$. Although only one is shown, there could be several layers of representations. Encoding is the mapping from $x$ to $c$, while decoding is the mapping from $c$ to a useful quantity $\hat{y}$, which is latent in $x$ and $c$. (b) Some theories restrict optimization to the encoding or decoding step, constraining or ignoring the other step. Other theories impose fewer constraints and/or use a loss function that depends on both input and output representations. Many different optimization theories can be derived by choosing different objective functions and constraints.
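Consider a hypothetical scalar setup, not taken from the review, as a toy instance of optimizing only the decoding step. A latent quantity y (for example, a velocity) is encoded as c = y + noise, where y and the noise are zero-mean and independent. It is then decoded linearly as ŷ = w·c. Minimizing mean squared error over w gives w* = Var(y) / (Var(y) + Var(noise)). For Gaussian y and noise, this also equals the Bayes posterior mean E[y | c]. Choosing a different loss or adding a constraint would yield a different "optimal" decoder.

```python
def decoding_mse(w, signal_var, noise_var):
    """Expected squared error of the decoder y_hat = w * c when c = y + noise.

    Assumes zero-mean y and noise that are independent of each other, so
    E[(w*c - y)^2] = (1 - w)^2 * Var(y) + w^2 * Var(noise).
    """
    return (1 - w) ** 2 * signal_var + w ** 2 * noise_var


def optimal_weight(signal_var, noise_var):
    """Closed-form minimizer of decoding_mse: w* = Var(y) / (Var(y) + Var(noise))."""
    return signal_var / (signal_var + noise_var)


# Brute-force check that the closed form minimizes the loss on a grid of weights.
grid = [i / 1000 for i in range(1001)]
best_w = min(grid, key=lambda w: decoding_mse(w, 4.0, 1.0))
```

As the noise variance grows relative to the signal variance, w* shrinks toward 0. The decoder then discounts its unreliable input.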

References

Adelson EH, Bergen JR. 1985. Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A 2(2):284–99
Agrochao M, Tanaka R, Salazar-Gatzimas E, Clark DA. 2020. Mechanism for analogous illusory motion perception in flies and humans. PNAS 117(37):23044–53
Alexander E, Cai LT, Fuchs S, Hladnik TC, Zhang Y, et al. 2022. Optic flow in the natural habitats of zebrafish supports spatial biases in visual self-motion estimation. Curr. Biol. 32(23):5008–21.e8
Anderson PW. 1972. More is different: broken symmetry and the nature of the hierarchical structure of science. Science 177(4047):393–96
Anstis S. 1970. Phi movement as a subtraction process. Vis. Res. 10(12):1411–30

LinkOut - more resources