Review
Nat Rev Neurosci. 2013 Jun;14(6):429-42.
doi: 10.1038/nrn3503.

Bridging the gap between theories of sensory cue integration and the physiology of multisensory neurons

Christopher R Fetsch et al. Nat Rev Neurosci. 2013 Jun.

Abstract

The richness of perceptual experience, as well as its usefulness for guiding behaviour, depends on the synthesis of information across multiple senses. Recent decades have witnessed a surge in our understanding of how the brain combines sensory cues. Much of this research has been guided by one of two distinct approaches: one is driven primarily by neurophysiological observations, and the other is guided by principles of mathematical psychology and psychophysics. Conflicting results and interpretations have contributed to a conceptual gap between psychophysical and physiological accounts of cue integration, but recent studies of visual-vestibular cue integration have narrowed this gap considerably.

Figures

Figure 1. Schematic of a generic cue-integration/cue-conflict psychophysical task
Simplified version of a visual-auditory localization task in which the subject reports whether a stimulus was located to the left or right of a reference location (marked ‘0’). The stimulus can be a flash of light (the visual cue; light bulb icon) and/or a broadband noise burst or click (the auditory cue; speaker icon) presented at one of several possible locations in front of the subject. The two cues are presented either at the same location or separated by some amount (the cue-conflict), and the reliability of one or both cues is often manipulated experimentally, here denoted by the width and blurring of the icons. a. Depiction of cue-conflict trials in which the visual cue is more reliable and also displaced to the right, while the auditory cue is less reliable and displaced to the left. For this example, the cue-conflict is kept fixed, and the pair of stimuli is jointly moved to the left or right on different trials, generating a sigmoidal choice curve (psychometric function, green, plotted relative to the midpoint between the two stimuli). If the subjects weight the cues according to their reliability, they will make more rightward choices for a given position of the paired stimuli (relative to non-conflict conditions), and the psychometric curve will be shifted to the left of center. The stimulus position at which the curve reaches 50% rightward choices (point of subjective equality, PSE, dashed lines) maps onto a particular set of perceptual weights (w_aud and w_vis; the w_i in Eq. 1, Box 1), which in this case would have the relationship w_aud < w_vis, since the visual cue is more reliable. b. Scenario on a different set of trials with the same cue-conflict but reversed reliability (auditory more reliable than visual). Here the subject should make more leftward choices, shifting the curve to the right (w_aud > w_vis). c. In addition to measuring shifts of the psychometric function, performance with combined visual-auditory stimuli (green curve) can be compared to single-cue conditions (red and blue curves), testing the prediction that reliabilities add (Eq. 2 in Box 1; here denoted by a decrease in the standard deviation, σ, of the green cumulative Gaussian psychometric function by a factor of the square root of 2).
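The predictions described in this caption can be sketched numerically. The example below assumes the standard reliability-weighted model that Eq. 1 and Eq. 2 of Box 1 appear to refer to (Box 1 is not reproduced on this page): each weight is proportional to inverse variance, and inverse variances add. The function names, the example thresholds, and the convention that the two cues sit symmetrically around the midpoint are illustrative assumptions, not taken from the article.

```python
import math

def optimal_weights(sigmas):
    """Return reliability-based weights, where reliability = 1 / sigma**2."""
    reliabilities = [1.0 / s ** 2 for s in sigmas]
    total = sum(reliabilities)
    return [r / total for r in reliabilities]

def combined_sigma(sigmas):
    """Predicted standard deviation when reliabilities add across cues."""
    return 1.0 / math.sqrt(sum(1.0 / s ** 2 for s in sigmas))

def predicted_pse(conflict, w_vis):
    """Predicted PSE relative to the midpoint of the two cues.

    Assumes the visual cue sits at +conflict/2 and the auditory cue at
    -conflict/2 around the midpoint (an illustrative convention).
    """
    w_aud = 1.0 - w_vis
    return -(w_vis - w_aud) * conflict / 2.0

# Hypothetical single-cue thresholds: vision twice as precise as audition.
sigma_vis, sigma_aud = 1.0, 2.0
w_vis, w_aud = optimal_weights([sigma_vis, sigma_aud])
print(w_vis, w_aud)                  # visual weight is the larger one
print(predicted_pse(4.0, w_vis))     # negative: curve shifts left (panel a)
print(predicted_pse(4.0, w_aud))     # reversed reliability: shifts right (panel b)
print(combined_sigma([1.0, 1.0]))    # equal cues: sigma shrinks by sqrt(2) (panel c)
```

With equally reliable cues the weights are both 0.5, so the predicted PSE shift is zero and the only signature of integration is the narrower psychometric function.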
Figure 2. A probabilistic population code (PPC) framework accounts for optimal cue integration by summation of unisensory population activity
a. In this model (reproduced with permission from REF. ), sensory cues C1 and C2 each generate a ‘hill’ of population activity in their respective unisensory areas, which could be, for example, regions in visual cortex and auditory cortex. Each data point indicates a single neuron, and these cells are arranged by their preferred stimulus value (e.g., receptive field location). The hills are noisy, not smooth, because of variability in neuronal responses. Owing to the particular kind of variability in these model neurons (also commonly found in real neurons), each hill of activity encodes a conditional probability distribution (P(r_i∣S), insets) whose variance is inversely proportional to the gain, or height, of the hill, indicated by the vertical arrows (g ∝ 1/σ²; note the weaker response and consequently broader distribution for the less reliable cue, C2). The inverse variance of this distribution is the quantity needed to perform optimal reliability-weighted cue integration (Box 1). Summing the two unisensory populations neuron-by-neuron generates a third population (right side) whose gain is the sum of the unisensory gains g1 and g2. Therefore, the inverse variance of the probability distribution P(r1 + r2∣S) encoded by the multisensory population is equal to the sum of the individual cues’ inverse variances, or reliabilities – the same operation prescribed by the optimal integration model (Eq. 2 in Box 1). b. A simulated cue-conflict trial in which sensory cue C1 (blue) specifies, in arbitrary units, a stimulus value of −20 and C2 (red) a value of +20. The C1 response has a greater gain than the C2 response, simulating a more reliable cue being presented along with a less reliable one. After summation, the resulting hill of activity (green) is skewed toward the more reliable cue, as shown schematically by the encoded probability distributions (inset). A downstream brain area that optimally decodes this multisensory activity would produce behavioral responses consistent with optimal cue integration theory (Box 1). Note that the shape of the multisensory hill – which depends on parameters such as the shape and width of tuning curves and the size of the cue-conflict – need not mimic the shape of the encoded distributions. Optimal cue integration can still occur via a linear combination of unisensory activity for a variety of tuning widths or shapes, provided that the linear combination is appropriately tailored to these tuning properties.
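The summation argument in this caption can be illustrated with a small simulation. This is a minimal sketch under assumptions the caption does not spell out: independent Poisson spiking, identical Gaussian tuning curves in both unisensory populations, densely and evenly spaced preferred values, and a flat prior. Under those assumptions the posterior over the stimulus is Gaussian with mean equal to the spike-weighted average of preferred values and variance equal to the squared tuning width divided by the total spike count. The gains (30 and 10), the tuning width, and all names are hypothetical; only the stimulus values −20 and +20 come from panel b.

```python
import math
import random

WIDTH = 20.0                          # tuning-curve width (arbitrary units, assumed)
PREFS = list(range(-100, 101, 5))     # preferred stimulus of each neuron (assumed)

def poisson(lam, rng):
    """Draw one Poisson sample with mean lam (Knuth's method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def population_response(stimulus, gain, rng):
    """Noisy hill of activity: Gaussian tuning scaled by gain, Poisson noise."""
    return [poisson(gain * math.exp(-0.5 * ((stimulus - c) / WIDTH) ** 2), rng)
            for c in PREFS]

def decode(r):
    """Mean and variance of the Gaussian posterior encoded by spike counts r."""
    total = sum(r)
    mean = sum(ri * c for ri, c in zip(r, PREFS)) / total
    return mean, WIDTH ** 2 / total

def simulate(seed=0):
    rng = random.Random(seed)
    r1 = population_response(-20.0, 30.0, rng)   # more reliable cue C1
    r2 = population_response(+20.0, 10.0, rng)   # less reliable cue C2
    r3 = [a + b for a, b in zip(r1, r2)]         # neuron-by-neuron sum
    return decode(r1), decode(r2), decode(r3)

(m1, v1), (m2, v2), (m3, v3) = simulate()
print(1 / v3, 1 / v1 + 1 / v2)   # inverse variances add after summation
print(m1, m2, m3)                # summed estimate lies closer to C1's value
```

Because the decoder here depends on the spike counts only through their sum and their spike-weighted mean, adding the two populations reproduces the reliability-weighted average exactly, whatever the noise on a given trial.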
Figure 3. Combined psychophysical and neurophysiological studies of visual-vestibular cue integration in the macaque
a. Monkeys were trained to report their perceived heading (direction of self-motion relative to straight ahead) while seated in a virtual-reality setup. The apparatus consists of a motion platform that can translate in any direction, upon which is mounted a projector and rear-projection screen for displaying optic flow patterns that simulate movement of the observer through a random-dot ‘cloud’. Figure modified with permission from REF. . b. While the monkey fixated his gaze (dashed lines) on a spot at the center of the screen (yellow), heading stimuli were delivered in one of three conditions: vestibular (platform motion only, indicated by arrows on the platform), visual (optic flow only, indicated by arrows on the screen), or combined (platform motion and optic flow, as shown). c. Following each 2-second motion stimulus (here, a heading to the left of straight ahead), the monkey indicated his choice by making a saccadic eye movement to one of two targets (red). d. Behavioral data (psychometric functions) for a single session are plotted showing the proportion of rightward choices as a function of signed heading angle, where positive heading indicates rightward motion and negative indicates leftward. The slope of the fitted curve is a measure of the animal’s sensitivity to small changes in heading, in other words the reliability of the cue(s). The slope was greater in the combined condition (blue curve, triangles) than in the single-cue conditions (black and red curves), indicating an improvement in sensitivity (i.e., reduction in uncertainty or variance). The average improvement across sessions was close to the optimal prediction (Eq. 2). e. The firing rate responses (tuning curves) of a single example neuron from area MSTd are plotted using the same conventions as the behavioral data. Note the steeper slope of the tuning curve in the combined condition (blue, triangles), suggesting an increase in sensitivity of the neuron under multisensory stimulation. f. The firing rates depicted in panel e were converted into simulated choices by an ideal observer using ROC analysis. The resulting ‘neurometric’ functions quantify the sensitivity of the neuron to small changes in heading during the vestibular (black), visual (red), and combined (blue) conditions. Similar to the behavioral effect, the slope of the neurometric curve is steeper in the combined condition than the single-cue conditions. Panels e and f modified with permission from REF. g. In a separate study (REF. 100), the cues were placed in conflict to test for reliability-based cue weighting, analogous to Fig. 1a-b. Here, the visual cue was more reliable, hence the monkey made more rightward choices when the visual heading was displaced to the right (Δ = +4°, green curve and symbols) and more leftward choices when the visual heading was displaced to the left (Δ = −4°, blue curve and symbols). h. Tuning curves from the same neuron as in panel e, recorded under cue-conflict conditions. The curves are offset from one another because the more reliable visual cue drives the cell to fire more spikes (Δ = −4°, blue) or fewer spikes (Δ = +4°, green) for a given heading angle. i. Conversion of these firing rates into neurometric functions reveals a pattern similar to the behavioral result in panel g; the shift of the curves for different values of Δ reflects the trial-by-trial weighting of cues (favoring the more reliable visual cue, as predicted from optimal cue integration). Panels g-i modified with permission from REF. .
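The conversion from firing rates to a neurometric function in panel f can be sketched as follows. The caption does not say which distribution each heading's responses were compared against, so this example assumes a comparison with responses at a reference heading of 0; that choice, the function names, and the firing rates are all made up for illustration. The area under the ROC curve is computed directly as the probability that a response from one set exceeds a response from the other, counting ties as one half.

```python
def roc_area(rates_a, rates_b):
    """Probability that a draw from rates_a exceeds a draw from rates_b.

    Ties count one half; this equals the area under the ROC curve.
    """
    wins = 0.0
    for a in rates_a:
        for b in rates_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(rates_a) * len(rates_b))

def neurometric(rates_by_heading, reference_rates):
    """Proportion of 'rightward' ideal-observer choices at each heading."""
    return {h: roc_area(r, reference_rates)
            for h, r in sorted(rates_by_heading.items())}

# Made-up firing rates (spikes/s) for a neuron that prefers rightward headings.
rates = {
    -4.0: [10, 12, 11, 9],
    0.0: [14, 15, 13, 16],
    4.0: [19, 18, 20, 17],
}
curve = neurometric(rates, rates[0.0])
for heading, p_right in curve.items():
    print(heading, p_right)
```

A cumulative Gaussian fitted to these proportions would then give a neuronal threshold that can be compared with the behavioral one, in the same way the caption compares slopes across conditions.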
Figure 4. The normalization model of multisensory integration
a. In this model, as in the simplified conceptual model of Box 2, unisensory neurons from separate populations send inputs to a topographically aligned multisensory neuron. b. Unisensory inputs are multiplied by synaptic weights d1 and d2 (fixed for a given neuron) and summed to generate the driving input to a particular multisensory neuron. This driving input is then divided by the summed responses of the rest of the population (the normalization pool; see Eq. 4 under “Closing the loop”). c. In addition to explaining the origin of the multisensory combination rule in MSTd, divisive normalization can also account for the classic empirical principles of multisensory integration made famous by studies of the superior colliculus (SC). One such phenomenon is called the spatial principle, illustrated here as a case of cross-modal suppression. One stimulus (‘input 1’, cross in red circles) is presented at the center of the receptive field of a simulated SC neuron, while a second stimulus (‘input 2’, X in blue circles) is presented two standard deviations (2σ) away from center. At relatively high (>7) intensities of the two inputs, the response to the combined inputs (black curve) is less than the response to input 1 alone (i.e., cross-modal suppression), even though input 2 alone is excitatory. This results from the contribution of input 2 to the normalization pool. Panels a-c modified with permission from REF. .
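The cross-modal suppression described in panel c can be reproduced qualitatively with a toy population. This is only a sketch: Eq. 4 is not shown on this page, so the exact form used here (driving input raised to an exponent, divided by a constant plus the population-average of the same quantity) is an assumption, as are the Gaussian receptive fields, the parameter values ALPHA and N, the weights d1 = d2 = 1, and every name in the code. The intensity at which suppression sets in therefore differs from the value of 7 quoted for the published simulation.

```python
import math

CENTERS = [i * 0.5 for i in range(-20, 21)]   # RF centers, in units of sigma
ALPHA = 1.0                                   # semi-saturation constant (assumed)
N = 2.0                                       # response exponent (assumed)

def drive(center, c1, c2, pos1=0.0, pos2=2.0, d1=1.0, d2=1.0):
    """Weighted sum of the two unisensory inputs to one multisensory neuron."""
    rf1 = math.exp(-0.5 * (center - pos1) ** 2)   # input 1 at the RF center
    rf2 = math.exp(-0.5 * (center - pos2) ** 2)   # input 2 placed 2 sigma away
    return d1 * c1 * rf1 + d2 * c2 * rf2

def response(c1, c2, center=0.0):
    """Driving input divided by the population-wide normalization pool."""
    pool = sum(drive(x, c1, c2) ** N for x in CENTERS) / len(CENTERS)
    return drive(center, c1, c2) ** N / (ALPHA ** N + pool)

for intensity in (0.5, 1.0, 2.0, 5.0, 10.0):
    alone1 = response(intensity, 0.0)
    alone2 = response(0.0, intensity)
    both = response(intensity, intensity)
    print(intensity, round(alone1, 3), round(alone2, 3), round(both, 3))
```

In this toy version, input 2 adds little to the recorded neuron's driving input but contributes fully to the pool through neurons whose receptive fields it does cover, so at high intensities the combined response falls below the response to input 1 alone, while at low intensities the constant term dominates and the combined response is larger.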

References

    Faisal AA, Selen LP, Wolpert DM. Noise in the nervous system. Nat Rev Neurosci. 2008;9:292–303.
    Gepshtein S. Two psychologies of perception and the prospect of their synthesis. Philosophical Psychology. 2010;23:217–281.
    Knill DC, Pouget A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 2004;27:712–9.
    Angelaki DE, Shaikh AG, Green AM, Dickman JD. Neurons compute internal models of the physical laws of motion. Nature. 2004;430:560–4.

Highlighted references

    Knill DC, Pouget A. The Bayesian brain: the role of uncertainty in neural coding and computation. Trends Neurosci. 2004;27:712–9. A concise review that provides a good introduction to the idea of sensory uncertainty and the Bayesian perspective on behavior and neural coding, including the incorporation of priors and studies of motor control.

    Landy MS, Maloney LT, Johnston EB, Young M. Measurement and modeling of depth cue combination: in defense of weak fusion. Vision Res. 1995;35:389–412. Focusing on the array of visual cues available for the perception of depth, this paper develops several key ideas underlying contemporary ideal observer models of cue integration, while also introducing a psychophysical procedure that has become a standard method for testing such models.

    Ernst MO, Banks MS. Humans integrate visual and haptic information in a statistically optimal fashion. Nature. 2002;415:429–33. One of the earliest and clearest psychophysical demonstrations of optimal cue integration across separate sensory modalities. The authors showed that human subjects integrate vision and touch to estimate the width of a grasped object, taking into account the relative reliability of the cues and combining them to improve their performance. Importantly, cue reliability was varied randomly from trial to trial, suggesting that the brain may not need to explicitly learn or represent the uncertainty of the cues to accomplish the task.

    Gu Y, Angelaki DE, DeAngelis GC. Neural correlates of multisensory cue integration in macaque MSTd. Nat Neurosci. 2008;11:1201–10. Using a visual-vestibular heading discrimination task, this study showed that monkeys, like humans, are capable of combining sensory cues to improve perceptual performance. The authors also characterized a population of neurons in extrastriate visual cortex (MSTd) that could underlie the behavior.

    Meredith M, Stein B. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. J Neurophysiol. 1986;56:640–62. This paper was among the first to demonstrate the impressive capacity of SC neurons to combine visual, tactile, and auditory cues, yielding multisensory responses that were often considerably enhanced (and sometimes suppressed) relative to unisensory responses. These early observations laid the foundation for the well-known empirical ‘principles’ of multisensory integration (the spatial and temporal principles, inverse effectiveness, etc.).
