Review
Neuron. 2014 Mar 19;81(6):1240-1253. doi: 10.1016/j.neuron.2014.02.044.

Multisensory integration: flexible use of general operations


Nienke van Atteveldt et al. Neuron. 2014.

Abstract

Research into the anatomical substrates and "principles" for integrating inputs from separate sensory surfaces has yielded divergent findings. This suggests that multisensory integration is flexible and context dependent and underlines the need for dynamically adaptive neuronal integration mechanisms. We propose that flexible multisensory integration can be explained by a combination of canonical, population-level integrative operations, such as oscillatory phase resetting and divisive normalization. These canonical operations subsume multisensory integration into a fundamental set of principles for how the brain integrates all sorts of information, and they are used proactively and adaptively. We illustrate this proposition by unifying recent findings from different research themes such as timing, behavioral goal, and experience-related differences in integration.


Figures

Figure 1. A schematic representation of the proposed complementary role of canonical integration operations enabling context-dependent integration
Simplified explanation of the Phase Resetting (PR; red box) and Divisive Normalization (DN; green box) operations, and how they may complement each other by operating predominantly in different brain areas, at different time scales, and in different operation modes.
A. Different brain areas. In low-level sensory cortex, such as primary auditory cortex (A1), cross-modal visual inputs are modulatory (they enter outside cortical layer 4 and do not drive action potentials). By resetting the phase of ambient oscillations in A1, they nonetheless change the probability that an appropriately timed excitatory (auditory) input will depolarize neurons above threshold to generate action potentials. PR therefore likely represents a common operation for how multisensory cues interact in low-level sensory cortices. DN models describe the interaction of two or more excitatory inputs; for multisensory integration, this operation therefore seems optimized for brain areas that receive converging excitatory multisensory inputs, such as the Superior Temporal Polysensory (STP) area in the macaque monkey (of which the Superior Temporal Sulcus (STS) may be the human homologue).
B. Different time scales. PR can occur at all time scales, but many task-related modulations occur at lower frequencies, such as delta (around 1.5 Hz) and theta (around 7 Hz) (e.g., Schroeder & Lakatos, 2009). The suppressive divisive denominator in the DN operation may in part be mediated by fast-spiking interneurons that produce gamma-range (>30 Hz) oscillations; DN therefore seems suited to operating at a fast time scale.
C. Different operation modes. When relevant inputs are predictable in time, the brain presumably uses a "rhythmic" mode (Schroeder & Lakatos, 2009), in which neuronal excitability cycles at low frequencies. PR of these low-frequency oscillations, e.g., by a cross-modal modulatory input, synchronizes high-excitability phases of the oscillations with the anticipated timing of relevant inputs. In the absence of predictable input, the brain is thought to operate in a "continuous mode," in which gamma-range oscillations are enhanced continuously, along with suppression of lower-frequency power to avoid relatively long periods of weaker excitability. As the DN operation likely operates within gamma cycles, it can be used in this mode to continuously facilitate multisensory integration. N.B., in the "rhythmic" mode, gamma amplitude is coupled to the phase of the theta/delta oscillations, so DN may be active during the high-excitability phase of the lower-frequency oscillation.
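The two operations in this legend can be sketched numerically. Below is a minimal, illustrative Python sketch (not from the paper; the function names, the exponent n, and the semi-saturation constant sigma are assumptions chosen for clarity) of a divisive normalization of pooled drives and of a phase reset that aligns an excitability peak with a predicted input time.

```python
import numpy as np

def divisive_normalization(drives, sigma=1.0, n=2.0):
    """Each unit's response is its own (exponentiated) drive divided by
    the pooled drive of the population plus a semi-saturation constant
    sigma; all parameter values here are illustrative assumptions."""
    drives = np.asarray(drives, dtype=float)
    return drives**n / (sigma**n + np.sum(drives**n))

def excitability(t, freq_hz, phase):
    """Ongoing-oscillation excitability modeled as a cosine; +1 marks
    the high-excitability phase of the cycle."""
    return np.cos(2.0 * np.pi * freq_hz * t + phase)

def reset_phase_for(t_predicted, freq_hz):
    """Phase reset: choose the new phase so that an excitability peak
    coincides with the time a relevant input is predicted to arrive."""
    return -2.0 * np.pi * freq_hz * t_predicted

# Toy numbers: a unimodal drive alone vs. paired with a second
# (cross-modal) drive -- the shared normalization pool makes the
# combined response sub-additive.
alone = divisive_normalization([2.0, 0.0])[0]   # 4 / (1 + 4) = 0.8
paired = divisive_normalization([2.0, 2.0])[0]  # 4 / (1 + 8) ~= 0.44
```

In this sketch, adding the cross-modal drive lowers the first unit's normalized response (0.8 to roughly 0.44), the sub-additive combination characteristic of normalization models, while the phase-reset function simply guarantees that excitability is maximal at the predicted arrival time.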
Figure 2. Evidence in the macaque (A) and human (B) brain for cross-modal phase reset as a mechanism for predictive integration
A) Effect of somatosensory-auditory SOA on the bimodal response. (left) The colormap shows an event-related current source density (CSD) response from the site of a large current sink in the supragranular layers of macaque area A1, for different somatosensory-auditory SOAs. CSD is an index of the net synaptic responses (transmembrane currents) that lead to action potential generation (indexed by the concomitant multiunit activity, MUA, signal) in the local neuronal ensemble. Increasing SOAs are mapped onto the y-axis from top to bottom, with 0 at the top corresponding to simultaneous auditory-somatosensory stimulation; AU at the bottom represents the auditory-alone condition. Red dotted lines denote the 20-80 ms time interval over which CSD and MUA were averaged in single trials for quantification. (right) Mean CSD and MUA amplitude values (x-axis) for the 20-80 ms auditory post-stimulus time interval (error bars show standard error) at different somatosensory-auditory SOAs (y-axis). Stars denote the number of experiments (out of a total of 6) in which the bimodal response amplitude at a given SOA differed significantly from the auditory-alone response. Peaks in the functions occur at SOAs of ∼27, 45, 114, and 976 ms, which correspond to the periods of oscillations in the gamma (30-50 Hz), beta (14-25 Hz), theta (4-8 Hz), and delta (1-3 Hz) ranges that are phase-reset (and thus aligned over trials) by the initial somatosensory input. As increases in CSD and concomitant MUA signify increases in local neuronal excitation, these findings illustrate how the phase reset of ongoing oscillatory activity in A1 predictively prepares local neurons to respond preferentially to auditory inputs with particular timing relationships to the somatosensory (resetting) input. (Reprinted from Lakatos et al., Neuron, 2007.)
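The correspondence between the SOA peaks and the oscillatory bands in this legend is just the period-to-frequency conversion f = 1000 / period(ms). A quick illustrative check (not part of the original analysis):

```python
# Converting the reported SOA peaks (in ms) to frequencies (in Hz)
# recovers approximately the bands named in the legend.
soa_peaks_ms = [27, 45, 114, 976]
freqs_hz = [1000.0 / p for p in soa_peaks_ms]
# ~37 Hz (gamma), ~22 Hz (beta), ~8.8 Hz (just above the quoted
# 4-8 Hz theta band), ~1.0 Hz (delta)
```

Note that the 114 ms peak maps to roughly 8.8 Hz, slightly above the 4-8 Hz theta range quoted in the legend; the band assignment is approximate.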
B) Sound-induced (cross-modal) phase locking of alpha-band oscillations in human occipital cortex, and visual cortex excitability. (left) Phase dynamics in EEG at alpha frequency over posterior recording sites in response to a brief sound (incidence of the preferred phase at 100 ms post-sound, tracked from 0 to 300 ms after sound onset). These EEG alpha-phase dynamics correlated with (right) sound-induced cycling of visual cortex excitability over the first 300 ms after sound onset, as tested through phosphene perception rate in response to single occipital transcranial magnetic stimulation pulses. These findings illustrate co-cycling of perception with underlying, perceptually relevant oscillatory activity at the same frequency, here in the alpha range (around 10 Hz). (Adapted from Romei et al., 2012.) Both A and B support the notion that a sensory input can reset the phase of ongoing oscillations in cortical areas specialized to process another modality, thereby facilitating processing at certain periodic intervals and suppressing processing at the intervals in between. Through this mechanism, a cross-modal input can reset oscillations to enhance processing specifically at times when relevant input is predicted.
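The "incidence of preferred phase" described for panel B reflects phase consistency across trials. A standard way to quantify such consistency is the phase-locking value (the length of the mean resultant vector of trial phases); the toy sketch below illustrates the idea and is not the authors' analysis code.

```python
import numpy as np

def phase_locking_value(phases_rad):
    """Length of the mean resultant vector of trial phases: near 1 when
    a reset aligns phases across trials, near 0 when phases are
    uniformly distributed (no reset)."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(phases_rad)))))

# Simulated trials: uniformly random phases (no reset) vs. phases
# tightly clustered around zero (after a cross-modal reset).
rng = np.random.default_rng(0)
uniform_phases = rng.uniform(0.0, 2.0 * np.pi, size=2000)
reset_phases = rng.normal(0.0, 0.2, size=2000)
```

With these simulated trials, the clustered (post-reset) phases yield a phase-locking value close to 1, while the uniform phases yield a value close to 0.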
Figure 3. Different phase-resetting events during a conversation at a cocktail party, and their effects in low-level sensory cortices
A cocktail party is a good example of a situation in which highly flexible cue interaction is important for optimal perception and behavior. The rhythmic mode, and hence phase reset, dominates because of the many rhythmic elements in audiovisual speech. When entering a cocktail party, one first actively explores the scene visually (A). When one speaker is attended (B), the brain's attention system orchestrates the entrainment of ongoing oscillations in low-level sensory cortices to optimally process the relevant speech stream (in red) and visual gestures (person in highlighted square). This guides stimulus-driven entrainment (C): the temporal structure of the acoustic input is tracked in the auditory cortex (AC), a process facilitated by predictive visual cues (D). In parallel, transients in the speech acoustics may also phase-reset oscillatory activity in visual cortex (VC).
A. During active visual exploration, eye movements produce internal motor cues that reset low-frequency oscillations in VC to prepare the visual processing system for incoming visual information (Ito et al., 2011; Melloni et al., 2009; Rajkai et al., 2008). The anatomical origins of the motor-related phase-resetting cues are uncertain, but plausible candidates are efference copies from the oculomotor system [pontine reticular formation and/or extraocular muscles; see (Ito et al., 2011)] or a corollary discharge route through the superior colliculus (SC), thalamus, and Frontal Eye Fields (FEF); see (Melloni et al., 2009). It is also possible that the saccades and the corollary activity are both generated in parallel by attention (Melloni et al., 2009; Rajkai et al., 2008).
B. Selective attention orchestrates phase-resetting of oscillations in auditory and visual cortices [e.g., (Lakatos et al., 2008)]. The anatomical origins of this attentional modulatory influence are again uncertain, but two plausible candidate mechanisms are cortico-cortical [through ventral prefrontal cortex (vPFC)/FEF] and cortico-thalamo-cortical (reticular nucleus and nonspecific matrix) pathways.
C. External cross-modal cues can influence processing in low-level sensory cortices by resetting oscillations. Different anatomical pathways are possible for this cross-modal phase-resetting: for example, sensory cortices can influence each other through direct (lateral) anatomical connections [e.g., (Falchier et al., 2002)], or through feedforward projections from nonspecific (Hackett et al., 2007; Lakatos et al., 2007) or higher-order (Cappe et al., 2007) thalamic nuclei.
D. The cross-modal (visual-auditory) phase reset is predictive in that visual gestures in AV speech reliably precede the related vocalizations. Cocktail party image: iStock. Cross-modal timing figure in D reprinted from Schroeder et al., 2008.
