Neuron. 2018 Feb 7;97(3):640-655.e4. doi: 10.1016/j.neuron.2017.12.034. Epub 2018 Jan 26.

Integration of Visual Information in Auditory Cortex Promotes Auditory Scene Analysis through Multisensory Binding

Huriye Atilgan et al. Neuron. 2018.

Abstract

How and where in the brain audio-visual signals are bound to create multimodal objects remains unknown. One hypothesis is that temporal coherence between dynamic multisensory signals provides a mechanism for binding stimulus features across sensory modalities. Here, we report that when the luminance of a visual stimulus is temporally coherent with the amplitude fluctuations of one sound in a mixture, the representation of that sound is enhanced in auditory cortex. Critically, this enhancement extends to include both binding and non-binding features of the sound. We demonstrate that visual information conveyed from visual cortex via the phase of the local field potential is combined with auditory information within auditory cortex. These data provide evidence that early cross-sensory binding provides a bottom-up mechanism for the formation of cross-sensory objects and that one role for multisensory binding in auditory cortex is to support auditory scene analysis.

Keywords: attention; auditory cortex; auditory-visual; binding; cross-modal; ferret; multisensory; sensory cortex; visual cortex.


Figures

Graphical abstract
Figure 1
Hypothesis and Experimental Design (A) Conceptual model illustrating how binding can be identified as a distinct form of multisensory integration. Multisensory binding is defined as a subset of multisensory integration that results in the formation of a cross-modal object. During binding, all features of the audio-visual object are linked and enhanced, including both the features that bind the stimuli across modalities (here, temporal coherence between auditory [A] intensity and visual [V] luminance) and orthogonal features such as auditory pitch and timbre, and visual color and size. Other forms of multisensory integration would result in enhancement of only the features that promote binding (here, auditory intensity and visual luminance). Identifying binding therefore requires a demonstration that non-binding features (e.g., pitch, timbre, color, or size) are enhanced. Enhanced features are highlighted in yellow. (B) When two competing sounds (red and blue waveforms) are presented, they can be separated on the basis of their features but may elicit overlapping neuronal representations in auditory cortex. (C) Hypothesized enhancement in auditory stream segregation when a temporally coherent visual stimulus enables multisensory binding. When the visual stimulus changes coherently with the red sound (A1, top), this sound is enhanced and the two sources are better segregated. Perceptually, this would result in more effective auditory scene analysis and an enhancement of the non-binding features. (D) Stimulus design: the auditory stimuli were two artificial vowels (denoted A1 and A2), each with distinct pitch and timbre and independently amplitude modulated with a noisy low-pass envelope. (E) Visual stimulus: a luminance-modulated white light was presented with one of two temporal envelopes, derived from the amplitude modulations of A1 and A2. (F) The stimulus combinations tested experimentally in single-stream (a single auditory-visual pair) and dual-stream (two sounds and one visual stimulus) conditions. See also Figure S1.
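As a sketch of the stimulus construction in (D) and (E): the Python snippet below generates two independent noisy low-pass envelopes, amplitude-modulates two carriers with them, and reuses one envelope as the visual luminance trace. The sample rate, duration, filter cutoff, and fundamental frequencies are illustrative placeholders rather than values reported in the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 44_100        # audio sample rate in Hz (assumed, not from the paper)
DUR = 2.0          # stimulus duration in seconds (assumed)
t = np.arange(int(FS * DUR)) / FS

def noisy_lowpass_envelope(cutoff_hz, seed):
    """Low-pass-filtered Gaussian noise, rescaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    b, a = butter(4, cutoff_hz / (FS / 2), btype="low")
    env = filtfilt(b, a, rng.standard_normal(t.size))
    return (env - env.min()) / (env.max() - env.min())

# Two independent envelopes, one per artificial vowel stream
env1 = noisy_lowpass_envelope(cutoff_hz=7.0, seed=1)   # 7 Hz cutoff is a placeholder
env2 = noisy_lowpass_envelope(cutoff_hz=7.0, seed=2)

# Amplitude-modulate two carriers with distinct fundamental frequencies
# (placeholder F0s standing in for the two artificial vowels A1 and A2)
a1 = env1 * np.sin(2 * np.pi * 175 * t)
a2 = env2 * np.sin(2 * np.pi * 195 * t)

# The visual stimulus reuses one auditory envelope as its luminance trace,
# so V1 is temporally coherent with A1 and independent of A2
luminance_v1 = env1
```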
Figure 2
Visual Stimuli Can Determine Which Sound Stream Auditory Cortical Neurons Follow in a Mixture (A and B) Spiking responses from an example unit in response to (A) single-stream auditory-visual stimuli used as decoding templates and (B) dual-stream stimuli. In each case, rasters and peristimulus time histograms (PSTHs) are illustrated, color coded according to their auditory-visual (A) or visual (B) identity. When the visual component of the dual stream was V1, the majority of trials were classified as A1V1 (82%, 19/23 trials), whereas when the visual stimulus was V2, only 26% (6/23 trials) were classified as A1V1 (see also the green data point in C), yielding a visual preference score of 56%. (C–H) Population data for the awake (C–E; 271 units) and anesthetized (F–H; 331 units) datasets. In each case, the left panels (C and F) show the distribution of decoding values according to the visual condition, with units in which the visual preference index (VPI) was significantly >0 colored purple and those with a VPI statistically indistinguishable from 0 colored gray. The middle panels (D and G) show the population mean (±SEM) projected onto the vertical axis of (C) and (F) for the V1 condition and onto the horizontal axis of (C) and (F) for the V2 condition (purple lines show data for units with significant VPI values). (E) and (H) show the VPI, color coded according to whether these values were significantly >0. Pairwise comparisons revealed a significant effect of visual condition on decoding in all datasets: awake: all: t540 = 6.1, p = 2.3e-09 (n = 271); sig. VPI: t180 = 18.8, p = 2.0e-44 (n = 91); anesthetized: all: t660 = 9.5, p = 3.3e-20 (n = 331); sig. VPI: t348 = 38.9, p = 1.2e-128 (n = 175). See also Figures S2–S4.
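The decoding logic behind the visual preference score can be summarized in a few lines. The nearest-template (Euclidean distance) classifier below is an assumption for illustration; the legend states only that single-stream responses served as decoding templates. The VPI is the proportion of dual-stream trials classified as A1V1 under V1 minus the proportion under V2, which reproduces the example unit's 56% score.

```python
import numpy as np

def fraction_classified_a1v1(dual_psths, template_a1v1, template_a2v2):
    """Assign each dual-stream trial to the nearer single-stream template
    (Euclidean distance; assumed classifier) and return the fraction
    classified as A1V1."""
    hits = sum(
        np.linalg.norm(r - template_a1v1) < np.linalg.norm(r - template_a2v2)
        for r in dual_psths
    )
    return hits / len(dual_psths)

def visual_preference_index(p_a1v1_given_v1, p_a1v1_given_v2):
    """VPI: how far the visual stream shifts decoding toward A1V1."""
    return p_a1v1_given_v1 - p_a1v1_given_v2

# Example unit from (B): 19/23 trials classified as A1V1 under V1,
# 6/23 under V2, giving a VPI of ~0.56 (56%)
print(visual_preference_index(19 / 23, 6 / 23))
```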
Figure 3
Visual Stimuli Shape the Neural Representation of an Auditory Scene (A and B) In an additional control experiment (n = 89 units recorded in awake animals), the responses to coherent auditory-visual and auditory-only (A-only) single-stream stimuli were used as templates to decode dual-stream stimuli either accompanied by visual stimuli (V1/V2) or in the absence of visual stimulation (no visual). Shown are spiking responses from an example unit in response to (A) the single-stream auditory stimuli used as templates to decode the responses to the dual-stream stimuli in (B); in each case, the auditory waveform, rasters, and PSTHs are shown. In this example, when decoded with auditory-visual templates, 79% (22/28) of responses were classified as A1 when the visual stimulus was V1 and 32% (9/28) when the visual stimulus was V2, yielding a VPI of 47%. When decoded with A-only templates, the values were 75% when V1 (22/28) and 35% when V2 (10/28), yielding a VPI of 40%. For comparison, the auditory-only condition (A12) is shown in green. (C and D) Population data showing the proportion of responses classified as A1 when the visual stimulus was V1 or V2, when decoded with auditory-only templates (C) or auditory-visual templates (D). (E and F) Resulting VPI scores from auditory-only decoding (E) or auditory-visual decoding (F). (G) Mean (±SEM) values for these units when decoded with A-only templates or auditory-visual templates (as in Figure 2), or in the absence of a visual stimulus. The green data point in (C) and (D) depicts the example in (A) and (B).
Figure 4
Temporally Coherent Changes in Visual Luminance and Auditory Intensity Enhance the Representation of Auditory Timbre (A) Example unit response (from the awake dataset) showing the influence of visual temporal coherence on spiking responses to dual-stream stimuli with (red PSTH) or without (black PSTH) timbre deviants. (B and C) Timbre deviant discrimination in the awake dataset. Two deviants were included in each auditory stream, giving a possible maximum of four per unit. (B) Histogram showing the number of deviants (out of four) that could be discriminated from spiking responses. (C) Boxplots showing the timbre deviant discrimination scores in the single-stream condition across different visual conditions (Coh: coherent; Ind: independent). The boxes show the upper- and lower-quartile values, the horizontal lines indicate the median, and the whiskers depict the most extreme data points not considered to be outliers (which are marked as individual symbols). (D) Discrimination scores for timbre deviant detection in dual-stream stimuli in awake animals. Discrimination scores are plotted according to the auditory stream in which the deviant occurred and the visual stream that accompanied the sound mixture. V1 stimuli are plotted in red and V2 stimuli in blue; therefore, the boxplots at the far left and right of the plot represent cases in which the deviant occurred in an auditory stream that was temporally coherent with the visual stimulus, while the central two boxplots represent the discrimination of deviants occurring in the auditory stream that was temporally independent of the visual stimulus. (E–G) The same as (B)–(D) but for the anesthetized dataset. See also Figure S6.
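One simple way to express a per-deviant discrimination score is an ROC-style comparison of spike counts on deviant versus no-deviant trials, as sketched below; this metric is an illustrative stand-in, since the legend does not specify how the scores in (B)–(G) were computed.

```python
import numpy as np

def deviant_discrimination_score(deviant_counts, standard_counts):
    """ROC-style score: probability that a randomly chosen deviant trial
    evokes more spikes in the analysis window than a randomly chosen
    no-deviant trial (ties split). 0.5 = chance, 1.0 = perfect."""
    dev = np.asarray(deviant_counts, dtype=float)[:, None]
    std = np.asarray(standard_counts, dtype=float)[None, :]
    return (dev > std).mean() + 0.5 * (dev == std).mean()

# Hypothetical spike counts from deviant and no-deviant trials
print(deviant_discrimination_score([12, 9, 14, 11], [7, 8, 6, 10]))  # ~0.94
```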
Figure 5
Auditory-Visual Temporal Coherence Enhances Neural Coding in Auditory Cortex (A–D) A pattern classifier was used to determine whether neuronal responses were informative about the auditory or visual stimuli. The responses to single-stream stimuli are shown for two example units, with responses grouped according to the identity of the auditory stream (A and B, for an auditory-discriminating unit) or the visual stream (C and D, for a visual-discriminating unit). In each case, the stimulus amplitude (A and B) or luminance (C and D) waveform is shown in the top panel, with the resulting raster plots and PSTHs below. (E and F) Decoder performance (mean ± SEM) for discriminating stimulus identity (coherent: A1V1 versus A2V2, purple; independent: A1V2 versus A2V1, blue) in auditory- and visual-classified units recorded in awake (E) and anesthetized (F) ferrets. Pairwise comparisons for decoding of coherent versus independent stimuli (∗∗∗p < 0.001).
Figure 6
Visual Stimuli Elicit Reliable Changes in the Phase of the LFP (A and B) Example LFP responses to single-stream auditory stimuli (A, A1 stream; B, A2 stream) across visual conditions. Data were obtained from the recording site at which multi-unit spiking activity discriminated auditory stream identity in Figures 5A and 5B. The amplitude waveforms of the stimuli are shown in the top row, with the evoked LFP underneath (mean across 21 trials). The resulting inter-trial phase coherence (ITPC) values are shown in the bottom two rows, the upper of which shows temporally coherent auditory-visual stimuli and the lower temporally independent auditory-visual stimuli. (C and D) ITPC averaged across stimulus presentation time was calculated for each stimulus separately (C, A1V1 and A1V2; D, A2V2 and A2V1) and for trials with a randomly selected visual stimulus (ITPC across). (E and F) Single-stream phase dissimilarity index (PDI) values were calculated by comparing the ITPC-within values to the ITPC-across null distributions for each stimulus class (E, A1V1 and A1V2; F, A2V2 and A2V1). (G and H) Population mean ITPC values across frequency for temporally coherent stimuli (G, awake dataset, significant frequencies 10.5–13 and 16–20 Hz; H, anesthetized dataset). (I and J) Population mean ITPC values across frequency for temporally independent stimuli (I, awake dataset, significant frequencies 10.5–22 Hz; J, anesthetized dataset, no frequencies significantly different). Dots indicate frequencies at which the ITPC-within values were significantly greater than the ITPC-across values (pairwise t test, α = 0.0012, Bonferroni corrected for 43 frequencies). (K and L) Mean (±SEM) single-stream PDI values for coherent and independent stimuli in awake (K, significant frequencies 10.5–12.5 Hz) and anesthetized (L) animals. Black dots indicate frequencies at which the temporally coherent single-stream PDI is significantly greater than in the independent conditions (p < 0.001). (M and N) Mean (±SEM) dual-stream PDI values for the awake (M, significant frequencies 10.5–12.5 Hz) and anesthetized (N) datasets.
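ITPC and the PDI reduce to a few lines: ITPC is the magnitude of the trial-averaged unit phase vector, and the single-stream PDI compares within-stimulus ITPC to an across-stimulus baseline. The sketch below uses synthetic phases; in the actual analysis, phases would be extracted from band-filtered LFP and the ITPC-across values drawn from a null distribution built by randomly selecting the visual stimulus across trials.

```python
import numpy as np

def itpc(phases):
    """Inter-trial phase coherence: magnitude of the trial-averaged unit
    phase vector at each time point.
    phases: array of shape (n_trials, n_times), in radians."""
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

def phase_dissimilarity_index(itpc_within, itpc_across):
    """PDI: mean within-stimulus ITPC minus mean across-stimulus ITPC.
    Positive values indicate that LFP phase tracks stimulus identity."""
    return itpc_within.mean() - itpc_across.mean()

# Toy data: one condition with phase-locked trials, one with random phases
rng = np.random.default_rng(0)
n_trials, n_times = 21, 500
locked = rng.normal(0.0, 0.4, (n_trials, n_times))          # consistent phase
shuffled = rng.uniform(-np.pi, np.pi, (n_trials, n_times))  # no phase locking
print(phase_dissimilarity_index(itpc(locked), itpc(shuffled)))  # ~0.7
```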
Figure 7
Visual-Stimulus-Induced LFP Phase Changes in Auditory Cortex Are Mediated by Visual Cortex (A) Schematic showing the location of the auditory cortical recording sites and of the cooling loop used to inactivate visual cortex (black; the gray line marks the 500-μm radius over which cooling is effective; Wood et al., 2017). Individual recording sites contributing to (C)–(N) are shown with stars (simultaneous recordings are marked in the same color). (B) Spike rate responses in auditory cortex (top row) and visual cortex (bottom row, sites >500 μm from the loop) in response to noise bursts or light flashes before, during, and after cooling. (C and D) Inter-trial phase coherence (ITPC) values (mean ± SEM) for the coherent (C) and independent (D) auditory-visual stimuli recorded in auditory cortex (AC) prior to cooling visual cortex (VC), compared to the shuffled null distribution (ITPC across). Asterisks indicate the frequencies at which the ITPC values are significantly different from the shuffled ITPC-across distribution. (E) Single-stream phase dissimilarity index values calculated from the ITPC values in (C) and (D). (F–H) As in (C)–(E) but while visual cortex was cooled to 9°C. (I–N) As in (C)–(H) but for sites in visual cortex >500 μm from the cooling loop. (C)–(H) include data from 83 sites from six electrode penetrations; (I)–(N) include data from 47 sites from five penetrations.

