PLoS Biol. 2021 Nov 18;19(11):e3001465. doi: 10.1371/journal.pbio.3001465. eCollection 2021 Nov.

Attention controls multisensory perception via two distinct mechanisms at different levels of the cortical hierarchy

Ambra Ferrari et al. PLoS Biol. 2021.


Abstract

To form a percept of the multisensory world, the brain needs to integrate signals from common sources weighted by their reliabilities and segregate those from independent sources. Previously, we have shown that anterior parietal cortices combine sensory signals into representations that take into account the signals' causal structure (i.e., common versus independent sources) and their sensory reliabilities as predicted by Bayesian causal inference. The current study asks to what extent and how attentional mechanisms can actively control how sensory signals are combined for perceptual inference. In a pre- and postcueing paradigm, we presented observers with audiovisual signals at variable spatial disparities. Observers were precued to attend to auditory or visual modalities prior to stimulus presentation and postcued to report their perceived auditory or visual location. Combining psychophysics, functional magnetic resonance imaging (fMRI), and Bayesian modelling, we demonstrate that the brain moulds multisensory inference via two distinct mechanisms. Prestimulus attention to vision enhances the reliability and influence of visual inputs on spatial representations in visual and posterior parietal cortices. Poststimulus report determines how parietal cortices flexibly combine sensory estimates into spatial representations consistent with Bayesian causal inference. Our results show that distinct neural mechanisms control how signals are combined for perceptual inference at different levels of the cortical hierarchy.
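Concretely, the reliability-weighted integration mentioned here has a standard closed form in maximum-likelihood cue combination; the worked equation below is a generic illustration of that principle, not notation copied from the paper.

```latex
% Reliability-weighted (forced) fusion of auditory and visual measurements:
% each cue is weighted by its reliability, the inverse of its variance.
\hat{S}_{AV} \;=\; w_A\, x_A + w_V\, x_V,
\qquad
w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2},
\quad
w_V = \frac{1/\sigma_V^2}{1/\sigma_A^2 + 1/\sigma_V^2}.
```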


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Bayesian Causal Inference and the possible roles of attentional control.
(a) Generative models of Forced Fusion and Bayesian Causal Inference. For Forced Fusion, a single source generates the auditory and visual signals. Bayesian Causal Inference explicitly models the two causal structures, i.e., whether auditory and visual signals come from one common cause (C = 1) or from separate causes (C = 2). (b) During perceptual inference, the observer is thought to invert the generative models, inferring the number of sources by combining prior knowledge and audiovisual evidence. A Forced Fusion estimate is computed by averaging the auditory and visual estimates along with prior spatial estimates, weighted by their relative reliabilities (the inverse of the sensory variance σ²). The full segregation estimates, visual or auditory, are computed separately. To account for causal uncertainty, the final Bayesian Causal Inference estimate, auditory (Ŝ_A) or visual (Ŝ_V), is computed by combining the audiovisual Forced Fusion estimate (Ŝ_AV,C=1) with the task-relevant full segregation estimate, auditory (Ŝ_A,C=2) or visual (Ŝ_V,C=2), each weighted by the posterior probability of a common cause (C = 1) or independent causes (C = 2). (c) Attentional control can mould multisensory perceptual inference via two distinct mechanisms and thereby induce differences in observers’ auditory and visual estimates. First, attending to a particular sensory modality may enhance the reliability of signals in the attended modality and thereby their weights during Forced Fusion. Second, modality-specific report (i.e., task relevance) determines the late readout consistent with the principles of Bayesian Causal Inference, i.e., whether the Forced Fusion estimate is combined with the auditory or the visual full segregation estimate.
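To make the caption’s computation concrete, here is a minimal Python sketch of Bayesian Causal Inference by model averaging for the auditory estimate, using the standard closed-form likelihoods from this modelling literature (e.g., Körding et al. 2007). All parameter values and names are illustrative assumptions, not the study’s implementation.

```python
import numpy as np

def bci_auditory_estimate(x_a, x_v, sigma_a, sigma_v,
                          sigma_p=10.0, mu_p=0.0, p_common=0.5):
    """Bayesian Causal Inference (model averaging) for the auditory estimate.

    x_a, x_v : internal auditory / visual measurements (deg azimuth)
    sigma_a, sigma_v : sensory noise SDs (attention may lower the attended SD)
    sigma_p, mu_p : spatial prior SD and mean (illustrative values)
    p_common : prior probability of a common cause
    """
    va, vv, vp = sigma_a**2, sigma_v**2, sigma_p**2

    # Forced Fusion estimate: reliability-weighted average of both cues and prior
    s_fusion = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)

    # Full segregation (auditory) estimate: auditory cue combined with prior only
    s_seg_a = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)

    # Likelihood of the measurements under one common cause (C = 1)
    denom1 = va * vv + va * vp + vv * vp
    like_c1 = np.exp(-0.5 * ((x_a - x_v) ** 2 * vp
                             + (x_a - mu_p) ** 2 * vv
                             + (x_v - mu_p) ** 2 * va) / denom1
                     ) / (2 * np.pi * np.sqrt(denom1))

    # Likelihood under independent causes (C = 2)
    like_c2 = np.exp(-0.5 * ((x_a - mu_p) ** 2 / (va + vp)
                             + (x_v - mu_p) ** 2 / (vv + vp))
                     ) / (2 * np.pi * np.sqrt((va + vp) * (vv + vp)))

    # Posterior probability of a common cause
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Model averaging: combine fusion and task-relevant segregation estimates
    return post_c1 * s_fusion + (1 - post_c1) * s_seg_a

# Example: 18 deg audiovisual disparity, vision more reliable than audition;
# the large disparity pushes the estimate towards the segregated auditory one
print(bci_auditory_estimate(x_a=9.0, x_v=-9.0, sigma_a=6.0, sigma_v=2.0))
```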
Fig 2. Experimental design and procedure, neuroimaging univariate results, and response times in the fMRI experiment.
(a) The experiment conformed to a 3 (auditory location) × 3 (visual location) × 2 (prestimulus attention: attA, attV) × 2 (poststimulus report: repA, repV) factorial design (A for auditory and V for visual). Auditory and visual signals were independently sampled from 3 locations along the azimuth (−9°, 0°, and 9° visual angle), resulting in 9 audiovisual spatial combinations with 3 levels of spatial disparity: none (0°; dark grey); low (9°; mid grey); and high (18°; light grey). The orthogonal pre- and postcue attention cueing paradigm resulted in two valid (attArepA; attVrepV) and two invalid (attVrepA; attArepV) conditions. (b) Prior to block start, participants were cued to attend to either the auditory or the visual signal (via the colour of the fixation cross); 350 ms after each audiovisual stimulus, they were cued to report their perceived auditory or visual location (via a coloured letter: A for auditory and V for visual). Participants responded via a button press, using a different keypad for each sensory modality. (c) Increased activations for invalid relative to valid trials [Invalid (attVrepA & attArepV) > Valid (attArepA & attVrepV)] in blue, for AV spatially incongruent relative to congruent stimuli [AVincongruent (AV disparity ≠ 0°) > AVcongruent (AV disparity = 0°)] in red, and their overlap in pink, rendered on an inflated canonical brain (p < 0.001 uncorrected at peak level for visualisation purposes, extent threshold k > 0 voxels). (d) Across participants’ mean (±SEM) parameter estimates in arbitrary units from L SFG (x = −4, y = 8, and z = 52) and L ACC (x = −10, y = 18, and z = 32). (e) Across participants’ mean (±SEM) response times. Data in d and e are plotted as a function of (i) prestimulus attention: auditory attA versus visual attV; (ii) poststimulus report: auditory repA versus visual repV; and (iii) audiovisual spatial (in)congruency: AVincongruent (AV disparity ≠ 0°) versus AVcongruent (AV disparity = 0°). The data used to make this figure are available in S1 and S2 Data. ACC, anterior cingulate cortex; AIns, anterior insula; IFG, inferior frontal gyrus; IPS, intraparietal sulcus; L ACC, left anterior cingulate cortex; L SFG, left superior frontal gyrus; SFG, superior frontal gyrus; SPL, superior parietal lobule.
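As a concrete illustration of panel (a), a few lines of Python (illustrative only) enumerate the 9 audiovisual location combinations and the 3 disparity levels they produce:

```python
from itertools import product

locations = [-9, 0, 9]  # degrees visual angle along the azimuth

# 3 (auditory) x 3 (visual) = 9 audiovisual spatial combinations
for loc_a, loc_v in product(locations, locations):
    disparity = abs(loc_a - loc_v)  # 0 (none), 9 (low), or 18 (high) degrees
    print(f"A={loc_a:+d}, V={loc_v:+d}, disparity={disparity} deg")
```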
Fig 3. Audiovisual weight index (wAV) and Bayesian modelling results for the fMRI experiment.
(a) Across participants’ mean wAV (±SEM) shown as a function of (i) prestimulus attention: auditory attA versus visual attV; (ii) poststimulus report: auditory repA versus visual repV; and (iii) AV spatial disparity: low dispL (9°) versus high dispH (18°). wAV = 1 for purely visual and wAV = 0 for purely auditory influence. (b) Along the first factor of a 2 × 3 factorial model space, we assessed the influence of prestimulus attention by comparing whether the sensory variances were (i) constant (fixed: σ²_A,attA = σ²_A,attV, σ²_V,attA = σ²_V,attV) or (ii) different (free: σ²_A,attA, σ²_A,attV, σ²_V,attA, σ²_V,attV) across prestimulus attention. Along the second factor, we assessed the influence of poststimulus report by comparing (i) a Forced Fusion model in which the sensory variances were fixed (FF fixed: σ²_A,repA = σ²_A,repV, σ²_V,repA = σ²_V,repV); (ii) a Forced Fusion model in which the sensory variances were allowed to differ between auditory and visual report (FF free: σ²_A,repA, σ²_A,repV, σ²_V,repA, σ²_V,repV); and (iii) a BCI model in which the influence of poststimulus report arises via a late flexible readout. The matrix represents our 2 × 3 model space. For each model, we show the pEP (a larger pEP indicates a better model) via greyscale. BOR represents the probability that the results are due to chance. (c) Across participants’ mean (±SEM) auditory and visual noise parameter estimates (i.e., σ²_A,attA, σ²_A,attV, σ²_V,attA, σ²_V,attV) of the best model, i.e., the BCI model with free prestimulus attention parameters (attA, auditory; attV, visual). p-values are based on one-tailed sign permutation tests. The data used to make this figure are available in S2 Data. BCI, Bayesian causal inference; BOR, Bayesian omnibus risk; FF, Forced Fusion; pEP, protected exceedance probability.
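The caption defines what wAV means (1 = purely visual, 0 = purely auditory influence) but not how it is computed. One common way to obtain such an index is to regress responses on the incongruent trials against the true visual and auditory locations and normalise the coefficients; the sketch below illustrates that assumed form and is not the study’s actual procedure.

```python
import numpy as np

def av_weight_index(responses, loc_v, loc_a):
    """Illustrative audiovisual weight index (assumed regression-based form).

    responses : reported locations on spatially incongruent trials
    loc_v, loc_a : true visual and auditory locations per trial
    Returns ~1 for purely visual and ~0 for purely auditory influence.
    """
    X = np.column_stack([loc_v, loc_a, np.ones_like(loc_v)])
    beta_v, beta_a, _ = np.linalg.lstsq(X, responses, rcond=None)[0]
    return beta_v / (beta_v + beta_a)

# Toy example: responses dominated by the visual location -> index near 1
loc_v = np.array([-9.0, -9.0, 9.0, 9.0])
loc_a = np.array([0.0, 9.0, -9.0, 0.0])
responses = 0.8 * loc_v + 0.2 * loc_a
print(av_weight_index(responses, loc_v, loc_a))  # ~0.8
```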
Fig 4. Neural audiovisual weight index (nwAV) across the audiovisual processing hierarchy.
(a) fMRI voxel response patterns were obtained from anatomical ROIs along the visual and auditory dorsal cortical hierarchies: V1-3 (blue), pIPS (cyan), aIPS (green), and hA (orange). ROIs are displayed on a canonical brain. (b) An SVR model was trained to learn the mapping from the fMRI voxel response patterns to the external spatial locations based on the audiovisual spatially congruent trials (green cells = congruent). The learnt mapping was then used to decode the spatial location from the fMRI voxel response patterns of the audiovisual spatially incongruent trials (orange cells = incongruent) to compute nwAV. (c) Across participants’ mean nwAV (±SEM) shown as a function of (i) prestimulus attention (Att): auditory/attA versus visual/attV; and (ii) poststimulus report (Rep): auditory/repA versus visual/repV, with statistical results of sign permutation tests. nwAV = 1 for purely visual and nwAV = 0 for purely auditory influence. The data used to make this figure are available in S2 Data. ** p < 0.01, * p < 0.05. aIPS, anterior intraparietal sulcus; hA, higher-order auditory cortex; pIPS, posterior intraparietal sulcus; ROI, region of interest; SVR, support vector regression; V1-3, low-level visual cortex.
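For readers unfamiliar with the decoding scheme in panel (b), the following sketch reproduces the train-on-congruent / decode-incongruent logic with scikit-learn’s SVR on random placeholder data; the array names and the final weight-index step are illustrative assumptions rather than the study’s code.

```python
import numpy as np
from sklearn.svm import SVR

# Placeholder data: one row of voxel responses per trial within an ROI.
# Real inputs would be preprocessed fMRI response patterns per trial.
rng = np.random.default_rng(0)
X_congruent = rng.normal(size=(90, 200))                 # congruent trials
loc_congruent = rng.choice([-9.0, 0.0, 9.0], size=90)    # true AV location
X_incongruent = rng.normal(size=(60, 200))               # incongruent trials
loc_v = rng.choice([-9.0, 0.0, 9.0], size=60)            # true visual location
loc_a = rng.choice([-9.0, 0.0, 9.0], size=60)            # true auditory location

# Train an SVR to map voxel patterns onto spatial location (congruent trials)...
svr = SVR(kernel="linear").fit(X_congruent, loc_congruent)

# ...then decode the spatial location of the incongruent trials
decoded = svr.predict(X_incongruent)

# Neural weight index from the decoded locations (illustrative regression form,
# analogous to the behavioural index): ~1 = visual, ~0 = auditory influence
X = np.column_stack([loc_v, loc_a, np.ones_like(loc_v)])
beta_v, beta_a, _ = np.linalg.lstsq(X, decoded, rcond=None)[0]
nw_av = beta_v / (beta_v + beta_a)
print(nw_av)
```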


