Nat Hum Behav. 2023 Mar;7(3):397-410. doi: 10.1038/s41562-022-01507-3. Epub 2023 Jan 16.

A modality-independent proto-organization of human multisensory areas


Francesca Setti et al. Nat Hum Behav. 2023 Mar.

Abstract

The processing of multisensory information is based upon the capacity of brain regions, such as the superior temporal cortex, to combine information across modalities. However, it is still unclear whether the representation of coherent auditory and visual events requires any prior audiovisual experience to develop and function. Here we measured brain synchronization during the presentation of an audiovisual, audio-only or video-only version of the same narrative in distinct groups of sensory-deprived (congenitally blind and deaf) and typically developed individuals. Intersubject correlation analysis revealed that the superior temporal cortex was synchronized across auditory and visual conditions, even in sensory-deprived individuals who lack any audiovisual experience. This synchronization was primarily mediated by low-level perceptual features, and relied on a similar modality-independent topographical organization of slow temporal dynamics. The human superior temporal cortex is naturally endowed with a functional scaffolding to yield a common representation across multisensory events.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Experimental conditions, computational modelling and analytical pipeline.
a, In the first experiment, the neural correlates of a full audiovisual (AV) stimulus were studied in a sample of TD participants to examine how the brain processes multisensory information. In a second experiment, two unimodal versions of the same movie (that is, visual only (V) and auditory only (A)) were presented and the similarity between visually- and auditory-evoked brain responses (A versus V) was assessed in two samples of TD participants. In a third experiment, we tested the role of audiovisual experience in the emergence of these shared neural representations by measuring the similarity of brain responses elicited across congenitally SD individuals (that is, blind and deaf participants). b, A brief description of the features extracted from the movie through computational modelling. Movie-related features fall into two categories: (1) low-level acoustic features (for example, spectral and sound-envelope properties to account for frequency- and amplitude-based modulations) and visual features (for example, a set of static Gabor-like filters and motion energy information based on their spatiotemporal integration); and (2) high-level semantic descriptors (for example, manual annotation of both visual and sound-based natural and artificial categories, and word-embedding features; for further details, see Supplementary Information). c, Results of a continuous wavelet transform analysis applied to the movie's acoustic and visual signals to evaluate the existence of collinearities across the low-level features of the two sensory streams. Results show the presence of cross-modal correspondences, with hundreds of highly coherent events (white marks) distributed along the time course of the movie (x axis) and lasting from a few tenths of a second to several minutes (y axis).
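The wavelet-coherence screening described in panel c can be sketched as follows. This is a hedged, minimal reconstruction rather than the authors' actual pipeline: it assumes two 1-D stimulus time courses (say, the sound envelope and a motion-energy trace), a Morlet mother wavelet, and a simple moving-average smoother for the coherence estimate. The function names and the `w0` and `win` parameters are illustrative choices.

```python
import numpy as np

def morlet_cwt(x, scales, w0=6.0):
    """Continuous wavelet transform of a 1-D signal with a Morlet wavelet."""
    out = np.empty((len(scales), len(x)), dtype=complex)
    for i, s in enumerate(scales):
        t = np.arange(-4 * s, 4 * s + 1)
        wavelet = np.exp(1j * w0 * t / s - 0.5 * (t / s) ** 2) / np.sqrt(s)
        out[i] = np.convolve(x, np.conj(wavelet)[::-1], mode="same")
    return out

def _smooth(a, win):
    """Moving-average smoothing along time, row by row (keeps complex dtype)."""
    k = np.ones(win) / win
    return np.stack([np.convolve(row, k, mode="same") for row in a])

def wavelet_coherence(audio, video, scales, win=16):
    """Magnitude-squared wavelet coherence between two stimulus time courses.
    Values near 1 mark time-scale points where the two streams co-vary
    (the coherent audiovisual events shown as white marks in Fig. 1c)."""
    Wa, Wv = morlet_cwt(audio, scales), morlet_cwt(video, scales)
    num = np.abs(_smooth(Wa * np.conj(Wv), win)) ** 2
    den = _smooth(np.abs(Wa) ** 2, win) * _smooth(np.abs(Wv) ** 2, win)
    return num / (den + 1e-12)
```

Thresholding the returned map (scales on the y axis, time on the x axis) would yield a binary mask of coherent events analogous to the white marks in panel c.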
Fig. 2
Fig. 2. ISC results in TD participants.
a,b, ISC in TD participants in the AV (a) and A versus V (b) conditions, respectively (P < 0.05, one-tailed, FWEc, minimum cluster size of 20 adjacent voxels). c, Conjunction analysis of the two aforementioned experimental conditions.
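Intersubject correlation itself is conceptually simple: z-score each subject's voxel time course and average the Pearson correlations over subject pairs, either within a group (the AV condition) or across the two unimodal groups (the A versus V comparison). A minimal NumPy sketch, assuming data arrays of shape (subjects, timepoints, voxels); the function names are ours, not from the paper's code:

```python
import numpy as np
from itertools import combinations

def pairwise_isc(data):
    """Within-group ISC: average Pearson r over all subject pairs, per voxel.
    data: (n_subjects, n_timepoints, n_voxels)."""
    n_sub, n_tp, _ = data.shape
    z = (data - data.mean(1, keepdims=True)) / data.std(1, keepdims=True)
    pairs = list(combinations(range(n_sub), 2))
    return sum((z[i] * z[j]).sum(0) for i, j in pairs) / (n_tp * len(pairs))

def crossmodal_isc(group_a, group_v):
    """A-versus-V ISC: correlate every auditory-condition subject with every
    visual-condition subject, voxel by voxel, and average over pairings."""
    za = (group_a - group_a.mean(1, keepdims=True)) / group_a.std(1, keepdims=True)
    zv = (group_v - group_v.mean(1, keepdims=True)) / group_v.std(1, keepdims=True)
    n_pairs = za.shape[0] * zv.shape[0]
    return np.einsum("atv,btv->v", za, zv) / (group_a.shape[1] * n_pairs)
```

With this paper's design, `crossmodal_isc` would take the A-only group as `group_a` and the V-only group as `group_v`, run separately for the TD and SD samples; significance would then be assessed against a null distribution with cluster-extent correction, as the caption describes.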
Fig. 3
Fig. 3. ISC results in SD participants.
a, ISC results for the across-modality A versus V comparison in SD participants (P < 0.05, one-tailed, FWEc, minimum cluster size of 20 adjacent voxels). In the matrix on the right, the raw ISC across TD and SD individuals is reported for the A-only and V-only conditions. ISC (Pearson's r coefficient) was extracted from a region of interest (6 mm radius) centred within the left STG, around the synchronization peak of the first experiment. White cells indicate subject pairings below the significance threshold (uncorrected P < 0.05, one-tailed). b, ISC for the A versus V condition was compared between TD and SD participants within the six brain regions obtained from the conjunction analysis (Wilcoxon rank sum test, two-tailed, P < 0.05 Bonferroni corrected for the number of regions). All regions except the bilateral posterior parietal cortex retained a significantly greater ISC in TD than SD individuals (left temporo-parietal: W = 11,926, PBonf < 0.001, NTD = 100, NSD = 81, rTD-SD = 0.029, standard error (SE) 0.003; bilateral posterior parietal: W = 9,961, PBonf = 0.085, NTD = 100, NSD = 81, rTD-SD = 0.007, SE 0.003; right temporo-parietal: W = 11,077, PBonf < 0.001, NTD = 100, NSD = 81, rTD-SD = 0.015, SE 0.002; right dorso-lateral prefrontal: W = 10,452, PBonf = 0.001, NTD = 100, NSD = 81, rTD-SD = 0.010, SE 0.002; bilateral medial prefrontal: W = 11,015, PBonf < 0.001, NTD = 100, NSD = 81, rTD-SD = 0.015, SE 0.002; left inferior frontal: W = 11,913, PBonf < 0.001, NTD = 100, NSD = 81, rTD-SD = 0.028, SE 0.003). Average ISC (with SE) in the AV condition is shown, as a shaded rose area, representing a ceiling effect due to multisensory integration. Transparency indicates that the group ISC was not significant (NS, P > 0.05) compared with a null distribution. In each box, the dark line represents the sample mean and the dark-grey shaded box the 95% confidence interval of the SE of the mean, while the light-grey shaded box indicates the standard deviation.
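The TD-versus-SD comparison in panel b is a standard two-sample test on per-pair ISC values, corrected for the number of regions. A sketch with SciPy; note that `scipy.stats.ranksums` reports a normal-approximation z statistic rather than the raw rank sum W quoted in the caption, and the region count for Bonferroni correction (six here, from the conjunction analysis) is passed in by the caller.

```python
import numpy as np
from scipy.stats import ranksums

def compare_groups(isc_td, isc_sd, n_regions, alpha=0.05):
    """Two-tailed Wilcoxon rank-sum test on per-pair ISC values from one
    region, Bonferroni-corrected for the number of regions tested.
    Returns (z statistic, corrected p, significant?)."""
    stat, p = ranksums(isc_td, isc_sd)          # two-sided by default
    p_bonf = min(p * n_regions, 1.0)            # Bonferroni correction
    return stat, p_bonf, p_bonf < alpha
```

A positive statistic with a corrected p below alpha corresponds to the "significantly greater ISC in TD than SD" pattern reported for five of the six regions.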
Fig. 4
Fig. 4. Impact of perceptual and semantic features on ISC.
a,b, Model-mediated ISC across TD and SD participants in the A versus V condition for the low-level model, based on the movie's acoustic and visual properties (P < 0.05, one-tailed, FWEc, minimum cluster size of 20 adjacent voxels). c,d, Model-mediated ISC across TD and SD individuals in the A versus V condition for the high-level model, based on semantic features (that is, categorical information and GPT-3 features; P < 0.05, one-tailed, FWEc, minimum cluster size of 20 adjacent voxels). e,f, Results of a Wilcoxon signed rank test comparing the low- and high-level models in TD and SD participants separately (P < 0.05, two-tailed, FWEc). g,h, ISC for the groups of TD and SD individuals in the A versus V condition during the processing of the scrambled movie (P < 0.05, one-tailed, FWEc, minimum cluster size of 20 adjacent voxels). Note that only the temporo-parietal cortex is mapped, since no significant results were found in frontal areas for the SD group in any of the explored conditions.
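The caption does not fully specify how "model-mediated ISC" is computed; one plausible reading is to project each subject's voxel time courses onto the stimulus-feature space (low-level or high-level) and correlate the model-captured components across subjects. The sketch below implements that reading with ridge-regularized least squares; the function name, the `ridge` parameter, and the projection-then-correlate scheme are our assumptions, not the authors' published code.

```python
import numpy as np

def model_mediated_isc(resp_a, resp_b, features, ridge=1e-3):
    """Correlate the feature-predictable parts of two subjects' responses.

    resp_a, resp_b: (n_timepoints, n_voxels) voxel time courses.
    features:       (n_timepoints, n_features) stimulus descriptors.
    Each response is projected onto the feature space with ridge-regularized
    least squares; the predicted components are z-scored and correlated voxel
    by voxel. High values mean the signal shared between the two subjects is
    carried by the modelled features."""
    n_feat = features.shape[1]
    X = features - features.mean(0)             # centre the design matrix
    gram = X.T @ X + ridge * np.eye(n_feat)
    pred_a = X @ np.linalg.solve(gram, X.T @ resp_a)
    pred_b = X @ np.linalg.solve(gram, X.T @ resp_b)
    za = (pred_a - pred_a.mean(0)) / (pred_a.std(0) + 1e-12)
    zb = (pred_b - pred_b.mean(0)) / (pred_b.std(0) + 1e-12)
    return (za * zb).mean(0)
```

Run with the low-level feature matrix it addresses panels a,b; with the semantic feature matrix, panels c,d.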
Fig. 5
Fig. 5. Temporal receptive windows (TRWs).
a–c, TRW results in the three experimental conditions (a, multisensory AV condition in TD; b, across-modality in TD; c, across-modality in SD) using windows from 2 s to 240 s. On the left, flat brain maps show the temporal window at which ISC is maximal. Matrices on the right indicate the overall synchronization profile across voxels that survived the statistical threshold (P < 0.05, one-tailed, FWEc, minimum cluster size of 20 adjacent voxels) in the A versus V condition in SD participants. The voxels were sorted according to the peak of the TRWs in the SD condition; voxel order was then kept constant across the other two experiments. Pixel intensity depicts the normalized ISC (scaling the maximum to one). Red dotted lines represent the interpolated position of maximal peaks across ordered voxels. Only a few voxels presented responses characterized by multiple peaks, whereas the majority demonstrated a clear preference for a specific temporal window.
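The TRW mapping can be approximated by smoothing both time courses at a range of window lengths and asking which length maximizes the across-subject correlation. This is a hedged single-voxel sketch (moving-average smoothing, Pearson correlation); the published analysis may differ in how windows are defined and aggregated across subject pairs.

```python
import numpy as np

def trw_peak(ts_a, ts_b, window_lengths):
    """For each candidate window length, smooth both voxel time courses with
    a moving average of that length and compute their Pearson correlation;
    the length that maximizes the correlation is taken as the voxel's
    preferred timescale. Returns (peak window, per-window correlations)."""
    def smooth(x, w):
        return np.convolve(x, np.ones(w) / w, mode="valid")
    def corr(x, y):
        x, y = x - x.mean(), y - y.mean()
        return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12))
    iscs = np.array([corr(smooth(ts_a, w), smooth(ts_b, w))
                     for w in window_lengths])
    return window_lengths[int(np.argmax(iscs))], iscs
```

Mapping the returned peak window per voxel, then sorting voxels by that peak, would reproduce the layout of the matrices described above: most voxels show a single clear maximum at one timescale.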

References

    1. Stein BE, Stanford TR. Multisensory integration: current issues from the perspective of the single neuron. Nat. Rev. Neurosci. 2008;9:255–266. doi: 10.1038/nrn2331. - DOI - PubMed
    1. Beauchamp MS, et al. Integration of auditory and visual information about objects in superior temporal sulcus. Neuron. 2004;41:809–823. doi: 10.1016/S0896-6273(04)00070-4. - DOI - PubMed
    1. Hocking J, Price CJ. The role of the posterior superior temporal sulcus in audiovisual processing. Cereb. Cortex. 2008;18:2439–2449. doi: 10.1093/cercor/bhn007. - DOI - PMC - PubMed
    1. Lewkowicz DJ, Turkewitz G. Cross-modal equivalence in early infancy: auditory–visual intensity matching. Dev. Psychol. 1980;16:597–607. doi: 10.1037/0012-1649.16.6.597. - DOI
    1. Hillock-Dunn A, Wallace MT. Developmental changes in the multisensory temporal binding window persist into adolescence. Dev. Sci. 2012;15:688–696. doi: 10.1111/j.1467-7687.2012.01171.x. - DOI - PMC - PubMed

Publication types