J Neurosci. 2024 Mar 6;44(10):e0870232023.
doi: 10.1523/JNEUROSCI.0870-23.2023

Attention Drives Visual Processing and Audiovisual Integration During Multimodal Communication

Noor Seijdel et al. J Neurosci. 2024.

Abstract

During communication in real-life settings, our brain often needs to integrate auditory and visual information and at the same time actively focus on the relevant sources of information, while ignoring interference from irrelevant events. The interaction between integration and attention processes remains poorly understood. Here, we use rapid invisible frequency tagging and magnetoencephalography to investigate how attention affects auditory and visual information processing and integration, during multimodal communication. We presented human participants (male and female) with videos of an actress uttering action verbs (auditory; tagged at 58 Hz) accompanied by two movie clips of hand gestures on both sides of fixation (attended stimulus tagged at 65 Hz; unattended stimulus tagged at 63 Hz). Integration difficulty was manipulated by a lower-order auditory factor (clear/degraded speech) and a higher-order visual semantic factor (matching/mismatching gesture). We observed an enhanced neural response to the attended visual information during degraded speech compared to clear speech. For the unattended information, the neural response to mismatching gestures was enhanced compared to matching gestures. Furthermore, signal power at the intermodulation frequencies of the frequency tags, indexing nonlinear signal interactions, was enhanced in the left frontotemporal and frontal regions. Focusing on the left inferior frontal gyrus, this enhancement was specific for the attended information, for those trials that benefitted from integration with a matching gesture. Together, our results suggest that attention modulates audiovisual processing and interaction, depending on the congruence and quality of the sensory input.
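To make the intermodulation logic concrete: if a neural response combines the 65 Hz visual drive and the 58 Hz auditory drive nonlinearly (e.g., multiplicatively), spectral power emerges at the difference and sum frequencies, 65 − 58 = 7 Hz and 65 + 58 = 123 Hz. The toy simulation below only illustrates that arithmetic and is not the authors' analysis pipeline; the sampling rate, signal duration, and the multiplicative interaction term are assumptions.

```python
# Toy illustration (not the authors' pipeline): a multiplicative interaction
# between a 65 Hz visual drive and a 58 Hz auditory drive produces spectral
# peaks at the intermodulation frequencies 65 - 58 = 7 Hz and 65 + 58 = 123 Hz.
import numpy as np

fs = 1000.0                      # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1.0 / fs)  # 2 s of signal

visual = np.sin(2 * np.pi * 65 * t)    # attended visual tag (65 Hz)
auditory = np.sin(2 * np.pi * 58 * t)  # auditory tag (58 Hz)

# Linear mixture: power only at 58 and 65 Hz.
linear = visual + auditory
# Nonlinear (multiplicative) interaction: adds 7 Hz and 123 Hz components,
# since sin(a)sin(b) = 0.5*[cos(a-b) - cos(a+b)].
nonlinear = visual + auditory + 0.5 * visual * auditory

freqs = np.fft.rfftfreq(t.size, 1.0 / fs)
for name, sig in [("linear", linear), ("nonlinear", nonlinear)]:
    power = np.abs(np.fft.rfft(sig)) ** 2
    peak_7hz = power[np.argmin(np.abs(freqs - 7.0))]
    print(f"{name}: power near 7 Hz = {peak_7hz:.1f}")
```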

Keywords: attention; audiovisual integration; magnetoencephalography (MEG); multimodal communication; neural processing; rapid invisible frequency tagging (RIFT).


Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1.
Experimental paradigm. Participants were asked to attend to one of the videos, indicated by a cue. The attended video was frequency-tagged at 65 Hz, and the unattended video at 63 Hz. Speech was frequency-tagged at 58 Hz. Participants were asked to attentively watch and listen to the videos. After the video, participants were presented with four written options and had to identify which verb they heard in the video by pressing one of 4 buttons on an MEG-compatible button box. This task ensured that participants were attentively watching the videos and was used to check whether the verbs were understood. Participants were instructed not to blink during the video presentation. In addition to the normal trials, “attention trials” were included in which participants were asked to detect a change in brightness.
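As a rough sketch of how such tagging signals could be constructed (the record does not include the stimulus code; the display refresh rate, modulation depth, and the placeholder speech waveform below are assumptions), the two videos would be luminance-modulated at 65 Hz and 63 Hz and the speech amplitude-modulated at 58 Hz:

```python
# Hypothetical sketch of RIFT-style tagging signals. The actual stimulus
# generation (luminance range, modulation depth, refresh rate) is not given
# in this record; all parameters below are illustrative assumptions.
import numpy as np

fs_video = 1440.0    # assumed high-refresh display rate (Hz), needed for >60 Hz flicker
fs_audio = 44100.0   # assumed audio sampling rate (Hz)
dur = 1.0            # 1 s of stimulus

def tag_envelope(t, f_tag, depth=1.0):
    """Sinusoidal modulation envelope in [0, 1] at the tagging frequency."""
    return 0.5 + 0.5 * depth * np.sin(2 * np.pi * f_tag * t)

t_video = np.arange(0, dur, 1.0 / fs_video)
t_audio = np.arange(0, dur, 1.0 / fs_audio)

attended_flicker = tag_envelope(t_video, 65.0)    # attended video: 65 Hz luminance tag
unattended_flicker = tag_envelope(t_video, 63.0)  # unattended video: 63 Hz luminance tag

speech = np.random.randn(t_audio.size)                # placeholder for the speech waveform
speech_tagged = speech * tag_envelope(t_audio, 58.0)  # 58 Hz amplitude modulation
```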
Figure 2.
Verb categorization behavior. A, Accuracy per condition. Response accuracy is highest for clear speech and when the gesture matches the speech signal. B, Reaction time (RT) per condition. RTs are faster for clear speech and when the gesture matches the speech signal.
Figure 3.
Power at temporal and occipital sensors and corresponding source regions (% increase compared to a poststimulus baseline), averaged across conditions. A, Average ERF for a single subject at selected sensors overlying the left and right temporal lobes. Auditory input was tagged by a 58 Hz amplitude modulation; tagging was phase-locked over trials. ERFs show combined planar gradient data. B, Average ERF for a single subject at selected sensors overlying the occipital lobe. Visual input was tagged by a 65 Hz and a 63 Hz flicker. C, Power increase at temporal sensors at the tagged frequency of the auditory stimulus (58 Hz). D, Power increases at occipital sensors at the visual tagging frequencies (63 Hz, unattended; 65 Hz, attended). E, Power increase in the auditory cortex at the tagged frequency of the auditory stimulus (58 Hz). F, Power increases in the visual cortex at the visual tagging frequencies (63 Hz, unattended; 65 Hz, attended). Shaded error bars represent the standard error.
Figure 4.
Sources of power at the auditory tagged signal (58 Hz) and the visually tagged signals (65 Hz and 63 Hz). A, Power change (%) when comparing power in the stimulus window to a poststimulus baseline for the different tagging frequencies, pooled over conditions. Power change is largest over temporal regions for the auditory tagging frequency and over occipital regions for the visually tagged signals. B, Power change values (%) extracted from the ROIs. Raincloud plots show the raw data, density, and boxplots for power change in the different conditions. CM, clear speech with a matching gesture; CMM, clear speech with a mismatching gesture; DM, degraded speech with a matching gesture; DMM, degraded speech with a mismatching gesture.
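The "% power change relative to a poststimulus baseline" measure used in these figures can be approximated as in the minimal sketch below; the window lengths, sampling rate, and Welch PSD settings are assumptions rather than the authors' reported analysis parameters.

```python
# Minimal sketch of a percent power change at a tagging frequency, relative
# to a poststimulus baseline window; sampling rate and PSD settings are
# assumptions, not the authors' reported parameters.
import numpy as np
from scipy.signal import welch

def percent_power_change(stim, baseline, fs, f_tag):
    """100 * (P_stim - P_base) / P_base at the frequency bin nearest f_tag."""
    f_s, p_stim = welch(stim, fs=fs, nperseg=int(fs))      # ~1 Hz resolution
    f_b, p_base = welch(baseline, fs=fs, nperseg=int(fs))
    idx = np.argmin(np.abs(f_s - f_tag))
    return 100.0 * (p_stim[idx] - p_base[idx]) / p_base[idx]

# Synthetic example: extra 58 Hz power during stimulation relative to baseline noise.
fs = 1200.0  # assumed MEG sampling rate (Hz)
rng = np.random.default_rng(0)
t = np.arange(0, 2.0, 1.0 / fs)
baseline = rng.standard_normal(t.size)
stim = baseline + 0.5 * np.sin(2 * np.pi * 58 * t)

print(percent_power_change(stim, baseline, fs, f_tag=58.0))
```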
Figure 5.
Power at the intermodulation frequencies (f_visual − f_auditory; e.g., 65 − 58 = 7 Hz). A, Power over left frontal sensors (% increase compared to a poststimulus baseline). B, Power over the LIFG source region (% increase compared to a poststimulus baseline). C, Sources of power at 7 Hz. D, Power change values (%) extracted from the LIFG in source space. Raincloud plots show the raw data, density, and boxplots for power change per condition.
Figure 6.
A, Power over the LIFG source region (% increase compared to the mismatching gesture conditions). Shaded error bars represent the standard error. B, Power was higher in the CM condition than in the CMM condition across the temporal lobe. Comparing DM and DMM, we observed enhanced activity in the LIFG, left parietal regions, and occipital cortex.
