Sci Rep. 2022 Nov 5;12(1):18789.
doi: 10.1038/s41598-022-22041-2.

Brain activity during shadowing of audiovisual cocktail party speech, contributions of auditory-motor integration and selective attention

Patrik Wikman et al. Sci Rep. 2022.

Abstract

Selective listening to cocktail-party speech involves a network of auditory and inferior frontal cortical regions. However, cognitive and motor cortical regions are differentially activated depending on whether the task emphasizes semantic or phonological aspects of speech. Here we tested whether processing of cocktail-party speech differs when participants perform a shadowing (immediate speech repetition) task compared to an attentive listening task in the presence of irrelevant speech. Participants viewed audiovisual dialogues with concurrent distracting speech during functional imaging. Participants either attentively listened to the dialogue, overtly repeated (i.e., shadowed) attended speech, or performed visual or speech motor control tasks in which they did not attend to speech and their responses were unrelated to the speech input. Dialogues were presented with good or poor auditory and visual quality. As a novel result, we show that attentive processing of speech activated the same network of sensory and frontal regions during listening and shadowing. However, in the superior temporal gyrus (STG), peak activations during shadowing were posterior to those during listening, suggesting that an anterior-posterior distinction is present for motor vs. perceptual processing of speech already at the level of the auditory cortex. We also found that activations along the dorsal auditory processing stream were specifically associated with the shadowing task. These activations are likely to be due to complex interactions between perceptual, attention-dependent speech processing and motor speech generation that matches the heard speech. Our results suggest that interactions between perceptual and motor processing of speech rely on a distributed network of temporal and motor regions rather than any specific anatomical landmark as suggested by some previous studies.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
The audiovisual (AV) cocktail party design used in the current study. (A) The participants were presented with video clips (ca. 1 min in duration) of a male speaker and a female speaker discussing neutral topics, such as the weather, while a continuous audiobook was played in the background. Speech from the two talkers alternated with a short break between talkers. The participants performed four tasks: (1) a listening task, where they attended to the dialogue while ignoring the audiobook and answered questions about each line of the dialogue immediately after the video clip finished; (2) a visual control task, where the participants ignored the dialogue and audiobook, and instead counted rotations of a cross presented below the neck of the talker who was speaking at the moment; (3) a shadowing task, where participants shadowed, that is, overtly repeated as quickly as possible, the lines of the speaker of the same gender as the participant themselves (i.e., male participants shadowed the male speaker’s speech and female participants shadowed the female speaker’s speech); and (4) a motor control task, where participants overtly counted from ‘one’ forward during the lines spoken by the speaker of the same gender as themselves. (B) Videos were presented at two levels of auditory quality: poor auditory quality, where the audio stream of the dialogue was noise-vocoded with four logarithmically equidistant frequency bands above 0.3 kHz (i.e., the fundamental frequency was untouched), and good auditory quality, where it was noise-vocoded using 16 bands above 0.3 kHz (white horizontal lines on the spectrograms denote the frequency band borders). (C) Visual quality of the faces was modulated by masking the speakers’ faces with different amounts of dynamic white noise.
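As a rough illustration of what "logarithmically equidistant frequency bands above 0.3 kHz" could mean in practice, the sketch below computes band borders that are evenly spaced on a log-frequency axis for the 4-band (poor) and 16-band (good) conditions. The function name `log_band_edges` and the 8 kHz upper limit are assumptions made for this example; the caption does not state the upper frequency limit or how the authors implemented the vocoder.

```python
import math


def log_band_edges(n_bands, f_low_hz=300.0, f_high_hz=8000.0):
    """Return n_bands + 1 band borders equally spaced on a log-frequency axis.

    f_low_hz matches the 0.3 kHz lower limit given in the caption.
    f_high_hz is an ASSUMED upper limit, chosen only for illustration.
    """
    if n_bands < 1:
        raise ValueError("n_bands must be at least 1")
    # Adjacent borders differ by a constant ratio, which is what makes
    # the bands equidistant on a logarithmic scale.
    ratio = (f_high_hz / f_low_hz) ** (1.0 / n_bands)
    return [f_low_hz * ratio ** i for i in range(n_bands + 1)]


poor_edges = log_band_edges(4)    # 4 bands: poor auditory quality
good_edges = log_band_edges(16)   # 16 bands: good auditory quality

for name, edges in (("poor", poor_edges), ("good", good_edges)):
    print(name, [round(edge) for edge in edges])
```

Under these assumptions, every fourth border of the 16-band layout coincides with a border of the 4-band layout, so the two conditions differ only in spectral resolution within the same frequency range.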
Figure 2
Behavioral performance in the four tasks and four audiovisual (AV) quality conditions (± SEM). (A) In the listening task, performance was above chance level in all conditions, but AV qualities had no significant effects on performance. In the visual control task, audiovisual qualities had no effect on task performance. In the motor control task, we only report the number of overtly uttered numbers. (B) In the shadowing task, Auditory quality had a significant effect on both accuracy and response time, and Visual quality had a significant main effect on accuracy.
Figure 3
The listening task and the shadowing task activated partly overlapping regions in distributed cortical networks. Significant clusters (initial cluster threshold z = 3.1; permutated cluster significance p < 0.05, family-wise error rate, FWER corrected across all whole brain analyses) for the listening task vs. silent baseline (bright red and orange), and the shadowing task vs. silent baseline (dark red and orange). Upper row: Lateral views of the inflated left and right hemisphere (lighter gray denotes gyri and darker gray sulci). Lower row: Medial views of the left and right hemisphere. STP supratemporal plane, STG/S superior temporal gyrus/sulcus, TP temporal pole, MT middle temporal visual area, aIns anterior insula, IFG inferior frontal gyrus, M1 primary motor cortex, PM premotor cortex, S1 primary somatosensory cortex, VC visual cortex, FG fusiform gyrus, SM supplementary motor cortex.
Figure 4
The peak coordinates for the AV speech listening task are significantly anterior to the peak coordinates for the shadowing task in the STG/STS. The white lines denote the anterior–posterior dimension in the STG. Individual peak coordinates for the listening task (vs. baseline) are denoted by blue circles (mean: blue cross) and the corresponding peak coordinates for the shadowing task are denoted by red circles (mean: red cross). The upper row shows the peak coordinates for the task vs. baseline in the left and right STG/STS (lighter gray denotes gyri and darker gray sulci). The lower row shows the peak coordinates for the listening and shadowing tasks contrasted with their respective control task, that is, visual control and motor control task, respectively, controlling for stimulus and speech production related effects. LH left hemisphere, RH right hemisphere, A anterior, P posterior, HG Heschl’s gyrus, STG superior temporal gyrus, STS superior temporal sulcus.
Figure 5
The omnibus ANOVA with factors Motor speech production, Attention to AV speech, Auditory quality and Visual quality revealed extensive main effects of Motor speech production and Attention to AV speech. (A) Significant clusters (initial cluster threshold z = 3.1; permutated cluster significance p < 0.05, FWER corrected) for the main effect of Motor speech production. Clusters where activations during the motor tasks (shadowing and motor control task; Motor) were stronger than during the non-motor tasks (AV speech listening and visual control task; NonMotor) are shown in red/yellow, the converse in blue/cyan. From left to right: lateral and medial views of the inflated left hemisphere and lateral and medial views of the right hemisphere (lighter gray denotes gyri and darker gray sulci). (B) Clusters where activations during the tasks that demanded attention to AV speech (shadowing and AV speech listening task; Attention) were stronger than during the tasks not demanding attention to AV speech (motor control and visual control task; Ignore) are shown in red/yellow, the converse in blue/cyan. STS superior temporal sulcus, aIns anterior insula, IFG inferior frontal gyrus, M1 primary motor cortex, PM premotor cortex, S1 primary somatosensory cortex, V1/V2C visual area 1/2, SM supplementary motor cortex, TPJ temporoparietal junction, pPT posterior planum temporale, DLPFC dorsolateral prefrontal cortex, SMG supramarginal gyrus.
Figure 6
Significant interactions between Motor speech production and Attention to AV speech were found in left hemisphere auditory, motor and language regions. Top row: Significant clusters (initial cluster threshold z = 3.1; permutated cluster significance p < 0.05, FWER corrected) for the interaction Motor speech production × Attention to AV speech. From left to right: lateral and medial views of the inflated left hemisphere and lateral and medial views of the right hemisphere (lighter gray denotes gyri and darker gray sulci). Bottom row: The mean % signal change (vs. baseline) in each of the tasks is plotted separately for select significant clusters. Plots for all interaction clusters are shown in Supplementary Fig. 1. Error bars represent ± SEM. IFG inferior frontal gyrus, STS superior temporal sulcus, PM premotor cortex, SM supplementary motor cortex.
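To make the Motor speech production × Attention to AV speech interaction concrete, the sketch below shows how the two main effects and the interaction can be computed from the four task means in a 2 × 2 layout (shadowing: motor and attended; motor control: motor and ignored; listening: non-motor and attended; visual control: non-motor and ignored). The function name `factorial_contrasts` and the input numbers are illustrative placeholders only; they are not values from the study, and this is not the authors' analysis pipeline, which used a whole-brain ANOVA with permutation-based cluster correction.

```python
def factorial_contrasts(shadowing, motor_control, listening, visual_control):
    """Compute 2 x 2 contrasts from four task means (e.g., % signal change).

    Factor 1 (Motor speech production): shadowing, motor control vs.
    listening, visual control.
    Factor 2 (Attention to AV speech): shadowing, listening vs.
    motor control, visual control.
    """
    # Main effect of motor speech production: motor minus non-motor tasks.
    motor = (shadowing + motor_control) / 2 - (listening + visual_control) / 2
    # Main effect of attention: attend-speech minus ignore-speech tasks.
    attention = (shadowing + listening) / 2 - (motor_control + visual_control) / 2
    # Interaction: is the attention effect larger when speech is also produced?
    interaction = (shadowing - motor_control) - (listening - visual_control)
    return {"motor": motor, "attention": attention, "interaction": interaction}


# Placeholder values chosen for easy arithmetic, NOT data from the paper.
example = factorial_contrasts(
    shadowing=1.0, motor_control=0.25, listening=0.5, visual_control=0.25
)
print(example)
```

A positive interaction in this sketch means that attending to speech raises the signal more in the speech-producing tasks than in the non-producing tasks, which is the kind of pattern the bottom-row plots in the figure are meant to show for individual clusters.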
Figure 7
Significant clusters where shorter response times (RTs) were associated with stronger activations in the shadowing task. Significant clusters (initial cluster threshold z = 3.1; permutated cluster significance p < 0.05, FWER corrected) for the correlation effect (within subjects) between RT in the shadowing task and neural activations. From left to right: lateral and medial views of the inflated left hemisphere and lateral and medial views of the right hemisphere (lighter gray denotes gyri and darker gray sulci).
Figure 8
In the left posterior planum temporale (pPT), there was a significant difference in the mean % signal change between the shadowing (Shadow) condition with poor auditory and poor visual quality and the corresponding motor control (Motor C) condition. *p < 0.05, FWER corrected. pv poor visual quality, pa poor auditory quality, gv good visual quality, ga good auditory quality.
