J Neurosci. 2016 Feb 3;36(5):1596-606.
doi: 10.1523/JNEUROSCI.1730-15.2016.

Left Superior Temporal Gyrus Is Coupled to Attended Speech in a Cocktail-Party Auditory Scene


Marc Vander Ghinst et al. J Neurosci. 2016.

Abstract

Using a continuous listening task, we evaluated the coupling between the listener's cortical activity and the temporal envelopes of different sounds in a multitalker auditory scene using magnetoencephalography and corticovocal coherence analysis. Neuromagnetic signals were recorded from 20 right-handed healthy adult humans who listened to five different recorded stories (attended speech streams), one without any multitalker background (No noise) and four mixed with a "cocktail party" multitalker background noise at four signal-to-noise ratios (5, 0, −5, and −10 dB) to produce speech-in-noise mixtures, here referred to as Global scene. Coherence analysis revealed that the modulations of the attended speech stream, presented without multitalker background, were coupled at ∼0.5 Hz to the activity of both superior temporal gyri, whereas the modulations at 4–8 Hz were coupled to the activity of the right supratemporal auditory cortex. In cocktail-party conditions, with the multitalker background noise, the coupling at both frequencies was stronger for the attended speech stream than for the unattended Multitalker background, and the coupling strength decreased as the level of the Multitalker background increased. During the cocktail-party conditions, the ∼0.5 Hz coupling became left-hemisphere dominant, compared with bilateral coupling without the multitalker background, whereas the 4–8 Hz coupling remained right-hemisphere lateralized in both conditions. The brain activity was not coupled to the multitalker background or to its individual talkers. These results highlight the key role of the listener's left superior temporal gyrus in extracting the slow ∼0.5 Hz modulations, which likely reflect the attended speech stream within a multitalker auditory scene.
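The coherence measure at the heart of the analysis can be sketched as a Welch-averaged magnitude-squared coherence between two signals. The sketch below is illustrative only, not the authors' pipeline: the function name, sampling rate, and toy signals (a shared 0.5 Hz modulation standing in for a speech envelope and an MEG channel) are assumptions for demonstration.

```python
import numpy as np

def msc(x, y, fs, nperseg):
    # Welch-averaged magnitude-squared coherence: |Sxy|^2 / (Sxx * Syy),
    # estimated from Hann-windowed, 50%-overlapping segments.
    win = np.hanning(nperseg)
    step = nperseg // 2
    Sxx = np.zeros(nperseg // 2 + 1)
    Syy = np.zeros(nperseg // 2 + 1)
    Sxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    for i in range(0, len(x) - nperseg + 1, step):
        X = np.fft.rfft(win * x[i:i + nperseg])
        Y = np.fft.rfft(win * y[i:i + nperseg])
        Sxx += np.abs(X) ** 2
        Syy += np.abs(Y) ** 2
        Sxy += X * np.conj(Y)
    f = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return f, np.abs(Sxy) ** 2 / (Sxx * Syy)

# Toy demo: the same 0.5 Hz modulation drives a simulated speech
# "envelope" and a simulated "MEG" channel, each with independent noise.
fs = 100.0
t = np.arange(0, 200, 1.0 / fs)          # 200 s of signal
rng = np.random.default_rng(0)
shared = np.sin(2 * np.pi * 0.5 * t)      # common 0.5 Hz modulation
envelope = shared + 0.5 * rng.standard_normal(t.size)
meg = shared + 0.5 * rng.standard_normal(t.size)
f, coh = msc(envelope, meg, fs, nperseg=2000)
k = np.argmin(np.abs(f - 0.5))            # frequency bin at 0.5 Hz
# coh[k] is close to 1, while coherence away from 0.5 Hz stays near chance
```

Because coherence is normalized by the power of each signal, it isolates the phase-consistent coupling at each frequency, which is why a weak 0.5 Hz envelope modulation can still yield strong coherence with cortical activity.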

Significance statement: When people listen to one person in a "cocktail party," their auditory cortex mainly follows the attended speech stream rather than the entire auditory scene. However, how the brain extracts the attended speech stream from the whole auditory scene and how increasing background noise corrupts this process is still debated. In this magnetoencephalography study, subjects had to attend a speech stream with or without multitalker background noise. Results argue for frequency-dependent cortical tracking mechanisms for the attended speech stream. The left superior temporal gyrus tracked the ∼0.5 Hz modulations of the attended speech stream only when the speech was embedded in multitalker background, whereas the right supratemporal auditory cortex tracked 4-8 Hz modulations during both noiseless and cocktail-party conditions.

Keywords: coherence analysis; magnetoencephalography; speech in noise.


Figures

Figure 1.
Experimental setup and the corresponding sounds (bottom traces). The Global scene is the combination of the Attended stream (black traces; the voice of a reader narrating a story) and the Multitalker background (gray traces), obtained by mixing the voices of six simultaneous French-speaking talkers (three females and three males).
Figure 2.
Global scene sound signals at the different experimental SNRs and the corresponding subjective intelligibility scores (mean ± range; VAS ranging from 0, totally unintelligible, to 10, perfectly intelligible). Black represents the Attended stream. Gray represents the Multitalker background. The intelligibility of the Attended stream decreased significantly with decreasing SNRs. Vertical brackets represent the post hoc paired t tests between adjacent conditions. ***p < 0.001.
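A speech-in-noise mixture at a given SNR is typically built by rescaling the background so that the speech-to-background power ratio hits the target in dB. A minimal sketch, assuming random-noise stand-ins for the recorded speech and babble (the function and signals below are illustrative, not the authors' stimulus-generation code):

```python
import numpy as np

def mix_at_snr(speech, babble, snr_db):
    # Scale the babble so that 10*log10(P_speech / P_scaled_babble)
    # equals snr_db, then sum the two signals.
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    gain = np.sqrt(p_speech / (p_babble * 10.0 ** (snr_db / 10.0)))
    return speech + gain * babble, gain

rng = np.random.default_rng(1)
speech = rng.standard_normal(16000)   # stand-in for the attended stream
babble = rng.standard_normal(16000)   # stand-in for the multitalker babble
for snr_db in (5, 0, -5, -10):        # the four experimental SNRs
    mixture, gain = mix_at_snr(speech, babble, snr_db)
    achieved = 10 * np.log10(np.mean(speech ** 2)
                             / np.mean((gain * babble) ** 2))
    # achieved matches snr_db to floating-point precision
```

Scaling the background rather than the speech keeps the attended stream at a constant level across conditions, so only the masker intensity varies.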
Figure 3.
Sensor and source space results obtained in the No noise condition. Left, Spatial distribution of group-level sensor space coherence in the δ band (∼0.5 Hz; top) and θ band (4–8 Hz; bottom). In both frequency bands, the coherence maxima are located bilaterally at gradiometer sensors covering the temporal areas. The sensor array is viewed from the top. Right, Results obtained in the source space. Group-level statistical (p value) maps showing brain areas displaying statistically significant coherence. Maps are thresholded at p < 0.05. In the δ band, significant local maxima occur at the lower bank of the superior temporal gyrus bilaterally, with no significant hemispheric lateralization. In the θ band, a significant local maximum is seen only at the right supratemporal auditory cortex.
Figure 4.
Sensor and source-space results obtained for the listening conditions. Top, Mean coherence spectra representing the arithmetic mean of the 20 individual maximum coherence spectra when coherence was computed between MEG signals and the envelopes of the different components of the auditory scene. Middle, Top, Group-level gradiometer sensor space coherence in the δ band (∼0.5 Hz). Higher coherence values were found at the temporal-lobe sensors (with a left-hemisphere dominance) when coherence was computed between MEG signals and the Attended stream (Cohatt) than with the Global scene (Cohglobal). Coherence decreased as the level of the Multitalker background progressively increased. The sensor array is viewed from the top. Middle, Bottom, Group-level source space coherence in the δ band when the coherence was computed with the Attended stream (Cohatt). The group-level p value coherence map disclosed local coherence maxima at the superior temporal gyrus bilaterally, with left-hemisphere dominance in the noisy conditions. Bottom, Top, Group-level gradiometer mean sensor space coherence in the θ band (4–8 Hz). Higher coherence values were found at the temporal-lobe sensors (with a right-hemisphere dominance) when coherence was computed between MEG signals and the Attended stream (Cohatt) than with the Global scene (Cohglobal). Coherence values decreased with increasing noise level. Bottom, Bottom, Group-level source space coherence in the θ band when the coherence was computed with the Attended stream (Cohatt). The group-level p value coherence map disclosed significant coherence maxima at the right supratemporal auditory cortex in every listening condition.
Figure 5.
Top, Cortical areas sensitive to the Attended stream in speech-in-noise conditions in the δ band (∼0.5 Hz). The p value maps represent the contrast Cohatt − Cohglobal, thresholded at the statistical significance level (p < 0.05). Specific coupling between the Attended stream and MEG signals occurs at the superior temporal gyri, with a left-hemisphere dominance. Middle, Same illustration but for the θ band (4–8 Hz). Specific coupling between the Attended stream and MEG signals was observed at the right supratemporal auditory cortex, but also at the left superior temporal gyrus in the 0 and −5 dB conditions. (The coherence at the left auditory cortices was not significant in Cohatt and Cohglobal at both sensor and source levels.) Bottom, Comparison between sound time courses and left temporal-lobe MEG signals in a typical subject. Top, Left, Time course of the Attended stream. Top, Right, The same sample of voice signal, but merged with the Multitalker background at an SNR of 0 dB (Global scene). Middle, Same audio signals as above, but bandpass filtered at 0.1–1 Hz. Bottom, Time course of a left temporal MEG sensor showing that the coupling with the slow temporal fluctuations was stronger with the Attended stream than with the Global scene (same MEG sensor signal displayed on the left and the right).
