Front Neurosci. 2023 Dec 14;17:1264453.
doi: 10.3389/fnins.2023.1264453. eCollection 2023.

Cortical responses time-locked to continuous speech in the high-gamma band depend on selective attention

Vrishab Commuri et al. Front Neurosci. 2023.

Abstract

Auditory cortical responses to speech obtained by magnetoencephalography (MEG) show robust speech tracking to the speaker's fundamental frequency in the high-gamma band (70–200 Hz), but little is currently known about whether such responses depend on the focus of selective attention. In this study, 22 human subjects listened to concurrent, fixed-rate speech from male and female speakers and were asked to selectively attend to one speaker at a time while their neural responses were recorded with MEG. The male speaker's pitch range coincided with the lower range of the high-gamma band, whereas the female speaker's higher pitch range had much less overlap, and only at the upper end of the high-gamma band. Neural responses were analyzed using the temporal response function (TRF) framework. As expected, the responses demonstrate robust speech tracking of the fundamental frequency in the high-gamma band, but only to the male speaker's speech, with a peak latency of ~40 ms. Critically, the response magnitude depends on selective attention: the response to the male speech is significantly greater when the male speech is attended than when it is not attended, under acoustically identical conditions. This is a clear demonstration that even very early cortical auditory responses are influenced by top-down, cognitive, neural processing mechanisms.

Keywords: cocktail party; cortical FFR; phase-locked response; primary auditory cortex; speech tracking.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Illustration of how the carrier and envelope modulations predictors are extracted from an auditory stimulus. The raw stimulus waveform is shown in the bottom-left corner. Envelope modulations predictor: to generate the envelope modulations predictor, starting with the raw waveform and following the arrows up and to the right, first an auditory spectrogram is generated using a model of the auditory periphery (Yang et al., 1992). Then, the acoustic envelope in each frequency bin in the range 300–4,000 Hz is bandpassed in the high-gamma range (70–200 Hz), and the average is then computed across the channels. The result is a single time-series signal. Carrier predictor: to generate the carrier predictor, following the arrows to the right, the raw stimulus waveform is simply bandpass filtered to the high-gamma range. The result is a second single time-series signal. [Figure reproduced with permission from Kulasingham et al. (2020)].
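The carrier predictor described in this caption is a single bandpass step, which can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the filter design (one second-order biquad built from the widely used audio-EQ cookbook formulas), the function name `bandpass_high_gamma`, and the 4 kHz sampling rate in the check are all assumptions, since the caption does not specify them. The envelope modulations predictor additionally requires the auditory spectrogram model and is not reproduced here.

```python
import math

def bandpass_high_gamma(x, fs, lo=70.0, hi=200.0):
    """Filter x (samples at fs Hz) with a 2nd-order bandpass.

    Assumption: a single biquad stands in for whatever filter the
    study actually used; it only illustrates the 70-200 Hz passband.
    """
    f0 = math.sqrt(lo * hi)          # geometric centre of the band
    q = f0 / (hi - lo)               # quality factor from the bandwidth
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Normalised biquad coefficients (unity gain at f0, b1 is zero).
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0

    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def rms(x):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 4000                                  # hypothetical sampling rate
t = [n / fs for n in range(fs)]            # one second of samples
tone_in_band = [math.sin(2 * math.pi * 100 * s) for s in t]
tone_out_of_band = [math.sin(2 * math.pi * 1000 * s) for s in t]
kept = rms(bandpass_high_gamma(tone_in_band, fs))
rejected = rms(bandpass_high_gamma(tone_out_of_band, fs))
```

A 100 Hz tone, which lies inside the band, keeps most of its level, while a 1 kHz tone is strongly attenuated. A real analysis would likely use a steeper, zero-phase filter so that the predictor is not delayed relative to the stimulus.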
Figure 2
Prediction accuracies for male single-speaker (Top) and cocktail-party (Bottom) models. Red regions denote voxels where the TRF model produced a prediction accuracy that was significantly greater than that of the noise within the ROI. TRFs to female speech (not shown) did not produce significant responses in any voxels.
Figure 3
Comparison of male speech and female speech TRFs for the single speaker conditions. Solid black lines indicate the TRF grand average (over TRF amplitude, averaged across voxels in the ROI); shaded regions indicate values within one standard error of the mean. Red shading indicates TRF values significantly above the noise floor. The distribution of TRF vectors in the brain at the time with the maximum significant response is plotted as an inset for each TRF. (Top left) Average TRF of the envelope modulations predictor derived from the male speaker stimulus. Note the large significant response at ~30–50 ms in the TRF which indicates a consistent, time-locked neural response to the speech envelope modulations at a 30–50 ms latency. (Top right) Average TRF of the envelope modulations predictor derived from the female speaker stimulus. Notice the lack of a significant response in the average TRF or a region of significance over the null model. Similar results were observed for the carrier stimuli: (Bottom left) Average TRF of the carrier predictor derived from the male speaker stimulus. Note the significant response in the TRF at the same latency observed for the corresponding envelope TRF. (Bottom right) Average TRF of the carrier predictor derived from the female speaker stimulus. As in the case of the corresponding envelope TRF, there is no significant response observed for this TRF.
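To make concrete what a TRF peak at a given latency means, here is a small simulation sketch. It is not the estimation procedure used in the study: it assumes a white-noise predictor, for which a normalised cross-correlation at each lag approximates the TRF, and every name and number in it (`estimate_trf`, the 1 kHz rate, the 40 ms kernel, the noise level) is hypothetical.

```python
import math
import random

def estimate_trf(predictor, response, n_lags):
    """Cross-correlation TRF estimate for lags 0..n_lags-1 (in samples).

    Assumption: the predictor is approximately white, so no
    regularised regression is needed for this toy example.
    """
    power = sum(s * s for s in predictor)
    trf = []
    for lag in range(n_lags):
        acc = 0.0
        for t in range(lag, len(response)):
            acc += response[t] * predictor[t - lag]
        trf.append(acc / power)
    return trf

rng = random.Random(0)
fs = 1000                      # 1 sample = 1 ms (hypothetical)
n = 2000
predictor = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical "true" kernel: a single bump centred at 40 ms.
kernel = [math.exp(-((lag - 40) / 3.0) ** 2) for lag in range(100)]

# Simulated response: predictor convolved with the kernel, plus noise.
response = []
for t in range(n):
    v = sum(kernel[lag] * predictor[t - lag]
            for lag in range(min(t + 1, len(kernel))))
    response.append(v + rng.gauss(0.0, 0.5))

trf = estimate_trf(predictor, response, n_lags=100)
# Lag (converted to ms) at which the estimated TRF is largest in magnitude.
peak_lag_ms = max(range(len(trf)), key=lambda lag: abs(trf[lag])) * 1000 // fs
```

In this toy setup the largest estimated weight falls near the 40 ms lag of the simulated kernel, which is the sense in which a TRF peak indicates a time-locked response at that latency. Speech predictors are not white, so a practical estimator would need regularised regression or a comparable method.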
Figure 4
Comparison of attended and unattended TRFs for the male speech stimuli, in the cocktail-party setting. Solid black lines indicate the TRF grand average (over TRF amplitude, averaged across voxels in the ROI); shaded regions indicate values within one standard error of the mean. Red shading indicates TRF values significantly above the noise floor. The distribution of TRF vectors in the brain at the time with the maximum significant response is plotted as an inset for each TRF. (Top left) Male speech envelope TRF for subjects attending to the male speech (female speech is background). A large significant response in the TRF is observed between ~30–50 ms which indicates a consistent, time-locked neural response to the speech envelope modulations at a 30–50 ms latency. (Top right) Male speech envelope TRF for subjects attending to the female speech (male speech is background). (Bottom left) Male speech carrier TRF for subjects attending to the male speech (female speech is background). (Bottom right) Male speech carrier TRF for subjects attending to the female speech (male speech is background). Linear mixed effects model and post-hoc test results indicate that the attended speech TRF peak amplitude is significantly greater than the unattended speech TRF peak amplitude.
Figure 5
Cocktail-party male speech TRF peak amplitude comparison across subjects. Male speech TRF peak amplitudes in the latency range 20–50 ms are presented for the attend male (red) and ignore male (gray) conditions. Dashed lines show each individual subject's change in peak height between the attend and ignore conditions. Solid lines show the change in the mean between the conditions. For the envelope TRFs, note the significant decrease from the attend to the ignore condition, both in the mean value and in most individual subjects. No such trend is observed in the carrier TRFs. ***p < 0.001.
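The per-subject comparison in this figure can be sketched as a peak search within the 20–50 ms window followed by a paired difference. The sketch below uses made-up numbers and hypothetical names (`peak_amplitude`, `attend`, `ignore`). Taking the maximum absolute value as the "peak" is an assumption, and the study's statistics come from a linear mixed effects model that is not reproduced here.

```python
def peak_amplitude(trf, times_ms, lo_ms=20.0, hi_ms=50.0):
    """Largest absolute TRF value with latency in [lo_ms, hi_ms]."""
    window = [abs(v) for v, t in zip(trf, times_ms) if lo_ms <= t <= hi_ms]
    if not window:
        raise ValueError("no samples fall inside the latency window")
    return max(window)

# Made-up TRFs sampled every 10 ms for two hypothetical subjects.
times_ms = [0, 10, 20, 30, 40, 50, 60]
attend = [
    [0.0, 0.1, 0.3, 0.9, 0.6, 0.2, 0.1],
    [0.0, 0.2, 0.4, 0.7, 0.8, 0.3, 1.5],   # 1.5 lies outside the window
]
ignore = [
    [0.0, 0.1, 0.2, 0.5, 0.4, 0.1, 0.0],
    [0.0, 0.1, 0.3, 0.6, 0.5, 0.2, 0.1],
]

attend_peaks = [peak_amplitude(trf, times_ms) for trf in attend]
ignore_peaks = [peak_amplitude(trf, times_ms) for trf in ignore]
# Positive values mean a larger peak when the male speaker is attended.
paired_diff = [a - i for a, i in zip(attend_peaks, ignore_peaks)]
```

Restricting the search to the latency window keeps late or spurious deflections, such as the 60 ms value in the second made-up subject, from being counted as the peak.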
