Front Neurosci. 2022 Aug 8;16:828546.
doi: 10.3389/fnins.2022.828546. eCollection 2022.

Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention

Christian Brodbeck et al. Front Neurosci. 2022.

Abstract

Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was simultaneously present pitch in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.

Keywords: TRF; auditory cortex; cocktail party; mTRF; temporal response functions.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Predictors for analyzing pitch tracking. (A) For a single speaker, pitch tracking was estimated using two predictors: pitch strength, reflecting the degree to which a distinctive pitch is present in the sound signal, and pitch value, reflecting the fundamental frequency of the pitch, relative to the baseline. For moments when pitch strength is 0, the pitch value is set to the default baseline value. (B) For two-speaker stimuli, pitch strength and value were estimated separately for each speaker and then split into two separate predictors, reflecting overt pitch (i.e., pitch is present only in a single speaker) and masked pitch (i.e., pitch is present in both speakers). Note that, as a consequence of this definition, the two masked pitch predictors are always simultaneous, whereas the overt pitch predictors are mutually exclusive.
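The overt/masked split described in (B) can be illustrated with a minimal sketch. It assumes each speaker's pitch strength and pitch value are already available as equal-length time series, that "pitch present" means strength greater than 0, and that the baseline value is 0 because pitch value is expressed relative to the baseline. The function name `split_overt_masked` and the dictionary keys are hypothetical and do not come from the paper.

```python
import numpy as np

def split_overt_masked(strength_a, value_a, strength_b, value_b, baseline=0.0):
    """Split two speakers' pitch predictors into overt and masked parts.

    Assumption: pitch counts as present wherever strength > 0.
    Returns a dict mapping a (hypothetical) label to (strength, value) arrays.
    """
    strength_a = np.asarray(strength_a, dtype=float)
    value_a = np.asarray(value_a, dtype=float)
    strength_b = np.asarray(strength_b, dtype=float)
    value_b = np.asarray(value_b, dtype=float)

    voiced_a = strength_a > 0
    voiced_b = strength_b > 0
    masked = voiced_a & voiced_b      # pitch present in both speakers
    overt_a = voiced_a & ~voiced_b    # pitch present only in speaker A
    overt_b = voiced_b & ~voiced_a    # pitch present only in speaker B

    def part(strength, value, mask):
        # Outside the mask, strength is 0 and value falls back to the baseline.
        return np.where(mask, strength, 0.0), np.where(mask, value, baseline)

    return {
        "a_overt": part(strength_a, value_a, overt_a),
        "a_masked": part(strength_a, value_a, masked),
        "b_overt": part(strength_b, value_b, overt_b),
        "b_masked": part(strength_b, value_b, masked),
    }
```

With this construction the two masked strength predictors are non-zero at exactly the same samples, and the two overt strength predictors are never non-zero at the same sample, which matches the note at the end of the caption.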
FIGURE 2
Separable tracking of pitch strength and pitch value of a single talker. (A) Pitch strength and pitch value both improved model predictions independently, when controlling for acoustic envelope and onset spectrograms (p ≤ 0.05, corrected; darkened areas excluded from analysis). The color scale reflects the explained variability in MEG responses, expressed as % of the complete model. (B) Both pitch predictors showed some right lateralization. The plots show the right–left hemisphere predictive power difference, same scale as (A). (C) Temporal response functions (TRFs) showed dominant responses at latencies between 50 and 200 ms. TRF magnitude is shown for regions of significant model prediction. The three horizontal red bars indicate time windows used in (D). (D) Anatomical distribution of TRFs in 50 ms time windows. LH, left hemisphere; RH, right hemisphere; STG, superior temporal gyrus; IFG, inferior frontal gyrus; aSTG, anterior STG.
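The logic behind (A), a predictor improving model predictions while controlling for the other predictors, can be sketched as a comparison between a full and a reduced model. The sketch below makes several assumptions that are not stated in this text: it uses ridge regression on time-lagged predictors as a stand-in for whatever TRF estimator the authors used, it evaluates on a simple second-half hold-out, and it expresses the difference as a percentage of the full model's explained variability. All function names and parameters are hypothetical.

```python
import numpy as np

def lagged_design(predictors, n_lags):
    """Stack time-lagged copies of each predictor column (time x predictors)."""
    n_times, n_pred = predictors.shape
    columns = []
    for p in range(n_pred):
        for lag in range(n_lags):
            shifted = np.zeros(n_times)
            shifted[lag:] = predictors[:n_times - lag, p]
            columns.append(shifted)
    return np.column_stack(columns)

def fit_trf(design, response, ridge=1.0):
    """Ridge-regression estimate of the response function weights."""
    gram = design.T @ design + ridge * np.eye(design.shape[1])
    return np.linalg.solve(gram, design.T @ response)

def explained_variability(train_x, train_y, test_x, test_y, ridge=1.0):
    """Proportion of held-out response variance explained by the model."""
    weights = fit_trf(train_x, train_y, ridge)
    residual = test_y - test_x @ weights
    return 1.0 - residual.var() / test_y.var()

def unique_contribution_percent(predictors, response, drop, n_lags=20, ridge=1.0):
    """Explained variability lost when column `drop` is removed,
    as a percentage of the full model's explained variability."""
    half = len(response) // 2  # first half for fitting, second half for testing
    full = lagged_design(predictors, n_lags)
    reduced = lagged_design(np.delete(predictors, drop, axis=1), n_lags)
    ev_full = explained_variability(
        full[:half], response[:half], full[half:], response[half:], ridge)
    ev_reduced = explained_variability(
        reduced[:half], response[:half], reduced[half:], response[half:], ridge)
    return 100.0 * (ev_full - ev_reduced) / ev_full
```

Under these assumptions, a lateralization measure like the one in (B) would be the difference between such values computed for corresponding right- and left-hemisphere sources.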
FIGURE 3
Pitch tracking in two simultaneous speakers depends on selective attention. (A) Significance tests of pitch tracking for overt and masked pitch in the attended and ignored speakers. STG and IFG were separately tested (darkened area excluded from tests). (B) Individual subject data (% variability explained) in a region of interest, defined as the intersection of the region of significant activity in the single speaker condition and the STG anatomical label. (C) Temporal response function (TRF) magnitude with dominant response at 100–200 ms latency. The three horizontal red bars indicate time windows used in (D). (D) TRF activity localized mainly to the auditory cortex, with involvement of a more anterior region for masked pitch in the attended speaker. LH, left hemisphere; RH, right hemisphere; STG, superior temporal gyrus; IFG, inferior frontal gyrus; aSTG, anterior STG.
