Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention

Front Neurosci. 2022 Aug 8;16:828546. doi: 10.3389/fnins.2022.828546. eCollection 2022.


Christian Brodbeck¹,², Jonathan Z Simon²,³,⁴

Affiliations

¹ Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States.
² Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States.
³ Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States.
⁴ Department of Biology, University of Maryland, College Park, College Park, MD, United States.

PMID: 36003957
PMCID: PMC9393379
DOI: 10.3389/fnins.2022.828546

Christian Brodbeck et al. Front Neurosci. 2022.

Abstract

Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams differ in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether pitch was simultaneously present in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked the pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.

Keywords: TRF; auditory cortex; cocktail party; mTRF; temporal response functions.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Predictors for analyzing pitch tracking. **(A)** For a single speaker, pitch tracking was estimated using two predictors: *pitch strength*, reflecting the degree to which a distinctive pitch is present in the sound signal, and *pitch value*, reflecting the fundamental frequency of the pitch, relative to the baseline. For moments when pitch strength is 0, the pitch value is set to the default baseline value. **(B)** For two-speaker stimuli, pitch strength and value were estimated separately for each speaker and then split into two separate predictors, reflecting overt pitch (i.e., pitch is present only in a single speaker) and masked pitch (i.e., pitch is present in both speakers). Note that, as a consequence of this definition, the two masked pitch predictors are always simultaneous, whereas the overt pitch predictors are mutually exclusive.
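The predictor definitions in Figure 1 can be made concrete with a short Python sketch. It assumes that each speaker's pitch strength and fundamental frequency (f0) are already available as equally sampled time series, that "pitch present" means strength greater than 0, and that the baseline is a single number per speaker. The function names, the baseline parameter, and the units are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, List, Sequence


def pitch_value_predictor(
    f0: Sequence[float], strength: Sequence[float], baseline: float
) -> List[float]:
    """Pitch value relative to a (hypothetical) baseline.

    Where pitch strength is 0, the value is set to the baseline,
    so its value relative to the baseline is 0 there.
    """
    if len(f0) != len(strength):
        raise ValueError("f0 and strength must have the same length")
    return [(f - baseline) if s > 0 else 0.0 for f, s in zip(f0, strength)]


def split_overt_masked(
    strength_a: Sequence[float], strength_b: Sequence[float]
) -> Dict[str, List[float]]:
    """Split two speakers' pitch-strength series into overt and masked predictors.

    Overt: pitch present in that speaker only (mutually exclusive across speakers).
    Masked: pitch present in both speakers (always simultaneous across speakers).
    """
    if len(strength_a) != len(strength_b):
        raise ValueError("both speakers must have the same number of samples")
    out: Dict[str, List[float]] = {
        "overt_a": [], "overt_b": [], "masked_a": [], "masked_b": []
    }
    for sa, sb in zip(strength_a, strength_b):
        has_a, has_b = sa > 0, sb > 0
        both = has_a and has_b
        out["masked_a"].append(sa if both else 0.0)
        out["masked_b"].append(sb if both else 0.0)
        out["overt_a"].append(sa if has_a and not has_b else 0.0)
        out["overt_b"].append(sb if has_b and not has_a else 0.0)
    return out
```

The same overt/masked split could in principle be applied to the pitch-value series as well; how the published analysis handled values outside each category is not stated in the caption.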

**FIGURE 2**
Separable tracking of pitch strength and pitch value of a single talker. **(A)** Pitch strength and pitch value both improved model predictions independently, when controlling for acoustic envelope and onset spectrograms (p ≤ 0.05, corrected; darkened areas excluded from analysis). The color scale reflects the explained variability in MEG responses, expressed as % of the complete model. **(B)** Both pitch predictors showed some right lateralization. The plots show the right–left hemisphere predictive power difference, same scale as **(A)**. **(C)** Temporal response functions (TRFs) showed dominant responses at latencies between 50 and 200 ms. TRF magnitude is shown for regions of significant model prediction. The three horizontal red bars indicate time windows used in **(D)**. **(D)** Anatomical distribution of TRFs in 50 ms time windows. LH, left hemisphere; RH, right hemisphere; STG, superior temporal gyrus; IFG, inferior frontal gyrus; aSTG, anterior STG.
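The caption does not spell out how "% of the complete model" is computed. One plausible reading, assumed here rather than taken from the paper, is a model comparison: a predictor's contribution is the drop in explained variance when that predictor is removed from the full model, divided by the full model's explained variance. The Python sketch below implements that reading, plus the right-minus-left difference from panel (B). All function names are hypothetical.

```python
from typing import Sequence


def explained_variance(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Proportion of variance in y explained by y_hat: 1 - SS_res / SS_tot."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and equally long")
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)
    if ss_tot == 0:
        raise ValueError("y has zero variance")
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


def percent_of_full_model(full: float, reduced: float) -> float:
    """Explained variance lost without a predictor, as % of the full model's."""
    if full == 0:
        raise ValueError("full model explains no variance")
    return 100.0 * (full - reduced) / full


def right_minus_left(right: float, left: float) -> float:
    """Hemispheric difference, on the same scale as its inputs."""
    return right - left
```

For example, if the full model explains 0.04 of the variance and the model without a given pitch predictor explains 0.03, `percent_of_full_model(0.04, 0.03)` returns about 25, meaning that predictor accounts for roughly a quarter of what the full model explains.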

**FIGURE 3**
Pitch tracking in two simultaneous speakers depends on selective attention. **(A)** Significance tests of pitch tracking for overt and masked pitch in the attended and ignored speakers. STG and IFG were separately tested (darkened area excluded from tests). **(B)** Individual subject data (% variability explained) in a region of interest, defined as the intersection of the region of significant activity in the single speaker condition and the STG anatomical label. **(C)** Temporal response function (TRF) magnitude with dominant response at 100–200 ms latency. The three horizontal red bars indicate time windows used in **(D)**. **(D)** TRF activity localized mainly to the auditory cortex, with involvement of a more anterior region for masked pitch in the attended speaker. LH, left hemisphere; RH, right hemisphere; STG, superior temporal gyrus; IFG, inferior frontal gyrus; aSTG, anterior STG.
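Panel (C) refers to TRF magnitude and its peak latency. A TRF is a linear kernel over time lags: convolving each predictor with its TRF and summing over predictors gives the predicted response (the multivariate, or mTRF, case). The Python sketch below only applies given TRFs; it does not estimate them from data, which requires a regularized fitting procedure that is not shown here and is not necessarily the one the authors used. The function names and the sampling rate are hypothetical.

```python
from typing import List, Sequence


def trf_predict(
    predictors: Sequence[Sequence[float]], trfs: Sequence[Sequence[float]]
) -> List[float]:
    """Sum of each predictor convolved with its causal TRF.

    y_hat[t] = sum_i sum_k trfs[i][k] * predictors[i][t - k], for 0 <= k <= t.
    """
    if len(predictors) != len(trfs) or not predictors:
        raise ValueError("need one TRF per predictor")
    n = len(predictors[0])
    y_hat = [0.0] * n
    for x, h in zip(predictors, trfs):
        if len(x) != n:
            raise ValueError("all predictors must have the same length")
        for t in range(n):
            for k, w in enumerate(h):
                if k > t:  # lags reaching before the first sample contribute nothing
                    break
                y_hat[t] += w * x[t - k]
    return y_hat


def peak_latency_ms(trf: Sequence[float], sfreq: float) -> float:
    """Latency of the largest-magnitude TRF sample; index 0 is lag 0."""
    k = max(range(len(trf)), key=lambda i: abs(trf[i]))
    return 1000.0 * k / sfreq
```

With an assumed sampling rate of 100 Hz, a TRF whose largest magnitude sits at index 12 would peak at 120 ms, inside the 100–200 ms range described for the two-speaker responses.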


Cited by

Dynamics of Pitch Perception in the Auditory Cortex.
Abrams EB, Marantz A, Krementsov I, Gwilliams L. J Neurosci. 2025 Mar 19;45(12):e1111242025. doi: 10.1523/JNEUROSCI.1111-24.2025. PMID: 39909567
EEG-based cross-subject passive music pitch perception using deep learning models.
Meng Q, Tian L, Liu G, Zhang X. Cogn Neurodyn. 2025 Dec;19(1):6. doi: 10.1007/s11571-024-10196-9. Epub 2025 Jan 3. PMID: 39758357
Cocktail party training induces increased speech intelligibility and decreased cortical activity in bilateral inferior frontal gyri. A functional near-infrared study.
Lanzilotti C, Andéol G, Micheyl C, Scannella S. PLoS One. 2022 Dec 1;17(12):e0277801. doi: 10.1371/journal.pone.0277801. eCollection 2022. PMID: 36454948
Attentional Modulation of the Cortical Contribution to the Frequency-Following Response Evoked by Continuous Speech.
Schüller A, Schilling A, Krauss P, Rampp S, Reichenbach T. J Neurosci. 2023 Nov 1;43(44):7429-7440. doi: 10.1523/JNEUROSCI.1247-23.2023. Epub 2023 Oct 4. PMID: 37793908
Neural encoding of melodic expectations in music across EEG frequency bands.
Galeano-Otálvaro JD, Martorell J, Meyer L, Titone L. Eur J Neurosci. 2024 Dec;60(11):6734-6749. doi: 10.1111/ejn.16581. Epub 2024 Oct 29. PMID: 39469882


