How the human brain recognizes speech in the context of changing speakers

Katharina von Kriegstein et al. J Neurosci. 2010 Jan 13;30(2):629-638. doi: 10.1523/JNEUROSCI.2742-09.2010.

Abstract

We understand speech from different speakers with ease, whereas artificial speech recognition systems struggle with this task. It is unclear how the human brain solves this problem. The conventional view is that speech message recognition and speaker identification are two separate functions, with message processing located predominantly in the left hemisphere and processing of speaker-specific information in the right hemisphere. Here, we distinguish the contributions of specific cortical regions to speech recognition and speaker-information processing by controlled manipulation of task and resynthesized speaker parameters. Two functional magnetic resonance imaging studies provide evidence for a dynamic speech-processing network that questions the conventional view. We found that speech recognition regions in left posterior superior temporal gyrus/superior temporal sulcus (STG/STS) encode not only the speech message but also speaker-related vocal tract parameters, which are reflected in the amplitude peaks of the speech spectrum. Right posterior STG/STS responded specifically more strongly to a speaker-related vocal tract parameter change during a speech recognition task than during a voice recognition task. Left and right posterior STG/STS were functionally connected. Additionally, we found that speaker-related glottal fold parameters (e.g., pitch), which are not reflected in the amplitude peaks of the speech spectrum, are processed in areas immediately adjacent to primary auditory cortex, i.e., earlier in the auditory hierarchy than STG/STS. Our results point to a network account of speech recognition in which information about the speech message and the speaker's vocal tract is combined to solve the difficult task of understanding speech from different speakers.


Figures

Figure 1.
The contribution of glottal fold and vocal tract parameters to the speech output. A, Shown is a sagittal section through a human head and neck. Green circle, Glottal folds; blue lines, extension of the vocal tract from glottal folds to tip of the nose and lips. B, The three plots show three different sounds determined by glottal fold parameters. In voiced speech, the vibration of the glottal folds results in lower voices (120 Hz GPR; top) or higher voices (200 Hz GPR; middle). If glottal folds are constricted, they produce a noise-like sound that is heard as whispered speech (0 Hz GPR; bottom). C, The vocal tract filters the sound wave coming from the glottal folds, which introduces amplitude peaks at certain frequencies (“formants”; blue lines). Note that the different glottal fold parameters do not influence the formant position. D, Both speech- and speaker-related vocal tract parameters influence the position of the formants. Here we show as an example the formant shifts associated with the speech sounds /u/ and /a/ (first and second plot) and an /a/ with a shorter and longer vocal tract length (second and third plot).
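The source-filter relationship described in this caption (a glottal source setting the pitch, or noise for whisper, then vocal tract resonances introducing formant peaks) can be sketched in code. This is a minimal illustrative synthesis, not the stimulus-generation procedure used in the study; the sampling rate, formant frequencies, and bandwidth are assumed example values:

```python
import numpy as np
from scipy.signal import lfilter

FS = 16000  # sampling rate in Hz (assumed for illustration)

def glottal_source(gpr_hz, dur=0.5):
    """Glottal source: an impulse train at the glottal pulse rate (GPR)
    for voiced speech, or white noise when GPR is 0 (whispered speech)."""
    n = int(FS * dur)
    if gpr_hz == 0:
        return np.random.default_rng(0).standard_normal(n)  # whisper: noise
    src = np.zeros(n)
    period = int(FS / gpr_hz)
    src[::period] = 1.0  # one glottal pulse per period
    return src

def vocal_tract_filter(source, formants_hz, bw_hz=80.0):
    """Vocal tract as a cascade of second-order resonators, one per formant,
    introducing amplitude peaks at the formant frequencies."""
    out = source
    for f in formants_hz:
        r = np.exp(-np.pi * bw_hz / FS)           # pole radius from bandwidth
        theta = 2 * np.pi * f / FS                # pole angle from frequency
        a = [1.0, -2 * r * np.cos(theta), r * r]  # resonator denominator
        out = lfilter([1.0 - r], a, out)          # rough gain normalization
    return out

# Illustrative /a/-like formants: a shorter vocal tract scales all
# formant frequencies upward, as in panel D (values are assumptions).
a_long  = vocal_tract_filter(glottal_source(120), [700, 1200, 2600])
a_short = vocal_tract_filter(glottal_source(120), [840, 1440, 3120])
whisper = vocal_tract_filter(glottal_source(0),   [700, 1200, 2600])
```

Note how the same formant filter applies unchanged to the 120 Hz, 200 Hz, and whispered sources, mirroring the caption's point that glottal fold parameters do not influence formant position.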
Figure 2.
BOLD responses associated with the main effect of VTL (red) and main effect of task (green) as revealed by the conjunction analysis of experiment 1 and experiment 2. The group mean structural image is overlaid with the statistical parametric maps for the respective contrasts. “Control task” refers to loudness task in experiment 1 and to speaker task in experiment 2. L, Left hemisphere; VTL, acoustic effect of vocal tract length. The dotted lines on the sagittal section indicate the slices displayed as horizontal and coronal sections. The plots show the parameter estimates for experiments 1 and 2 separately. The small bar graphs on top of the plots display the main effects and their significance threshold in a repeated-measures ANOVA. Results of post hoc t tests are indicated by the brackets within the plot. *p < 0.05, ***p < 0.001. ns, Nonsignificant. Error bars represent ±1 SEM.
Figure 3.
BOLD responses associated with the interaction between task and VTL. The contrast for experiment 1 is rendered in magenta and for experiment 2 in cyan. The plots show the parameter estimates for experiments 1 and 2 separately [MNI coordinates: experiment 1, (52, −22, 0); experiment 2, (68, −42, 16)]. The small bar graphs on top of the plots show the significant interaction and main effects and their significance threshold in a repeated-measures ANOVA. Results of post hoc t test are indicated by the brackets within the plot. *p < 0.05. ns, Nonsignificant. Error bars represent ±1 SEM.
Figure 4.
Overview of BOLD responses in right and left hemisphere. This figure also includes the BOLD responses reported in a previous study (von Kriegstein et al., 2007). The right-sided activation for the previous study is shown at a threshold of p < 0.003 for display purposes. The voxel with the maximum statistic for this study is at (60, −42, −2), Z = 3.12.
Figure 5.
Functional connectivity (PPI) between left and right posterior STG/STS. Seed regions were taken from individual subject clusters; here the group mean is shown (red). Target regions identified by the PPI analysis (VTL × task, connectivity) are shown in green [MNI coordinates: experiment 1, (58, −46, 20), Z = 3.03; experiment 2, (60, −52, 20), Z = 3.26]. BOLD responses associated with the interaction between task and VTL (VTL × task, activity) are displayed to demonstrate their consistently close proximity to PPI target regions in right posterior STG/STS.
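Loosely, a PPI analysis like the one in this figure asks whether coupling between a seed region and a target region changes with the psychological context (here, VTL × task). A simplified sketch of building and fitting such an interaction regressor follows; it omits the hemodynamic deconvolution step used in standard SPM-style PPI, so it is a conceptual illustration rather than the authors' pipeline:

```python
import numpy as np

def ppi_design(seed_ts, context):
    """Toy PPI design matrix with columns [psychological, physiological,
    interaction]. `seed_ts` is the seed-region time course; `context` is
    the condition vector (e.g., VTL x task), one value per scan. Standard
    PPI deconvolves seed BOLD to the neural level before multiplying;
    that step is omitted here for brevity."""
    psy = context - context.mean()    # mean-centered psychological term
    phys = seed_ts - seed_ts.mean()   # physiological term (seed signal)
    ppi = psy * phys                  # the interaction regressor
    return np.column_stack([psy, phys, ppi])

def ppi_effect(target_ts, design):
    """Least-squares fit of the target time course on the PPI design
    (plus an intercept); returns the interaction beta."""
    X = np.column_stack([np.ones(len(target_ts)), design])
    beta, *_ = np.linalg.lstsq(X, target_ts, rcond=None)
    return beta[3]  # coefficient of the interaction regressor
```

A positive interaction beta in the target region would indicate stronger seed-target coupling in one context than the other, which is the pattern the figure reports between left and right posterior STG/STS.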
Figure 6.
BOLD responses for voiced and whispered speech. The group mean structural image is overlaid with the statistical parametric maps for the contrasts between (1) voiced > whispered speech (red), (2) whispered > voiced speech (yellow), and (3) pitch varies > VTL varies (cyan). The plot shows parameter estimates for voiced and whispered speech in Te1.2 and Te1.1 (volume of interest). Error bars represent ±1 SEM. A repeated-measures ANOVA with the factors location (Te1.1, Te1.2) and sound quality (voiced, whispered) revealed a significant interaction of sound quality × location (F(1,17) = 28, p < 0.0001), indicating differential responsiveness to whispered sounds in Te1.1 and to voiced sounds in Te1.2. ***p < 0.001.
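For a 2 × 2 within-subject design like the one in this caption (location: Te1.1/Te1.2 × sound quality: voiced/whispered), the interaction F with (1, n − 1) degrees of freedom is equivalent to the squared one-sample t test on each subject's double-difference score. A small sketch of that equivalence (illustrative only, not the authors' analysis code):

```python
import numpy as np

def rm_interaction_F(y11, y12, y21, y22):
    """Interaction F for a 2x2 repeated-measures design. Each argument is
    one condition's values, one per subject. For within-subject designs
    the interaction F(1, n-1) equals the squared one-sample t on the
    per-subject double difference (y11 - y12) - (y21 - y22)."""
    d = (np.asarray(y11) - np.asarray(y12)) - (np.asarray(y21) - np.asarray(y22))
    n = d.size
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # one-sample t on d
    return t ** 2, (1, n - 1)
```

With the study's 18 subjects this yields the (1, 17) degrees of freedom reported for the sound quality × location interaction.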
