Speaker normalization using cortical strip maps: a neural model for steady-state vowel categorization
- PMID: 19206817
- DOI: 10.1121/1.2997478
Abstract
Auditory signals of speech are speaker dependent, but representations of language meaning are speaker independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by adaptive resonance theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [Peterson, G. E., and Barney, H. L., J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
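The pipeline the abstract describes (speaker-dependent formant input, a normalization stage, then categorization of the normalized items) can be illustrated with a much simpler stand-in than the paper's strip-map/ART circuits. The sketch below uses log-mean formant normalization (a Nearey-style speaker-extrinsic scheme, not the model proposed here) and a nearest-centroid classifier in place of the ART category layer; the formant values are illustrative, not the Peterson and Barney measurements.

```python
import math

# Toy F1/F2 formant values (Hz) for two hypothetical "speakers"
# producing three vowels. Illustrative numbers only.
SPEAKERS = {
    "low_pitch":  {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)},
    "high_pitch": {"i": (370, 3200), "a": (850, 1220), "u": (430, 1170)},
}

def log_mean_normalize(formants):
    """Speaker-extrinsic normalization (Nearey-style): subtract the
    speaker's mean log-formant so vowel categories from different
    speakers land near each other in the normalized space."""
    logs = [(math.log(f1), math.log(f2)) for f1, f2 in formants.values()]
    mean = sum(a + b for a, b in logs) / (2 * len(logs))
    return {v: (math.log(f1) - mean, math.log(f2) - mean)
            for v, (f1, f2) in formants.items()}

def nearest_centroid(point, centroids):
    """Categorize a normalized vowel by its nearest centroid -- a crude
    stand-in for the model's adaptive resonance category layer."""
    return min(centroids, key=lambda v: (point[0] - centroids[v][0]) ** 2
                                        + (point[1] - centroids[v][1]) ** 2)

# Learn categories from one speaker, then classify the other speaker's
# vowels: without normalization the raw formants would not align.
train = log_mean_normalize(SPEAKERS["low_pitch"])
test = log_mean_normalize(SPEAKERS["high_pitch"])
predictions = {v: nearest_centroid(p, train) for v, p in test.items()}
print(predictions)  # each test vowel maps back to its own category
```

The point of the sketch is only the architecture: a normalization stage that removes speaker-dependent scale before categorization, which is the transformation the paper's strip-map circuits implement neurally.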
Similar articles
- Attentional influences on functional mapping of speech sounds in human auditory cortex. BMC Neurosci. 2004 Jul 21;5:24. doi: 10.1186/1471-2202-5-24. PMID: 15268765. Free PMC article.
- Nonuniform speaker normalization using affine transformation. J Acoust Soc Am. 2008 Sep;124(3):1727-38. doi: 10.1121/1.2951597. PMID: 19045663.
- "Who" is saying "what"? Brain-based decoding of human voice and speech. Science. 2008 Nov 7;322(5903):970-3. doi: 10.1126/science.1164318. PMID: 18988858.
- Functional imaging of auditory scene analysis. Hear Res. 2014 Jan;307:98-110. doi: 10.1016/j.heares.2013.08.003. PMID: 23968821. Review.
- Static, dynamic, and relational properties in vowel perception. J Acoust Soc Am. 1989 May;85(5):2088-113. doi: 10.1121/1.397861. PMID: 2659638. Review.
Cited by
- A Neural Model of Intrinsic and Extrinsic Hippocampal Theta Rhythms: Anatomy, Neurophysiology, and Function. Front Syst Neurosci. 2021 Apr 28;15:665052. doi: 10.3389/fnsys.2021.665052. PMID: 33994965. Free PMC article. Review.
- Developmental Designs and Adult Functions of Cortical Maps in Multiple Modalities: Perception, Attention, Navigation, Numbers, Streaming, Speech, and Cognition. Front Neuroinform. 2020 Feb 6;14:4. doi: 10.3389/fninf.2020.00004. PMID: 32116628. Free PMC article. Review.
- Toward Understanding the Brain Dynamics of Music: Learning and Conscious Performance of Lyrics and Melodies With Variable Rhythms and Beats. Front Syst Neurosci. 2022 Apr 8;16:766239. doi: 10.3389/fnsys.2022.766239. PMID: 35465193. Free PMC article.
- How the human brain recognizes speech in the context of changing speakers. J Neurosci. 2010 Jan 13;30(2):629-38. doi: 10.1523/JNEUROSCI.2742-09.2010. PMID: 20071527. Free PMC article.