The acquisition of speech categories: Beyond perceptual narrowing, beyond unsupervised learning and beyond infancy

Bob McMurray¹

Lang Cogn Neurosci. 2023;38(4):419-445.

doi: 10.1080/23273798.2022.2105367. Epub 2022 Aug 8.

Affiliation

¹ Dept. of Psychological and Brain Sciences, Dept. of Communication Sciences and Disorders, Dept. of Linguistics, University of Iowa and Haskins Laboratories.

PMID: 38425732
PMCID: PMC10904032

Abstract

An early achievement in language is carving a variable acoustic space into categories. The canonical story is that infants accomplish this by the second year, when only unsupervised learning is plausible. I challenge this view, synthesizing five lines of developmental, phonetic and computational work. First, unsupervised learning may be insufficient given the statistics of speech (including infant-directed speech). Second, evidence that infants "have" speech categories rests on tenuous methodological assumptions. Third, the fact that the ecology of the learning environment is unsupervised does not rule out more powerful error-driven learning mechanisms. Fourth, several implicit supervisory signals are available to older infants. Finally, development is protracted through adolescence, enabling richer avenues for development. Infancy may be a time of organizing the auditory space, but true categorization only arises via complex developmental cascades later in life. This has implications for critical periods, second language acquisition, and our basic framing of speech perception.

Figures

**Figure 1:**
A) Spectrogram of the word *pet*. The first and second formants, or energy bands (the critical cues for vowel discrimination), are marked. B) Formant frequencies measured for several hundred utterances containing the vowels /ɛ/ and /ʌ/ (Cole et al., 2010). This shows clustering in a 2-dimensional space. These utterances span 10 talkers, 6 neighboring contexts, and four neighboring vowels, thus capturing substantial natural variation; together these factors account for approximately 90% of the variance in the formant frequencies.

**Figure 2:**
Canonical results in a Categorical Perception (CP) paradigm. Step refers to position along a speech continuum, where step 1 might be a prototypical /b/ and step 10 a prototypical /p/. Identification (dashed line, left axis) is the proportion of responses matching one of the endpoints and shows a steep boundary at step 5. Discrimination is assessed between neighboring points (e.g., between steps 1 and 2, 2 and 3, and so forth). The peak indicates that discrimination is better between steps 5 and 6 (spanning the boundary) than between steps 1 and 2 (both /b/’s). This kind of isomorphism between discrimination and identification appeared to make it straightforward to assume categorization on the basis of differences in discrimination alone.

**Figure 3:**
A) Histograms of VOT from McMurray et al. (2013) (adult directed speech) show a clear bimodal distribution with peaks corresponding to /b/ and /p/. B) Similarly for sibilant fricatives (from McMurray & Jongman, 2011), the spectral mean (the frequency at which most energy is clustered) shows a bimodal distribution with peaks for /s/ and /ʃ/. C) However, for vowels (Cole et al., 2010) when the phonemic identity is not known (the same data as Figure 1B), the most important clustering is by talker gender.

**Figure 4:**
A schematic illustrating the difference between the ecology and mechanisms of learning. For simplification, the speech system (central white box) is treated as the aspect of the system that maps acoustic inputs to categories. In most models, learning mechanisms are presumed to operate *within* this system (even as they may get information from other systems). The ecology of the learning system includes properties of the environment such as the distribution of cues, or the availability of signals such as visual cues. It can also include factors that are internal to the child, such as the lexicon, knowledge of the reading system, or speech production (many of which are discussed later).

**Figure 5:**
Vowel formant measurements from McMurray et al. (2013) for A) adult directed speech; and B) infant directed speech. In each plot, the ellipses indicate the SD of the first and second formant for that vowel. Squares indicate the mean values. The connected circles reflect the other register (e.g., for panel A the mean of IDS, and for panel B, ADS) to show the change between registers.

**Figure 6:**
Typical results of experiments examining speech categorization in school-age children (e.g., Hazan & Barrett, 2000). Schematic results from tasks in which children hear a token from a speech continuum (e.g., spanning /b/ to /p/) and label it. Younger children typically show shallower slopes than older children.

**Figure 7:**
Results from McMurray et al. (2018). Fixations in the Visual World Paradigm (VWP) are converted to a measure analogous to standard phoneme identification (Figure 6) to assess how categorization unfolds over time. Here a fixation bias of −1 indicates that the listener is fully committed to /b/ at that time, while a bias of +1 indicates a complete commitment to /p/. At 300 msec, 7–8-year-old children show little departure from 0 at any step along the continuum, suggesting they have not yet begun to categorize the sounds. However, older children show some departure, with differences between 11–12-year-old and 17–18-year-old children. At later times (B-D), the categorization function expands, but developmental differences can be seen throughout. E) Proportion of looks to the competitor (area under the curve) as a function of distance from the category boundary, for trials in which the child chose the “correct” phoneme. Older children (blue lines), like adults, show a gradient response with competition falling off as the continuum step departs in either direction from the boundary at 0. This suggests robust sensitivity to fine-grained detail. In contrast, younger children show heightened competition overall, and reduced sensitivity.
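
The caption gives only the endpoints of the fixation bias scale (−1 for /b/, +1 for /p/), not the formula used in McMurray et al. (2018). As a minimal sketch, assuming the bias is a normalized difference in looks to the two targets, a measure with those endpoints could be computed as follows. The function name `fixation_bias` and its arguments are hypothetical.

```python
def fixation_bias(looks_to_p: int, looks_to_b: int) -> float:
    """Illustrative bias in [-1, +1]; NOT the published formula.

    Assumption: bias is the difference in looks to the /p/ and /b/
    targets divided by their sum, so -1 means every look went to /b/
    and +1 means every look went to /p/.
    """
    total = looks_to_p + looks_to_b
    if total == 0:
        # No looks to either target: treat as no commitment.
        return 0.0
    return (looks_to_p - looks_to_b) / total


# Made-up counts, chosen only to show the range of the scale.
print(fixation_bias(0, 12))   # all looks to /b/
print(fixation_bias(12, 0))   # all looks to /p/
print(fixation_bias(6, 6))    # evenly split
```

Under this assumption, values near 0 correspond to the uncommitted pattern the caption describes for the youngest children at 300 msec.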

Cited by

Moving away from deficiency models: Gradiency in bilingual speech categorization.
Kutlu E, Chiu S, McMurray B. Front Psychol. 2022 Nov 24;13:1033825. doi: 10.3389/fpsyg.2022.1033825. eCollection 2022. PMID: 36507048. Free PMC article.

Brain-Inspired Multisensory Learning: A Systematic Review of Neuroplasticity and Cognitive Outcomes in Adult Multicultural and Second Language Acquisition.
Gkintoni E, Vassilopoulos SP, Nikolaou G. Biomimetics (Basel). 2025 Jun 12;10(6):397. doi: 10.3390/biomimetics10060397. PMID: 40558367. Free PMC article. Review.

The myth of categorical perception.
McMurray B. J Acoust Soc Am. 2022 Dec;152(6):3819. doi: 10.1121/10.0016614. PMID: 36586868. Free PMC article. Review.
