Lang Cogn Neurosci. 2023;38(4):419-445.
doi: 10.1080/23273798.2022.2105367. Epub 2022 Aug 8.

The acquisition of speech categories: Beyond perceptual narrowing, beyond unsupervised learning and beyond infancy

Bob McMurray. Lang Cogn Neurosci. 2023.

Abstract

An early achievement in language is carving a variable acoustic space into categories. The canonical story is that infants accomplish this by the second year, when only unsupervised learning is plausible. I challenge this view, synthesizing five lines of developmental, phonetic and computational work. First, unsupervised learning may be insufficient given the statistics of speech (including infant-directed speech). Second, evidence that infants "have" speech categories rests on tenuous methodological assumptions. Third, the fact that the ecology of the learning environment is unsupervised does not rule out more powerful error-driven learning mechanisms. Fourth, several implicit supervisory signals are available to older infants. Finally, development is protracted through adolescence, enabling richer avenues for development. Infancy may be a time of organizing the auditory space, but true categorization only arises via complex developmental cascades later in life. This has implications for critical periods, second language acquisition, and our basic framing of speech perception.

Figures

Figure 1:
A) Spectrogram of the word pet. The first and second formants, or energy bands (the critical cues for vowel discrimination), are marked. B) Formant frequencies measured for several hundred utterances containing the vowels /ɛ/ and /ʌ/ (Cole et al., 2010). This shows clustering in a 2-dimensional space. These utterances span 10 talkers, 6 neighboring contexts, and four neighboring vowels, thus capturing significant natural variation; together, these factors account for approximately 90% of the variance in the formant frequencies.
Figure 2:
Canonical results in a Categorical Perception (CP) paradigm. Step refers to position along a speech continuum, where step 1 might be a prototypical /b/ and step 10 a prototypical /p/. Identification (dashed line, left axis) is the proportion of responses matching one of the endpoints and shows a steep boundary at step 5. Discrimination is assessed between neighboring points (e.g., between steps 1 and 2, 2 and 3, and so forth). The peak indicates that discrimination is better between steps 5 and 6 (spanning the boundary) than between 1 and 2 (both /b/’s). This kind of isomorphism between discrimination and identification appeared to make it straightforward to assume categorization on the basis of differences in discrimination alone.
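The relationship between the two curves in Figure 2 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the logistic form of the identification function, its boundary (5.5) and slope (2.0), and the classic label-based prediction that discrimination equals chance plus half the squared difference in labeling between two steps.

```python
import math

def identification(step, boundary=5.5, slope=2.0):
    """Proportion of /p/ responses at a continuum step.

    Assumed logistic shape; boundary and slope are hypothetical values.
    """
    return 1.0 / (1.0 + math.exp(-slope * (step - boundary)))

def predicted_discrimination(step_a, step_b):
    """Label-based prediction of discrimination accuracy for two steps.

    Assumes listeners can discriminate only what they label differently:
    0.5 (chance) plus half the squared difference in identification.
    """
    p_a = identification(step_a)
    p_b = identification(step_b)
    return 0.5 + 0.5 * (p_a - p_b) ** 2

# Neighboring pairs along a 10-step continuum: (1, 2), (2, 3), ..., (9, 10)
pairs = [(s, s + 1) for s in range(1, 10)]
discrimination = {pair: predicted_discrimination(*pair) for pair in pairs}

# The pair with the highest predicted discrimination spans the boundary
peak_pair = max(discrimination, key=discrimination.get)

for pair in pairs:
    print(pair, round(discrimination[pair], 3))
print("peak:", peak_pair)
```

Under these assumptions the predicted peak falls on the pair that straddles the identification boundary, while within-category pairs stay near chance. That is the pattern that made it tempting to infer categorization from discrimination alone.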
Figure 3:
A) Histograms of VOT from McMurray et al. (2013) (adult directed speech) show a clear bimodal distribution with peaks corresponding to /b/ and /p/. B) Similarly for sibilant fricatives (from McMurray & Jongman, 2011), the spectral mean (the frequency at which most energy is clustered) shows a bimodal distribution with peaks for /s/ and /ʃ/. C) However, for vowels (Cole et al., 2010) when the phonemic identity is not known (the same data as Figure 1B), the most important clustering is by talker gender.
Figure 4:
A schematic illustrating the difference between the ecology and mechanisms of learning. For simplification, the speech system (central white box) is treated as the aspect of the system that maps acoustic inputs to categories. In most models, learning mechanisms are presumed to operate within this system (even as they may get information from other systems). The ecology of the learning system includes properties of the environment such as the distribution of cues, or the availability of signals such as visual cues. It can also include factors that are internal to the child such as the lexicon, knowledge of the reading system, or speech production (many of which are discussed later).
Figure 5:
Vowel formant measurements from McMurray et al. (2013) for A) adult directed speech; and B) infant directed speech. In each plot, the ellipses indicate the SD of the first and second formant for that vowel. Squares indicate the mean values. The connected circles reflect the other register (e.g., for panel A the mean of IDS, and for panel B, ADS) to show the change between registers.
Figure 6:
Typical results of experiments examining speech categorization in school-age children (e.g., Hazan & Barrett, 2000). Schematic results from tasks in which children hear a token from a speech continuum (e.g., spanning /b/ to /p/) and label it. Younger children typically show shallower slopes than older children.
Figure 7:
Results from McMurray et al. (2018). Fixations in the Visual World Paradigm (VWP) are converted to a measure analogous to standard phoneme identification (Figure 6) to assess how categorization unfolds over time. Here a fixation bias of −1 indicates that the listener is fully committed to /b/ at that time, while a bias of +1 indicates a complete commitment to /p/. A) At 300 msec, 7–8-year-old children show little departure from 0 at any step along the continuum, suggesting they have not yet begun to categorize the sounds. However, older children show some departure, with differences between 11–12 and 17–18 y.o. children. At later times (B-D), the categorization function expands, but developmental differences can be seen throughout. E) Proportion of looks to the competitor (area under the curve) as a function of distance from the category boundary, for trials in which the child chose the “correct” phoneme. Older children (blue lines), like adults, show a gradient response with competition falling off as the continuum step departs in either direction from the boundary at 0. This suggests robust sensitivity to fine-grained detail. In contrast, younger children show heightened competition overall, and reduced sensitivity.
