Speech Perception is Speech Learning

Lori L Holt

Curr Dir Psychol Sci. 2025 Aug;34(4):218-224. doi: 10.1177/09637214251318726. Epub 2025 Apr 6.

Abstract

Speech conveys both linguistic messages and a wealth of social and identity information about a talker. This information arrives as complex variation across many acoustic dimensions. Ultimately, speech communication depends upon experience within a language community to develop shared long-term knowledge of the mapping from acoustic patterns to the category distinctions that support word recognition, emotion evaluation, and talker identification. A great deal of research has focused on the learning involved in acquiring long-term knowledge to support speech categorization. Inadvertently, this focus may give the impression of a mature learning endpoint. Instead, there seems to be no firm line between perception and learning in speech. The contributions of acoustic dimensions are malleably reweighted, continuously, as a function of regularities evolving in short-term input. In this way, continuous learning across speech impacts the very nature of the mapping from sensory input to perceived category. Broadly, this presents a case study in understanding how incoming sensory input, and the learning that takes place across it, interacts with existing knowledge to drive predictions that tune the system to support future behavior.

Keywords: Categorization; Perceptual Weights; Speech Perception; Statistical Learning.


Figures

Figure 1.
Each square illustrates an utterance varying in F0 (fundamental frequency) and VOT (voice onset time), with average perceptual categorization responses painted along a spectrum from blue (pier) to white (beer) in the top row. Notice the strong reliance on VOT in quiet, with a secondary contribution from F0. This is quantified in the middle row as normalized perceptual weights. Perception of the same speech sounds shifts in quiet versus noisy listening contexts, with F0 more informative in perceptual categorization decisions in noise. The same listeners rely on different acoustic dimensions to categorize speech across different listening contexts. Stable individual differences in these perceptual weights exist, as shown by plotting the normalized perceptual weight of VOT according to the number of listeners exhibiting that perceptual weight (bottom row).
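One common way to quantify perceptual weights like these (the exact estimation procedure is not described in this abstract, so the following is an illustrative assumption, not the paper's method) is to fit a logistic regression predicting each listener's binary categorization response from the standardized VOT and F0 values of the stimuli, then normalize the absolute coefficients so they sum to one. A minimal sketch in Python, using simulated responses:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Hypothetical stimulus values: VOT in ms, F0 in Hz (ranges made up for illustration).
    vot = rng.uniform(0.0, 40.0, 500)      # voice onset time
    f0 = rng.uniform(200.0, 300.0, 500)    # fundamental frequency

    # Simulated listener who relies mostly on VOT, with a smaller F0 contribution.
    logit = 0.25 * (vot - 20.0) + 0.02 * (f0 - 250.0)
    resp = (rng.random(500) < 1.0 / (1.0 + np.exp(-logit))).astype(int)  # 1 = "pier", 0 = "beer"

    # Standardize the two dimensions so the fitted coefficients are comparable.
    X = np.column_stack([(vot - vot.mean()) / vot.std(),
                         (f0 - f0.mean()) / f0.std()])
    model = LogisticRegression().fit(X, resp)

    # Normalized perceptual weights: absolute coefficients rescaled to sum to 1.
    w = np.abs(model.coef_[0])
    weights = w / w.sum()
    print(f"VOT weight: {weights[0]:.2f}, F0 weight: {weights[1]:.2f}")

Under this simulation the VOT weight comes out much larger than the F0 weight, mirroring the quiet-listening pattern in the top row of the figure.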
Figure 2. Dimension-based statistical learning.
The top left shows the same VOT × F0 stimulus space plotted in Figure 1. The yellow highlighted regions indicate selective sampling of the space to create short-term speech regularities that match American English norms (Canonical) or violate them to create an accent (Reverse). Each trial consists of passive listening across 8 of these exposure stimuli (yellow), followed by one of two F0-differentiated test stimuli (blue, orange). Participants categorize the final test stimulus as beer or pier. The data at the right illustrate the influence of statistical learning across passive exposure on the effectiveness of F0 in signaling beer vs. pier. In the context of the accent, F0 is not a reliable cue to category identity (see Idemaru & Holt, 2011; Hodson et al., 2023).
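A minimal sketch of how test trials in a paradigm like this might be scored, assuming the F0 effect is summarized as the difference in the proportion of beer responses to the low-F0 versus high-F0 test stimuli under each exposure condition. The response arrays below are made-up placeholders, not data from the paper:

    import numpy as np

    # Made-up 0/1 categorization responses to the two test stimuli (1 = "beer"),
    # grouped by exposure condition; real trial data would replace these arrays.
    responses = {
        "canonical": {"low_f0":  np.array([1, 1, 1, 0, 1, 1, 1, 1]),
                      "high_f0": np.array([0, 0, 1, 0, 0, 0, 0, 1])},
        "reverse":   {"low_f0":  np.array([1, 0, 1, 0, 1, 1, 0, 1]),
                      "high_f0": np.array([0, 1, 1, 0, 1, 0, 1, 1])},
    }

    for condition, tests in responses.items():
        p_low = tests["low_f0"].mean()
        p_high = tests["high_f0"].mean()
        # F0 effect: how strongly F0 shifts categorization between beer and pier.
        print(f"{condition}: P(beer | low F0) = {p_low:.2f}, "
              f"P(beer | high F0) = {p_high:.2f}, F0 effect = {p_low - p_high:.2f}")

A shrinking F0 effect in the Reverse (accent) condition relative to Canonical would correspond to the down-weighting of F0 described in the caption.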


References

    1. Baese-Berk MM, Chandrasekaran B, & Roark CL (2022). The nature of non-native speech sound representations. The Journal of the Acoustical Society of America, 152(5), 3025–3034.
    2. Bernstein LE (1983). Perceptual development for labeling words varying in voice onset time and fundamental frequency. Journal of Phonetics, 11, 383–393.
    3. Blumstein SE & Stevens KN (1981). Phonetic features and acoustic invariance in speech. Cognition, 10, 25–32.
    4. Bradlow AR, & Bent T (2008). Perceptual adaptation to non-native speech. Cognition, 106(2), 707–729.
    5. Escudero P, & Boersma P (2004). Bridging the gap between L2 speech perception research and phonological theory. Studies in Second Language Acquisition, 26, 551–585.
