Psychon Bull Rev. 2019 Jun;26(3):985-992.
doi: 10.3758/s13423-018-1551-5.

Distributional learning for speech reflects cumulative exposure to a talker's phonetic distributions


Rachel M Theodore et al. Psychon Bull Rev. 2019 Jun.

Abstract

Efficient speech perception requires listeners to maintain an exquisite tension between stability of the language architecture and flexibility to accommodate variation in the input, such as that associated with individual talker differences in speech production. Achieving this tension can be guided by top-down learning mechanisms, wherein lexical information constrains interpretation of speech input, and by bottom-up learning mechanisms, in which distributional information in the speech signal is used to optimize the mapping to speech sound categories. An open question for theories of perceptual learning concerns the nature of the representations that are built for individual talkers: do these representations reflect long-term, global exposure to a talker or rather only short-term, local exposure? Recent research suggests that when lexical knowledge is used to resolve a talker's ambiguous productions, listeners disregard previous experience with a talker and instead rely on only recent experience, a finding that is contrary to predictions of Bayesian belief-updating accounts of perceptual adaptation. Here we use a distributional learning paradigm in which lexical information is not explicitly required to resolve ambiguous input to provide an additional test of global versus local exposure accounts. Listeners completed two blocks of phonetic categorization for stimuli that differed in voice-onset-time, a probabilistic cue to the voicing contrast in English stop consonants. In each block, two distributions were presented, one specifying /g/ and one specifying /k/. Across the two blocks, variance of the distributions was manipulated to be either narrow or wide. The critical manipulation was order of the two blocks; half of the listeners were first exposed to the narrow distributions followed by the wide distributions, with the order reversed for the other half of the listeners. 
The results showed that for earlier trials, the identification slope was steeper for the narrow-wide group compared to the wide-narrow group, but this difference was attenuated for later trials. The between-group convergence was driven by an asymmetry in learning between the two orders such that only those in the narrow-wide group showed slope movement during exposure, a pattern that was mirrored by computational simulations in which the distributional statistics of the present talker were integrated with prior experience with English. This pattern of results suggests that listeners did not disregard all prior experience with the talker, and instead used cumulative exposure to guide phonetic decisions, which raises the possibility that accommodating a talker's phonetic signature entails maintaining representations that reflect global experience.

Keywords: Computational models; Distributional learning; Perceptual learning; Speech perception.

Figures

Figure 1
Histograms of the input distributions and predicted identification functions for the local versus global tracking hypotheses. Panel A shows the input distributions for the narrow and wide blocks, and the distributions formed by aggregating distributions across the two blocks. Panel B shows the categorization functions predicted by equation (1) for each order group in block one (left), for the local statistics in block two (middle), and for the global statistics in block two (right). The local statistics predictions were formed based on the input presented in each block; the global statistics predictions were formed considering the distributional information that was presented across the two blocks combined.
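The contrast between the local and global predictions can be illustrated with a minimal ideal-observer sketch. Equation (1) is not reproduced on this page, so the code below assumes a standard form: equal category priors and equal-variance Gaussian likelihoods for /g/ and /k/, which yields a logistic identification function whose slope is the distance between the category means divided by the variance. The means (40 ms and 92 ms) come from the Figure 3 caption; the standard deviations and the pooled-variance Gaussian approximation of the aggregated distributions are hypothetical choices made only for illustration.

```python
import math

# Category means (ms VOT), taken from the Figure 3 caption.
MU_G, MU_K = 40.0, 92.0

# Hypothetical standard deviations (ms); the actual values are not given here.
SD_NARROW, SD_WIDE = 8.0, 14.0

# Assumption: equal token counts per block, so the aggregated ("global")
# variance is the mean of the two block variances, treated as Gaussian.
SD_GLOBAL = math.sqrt((SD_NARROW ** 2 + SD_WIDE ** 2) / 2)


def gaussian_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and std. dev. sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def p_voiceless(vot, sd, mu_g=MU_G, mu_k=MU_K):
    """Posterior probability of /k/ given VOT, assuming equal priors."""
    like_g = gaussian_pdf(vot, mu_g, sd)
    like_k = gaussian_pdf(vot, mu_k, sd)
    return like_k / (like_g + like_k)


def identification_slope(sd, mu_g=MU_G, mu_k=MU_K):
    """Slope of the log-odds of /k/ per ms of VOT: (mu_k - mu_g) / sd^2."""
    return (mu_k - mu_g) / sd ** 2


if __name__ == "__main__":
    for label, sd in [("narrow", SD_NARROW), ("wide", SD_WIDE), ("global", SD_GLOBAL)]:
        print(f"{label:>6}: slope = {identification_slope(sd):.3f} log-odds/ms, "
              f"p(/k/ | 70 ms) = {p_voiceless(70.0, sd):.3f}")
```

Under these assumptions the narrow distributions predict the steepest function, the wide distributions the shallowest, and the aggregated distributions fall in between, which is the qualitative difference between the local and global panels.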
Figure 2
Panel A shows the predicted effect of VOT on voiceless responses in each block for the narrow-wide (NW) and wide-narrow (WN) order groups, based on the fixed effects of the GLMM described in the main text. To aid visualization, the abscissa spans only the four most intermediate VOTs of the input distributions. Panel B shows the simple slope (beta estimate) for VOT in each block for each order group; error bars indicate the standard error of the beta estimate. Panel C shows the simple slope (beta estimate) for VOT at trials 200, 325, and 450 for each order group; error bars indicate the standard error of the beta estimate.
Figure 3
Predicted categorization slopes from the computational simulations in experiment two. The three panels show simulation results for the three unique prior specifications (shown at left in each panel). The means of the distributions were manipulated across the prior specifications to be consistent with those presented in the behavioral test (/g/ = 40 ms, /k/ = 92 ms), shifted down ~10 ms (/g/ = 30 ms, /k/ = 80 ms), or shifted up ~10 ms (/g/ = 50 ms, /k/ = 100 ms). At right in each panel are the predicted slopes for the narrow-wide (NW) and wide-narrow (WN) order groups at three trials (trial 200, trial 235, and trial 450) for each of the three confidence parameters (200, 400, and 800). Error bars indicate standard deviation of the predicted slope for the 40 simulated listeners in each group.
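How a confidence parameter weights prior experience against a talker's input can be sketched with a conjugate normal-inverse-chi-squared update for a single category. This is an illustrative sketch, not the simulation code used in the study: it assumes one confidence value serves as the pseudo-count for both the prior mean and the prior variance, and the prior standard deviation, block standard deviations, and tokens per block are hypothetical. Only the /k/ mean (92 ms) and the confidence value (200) come from the caption.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """Belief about one category's VOT distribution."""
    mean: float   # expected category mean (ms)
    var: float    # variance scale estimate (ms^2)
    kappa: float  # pseudo-count backing the mean
    nu: float     # pseudo-count backing the variance


def update(belief, n, obs_mean, obs_var):
    """Conjugate update from n tokens with the given sample mean and
    (population) sample variance."""
    kappa_n = belief.kappa + n
    nu_n = belief.nu + n
    mean_n = (belief.kappa * belief.mean + n * obs_mean) / kappa_n
    sum_sq = (belief.nu * belief.var
              + n * obs_var
              + (belief.kappa * n / kappa_n) * (obs_mean - belief.mean) ** 2)
    return Belief(mean_n, sum_sq / nu_n, kappa_n, nu_n)


def run_order(prior, blocks):
    """Apply blocks in sequence; return the variance estimate after each."""
    belief, history = prior, []
    for n, obs_mean, obs_var in blocks:
        belief = update(belief, n, obs_mean, obs_var)
        history.append(belief.var)
    return history


# Hypothetical values: prior SD of 11 ms, 100 tokens per block, block SDs of
# 8 ms (narrow) and 14 ms (wide). Mean and confidence are from the caption.
CONFIDENCE = 200.0
PRIOR_K = Belief(mean=92.0, var=11.0 ** 2, kappa=CONFIDENCE, nu=CONFIDENCE)
NARROW = (100, 92.0, 8.0 ** 2)
WIDE = (100, 92.0, 14.0 ** 2)

if __name__ == "__main__":
    print("narrow-wide:", run_order(PRIOR_K, [NARROW, WIDE]))
    print("wide-narrow:", run_order(PRIOR_K, [WIDE, NARROW]))
```

In this sketch the two orders differ after the first block but arrive at the same variance estimate after the second, because cumulative updating does not depend on the order in which the evidence arrives. Larger confidence values keep the estimate closer to the prior throughout.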
