Distributional learning for speech reflects cumulative exposure to a talker's phonetic distributions

Rachel M Theodore^{1,2}, Nicholas R Monto^{3,4}

Affiliations

¹ Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Unit 1085, Storrs, CT, 06269, USA. rachel.theodore@uconn.edu.
² Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, 337 Mansfield Road, Unit 1872, Storrs, CT, 06269, USA. rachel.theodore@uconn.edu.
³ Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Unit 1085, Storrs, CT, 06269, USA.
⁴ Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, 337 Mansfield Road, Unit 1872, Storrs, CT, 06269, USA.

PMID: 30604404
PMCID: PMC6559869
DOI: 10.3758/s13423-018-1551-5

Psychon Bull Rev. 2019 Jun;26(3):985-992.

Abstract

Efficient speech perception requires listeners to maintain an exquisite tension between stability of the language architecture and flexibility to accommodate variation in the input, such as that associated with individual talker differences in speech production. Achieving this tension can be guided by top-down learning mechanisms, wherein lexical information constrains interpretation of speech input, and by bottom-up learning mechanisms, in which distributional information in the speech signal is used to optimize the mapping to speech sound categories. An open question for theories of perceptual learning concerns the nature of the representations that are built for individual talkers: do these representations reflect long-term, global exposure to a talker or rather only short-term, local exposure? Recent research suggests that when lexical knowledge is used to resolve a talker's ambiguous productions, listeners disregard previous experience with a talker and instead rely on only recent experience, a finding that is contrary to predictions of Bayesian belief-updating accounts of perceptual adaptation. Here we use a distributional learning paradigm in which lexical information is not explicitly required to resolve ambiguous input to provide an additional test of global versus local exposure accounts. Listeners completed two blocks of phonetic categorization for stimuli that differed in voice-onset-time, a probabilistic cue to the voicing contrast in English stop consonants. In each block, two distributions were presented, one specifying /g/ and one specifying /k/. Across the two blocks, variance of the distributions was manipulated to be either narrow or wide. The critical manipulation was order of the two blocks; half of the listeners were first exposed to the narrow distributions followed by the wide distributions, with the order reversed for the other half of the listeners. 
The results showed that for earlier trials, the identification slope was steeper for the narrow-wide group compared to the wide-narrow group, but this difference was attenuated for later trials. The between-group convergence was driven by an asymmetry in learning between the two orders such that only those in the narrow-wide group showed slope movement during exposure, a pattern that was mirrored by computational simulations in which the distributional statistics of the present talker were integrated with prior experience with English. This pattern of results suggests that listeners did not disregard all prior experience with the talker, and instead used cumulative exposure to guide phonetic decisions, which raises the possibility that accommodating a talker's phonetic signature entails maintaining representations that reflect global experience.

Keywords: Computational models; Distributional learning; Perceptual learning; Speech perception.

Figures

**Figure 1**
Histograms of the input distributions and predicted identification functions for the local versus global tracking hypotheses. Panel A shows the input distributions for the narrow and wide blocks, and the distributions formed by aggregating distributions across the two blocks. Panel B shows the categorization functions predicted by equation (1) for each order group in block one (left), for the local statistics in block two (middle), and for the global statistics in block two (right). The local statistics predictions were formed based on the input presented in each block; the global statistics predictions were formed considering the distributional information that was presented across the two blocks combined.
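The contrast between the local and global tracking hypotheses can be illustrated with a minimal ideal-observer sketch. It assumes two Gaussian categories with equal variance and equal prior probability, for which the posterior for /k/ is a logistic function of VOT; this may not match the paper's equation (1) exactly. The category means are taken from the Figure 3 caption, while the standard deviations and the equal weighting of the two blocks are hypothetical values chosen only for illustration.

```python
import math

# Category means in ms VOT, as reported in the Figure 3 caption.
MU_G, MU_K = 40.0, 92.0
# Hypothetical standard deviations (ms); the source does not report them here.
SD_NARROW, SD_WIDE = 8.0, 16.0


def categorization_slope(mu_g, mu_k, sd):
    """Slope (log-odds per ms) of the /k/ posterior for equal-variance Gaussians."""
    return (mu_k - mu_g) / sd ** 2


def p_voiceless(vot, mu_g, mu_k, sd):
    """Posterior probability of /k/ given VOT, assuming equal category priors."""
    boundary = (mu_g + mu_k) / 2.0
    slope = categorization_slope(mu_g, mu_k, sd)
    return 1.0 / (1.0 + math.exp(-slope * (vot - boundary)))


def pooled_sd(sd_a, sd_b):
    """SD after aggregating two same-mean blocks with equal token counts (assumed)."""
    return math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)


# Local hypothesis: block-two slope depends only on the block-two variance.
slope_narrow = categorization_slope(MU_G, MU_K, SD_NARROW)
slope_wide = categorization_slope(MU_G, MU_K, SD_WIDE)
# Global hypothesis: block-two slope reflects both blocks combined.
slope_global = categorization_slope(MU_G, MU_K, pooled_sd(SD_NARROW, SD_WIDE))

print("local, narrow block:", slope_narrow)
print("local, wide block:  ", slope_wide)
print("global, both blocks:", slope_global)
```

Under these assumptions the global slope falls between the narrow and wide slopes and is the same for both order groups, which is the qualitative difference between the middle and right panels described in the caption.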

**Figure 2**
Panel A shows the predicted effect of VOT on voiceless responses in each block for the narrow-wide (NW) and wide-narrow (WN) order groups, based on the fixed effects of the GLMM described in the main text. To aid visualization, the abscissa spans only the four most intermediate VOTs of the input distributions. Panel B shows the simple slope (beta estimate) for VOT in each block for each order group; error bars indicate the standard error of the beta estimate. Panel C shows the simple slope (beta estimate) for VOT at trials 200, 325, and 450 for each order group; error bars indicate the standard error of the beta estimate.

**Figure 3**
Predicted categorization slopes from the computational simulations in experiment two. The three panels show simulation results for the three unique prior specifications (shown at left in each panel). The means of the distributions were manipulated across the prior specifications to be consistent with those presented in the behavioral test (/g/ = 40 ms, /k/ = 92 ms), shifted down ~10 ms (/g/ = 30 ms, /k/ = 80 ms), or shifted up ~10 ms (/g/ = 50 ms, /k/ = 100 ms). At right in each panel are the predicted slopes for the narrow-wide (NW) and wide-narrow (WN) order groups at three trials (trial 200, trial 235, and trial 450) for each of the three confidence parameters (200, 400, and 800). Error bars indicate standard deviation of the predicted slope for the 40 simulated listeners in each group.
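The idea of integrating a talker's distributional statistics with prior experience can be sketched as a conjugate Bayesian update for a single category, where the confidence parameter acts as a pseudocount of prior observations. This is a simplified illustration using a standard normal-inverse-chi-squared update; it is not the authors' simulation code and is not claimed to reproduce the beliefupdatr package. The prior mean of 50 ms and confidence of 200 come from the caption; the prior variance, block sizes, and block variances are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    mean: float   # expected category mean (ms VOT)
    var: float    # expected category variance (ms^2)
    kappa: float  # pseudocount confidence in the mean
    nu: float     # pseudocount confidence in the variance


def update(belief, n, obs_mean, obs_var):
    """Combine a prior belief with n tokens summarized by their mean and variance."""
    kappa = belief.kappa + n
    nu = belief.nu + n
    mean = (belief.kappa * belief.mean + n * obs_mean) / kappa
    # Prior sum of squares + observed sum of squares + prior/data mean disagreement.
    sum_sq = (belief.nu * belief.var
              + n * obs_var
              + (belief.kappa * n / kappa) * (obs_mean - belief.mean) ** 2)
    return Belief(mean, sum_sq / nu, kappa, nu)


# Prior for /g/: mean 50 ms ("shifted up" prior), confidence 200 (from the caption).
# The prior variance of 100 ms^2 is a hypothetical value.
prior = Belief(mean=50.0, var=100.0, kappa=200.0, nu=200.0)

# Hypothetical blocks: 100 /g/ tokens each, mean 40 ms, narrow vs wide variance.
NARROW = (100, 40.0, 64.0)
WIDE = (100, 40.0, 256.0)

nw_block1 = update(prior, *NARROW)
nw = update(nw_block1, *WIDE)      # narrow-wide order
wn_block1 = update(prior, *WIDE)
wn = update(wn_block1, *NARROW)    # wide-narrow order

print("after block one:", nw_block1.var, wn_block1.var)
print("after block two:", nw.var, wn.var)
```

In this sketch the two orders hold different variance estimates after block one but identical estimates after block two, because the update depends only on the cumulative tokens observed. A larger confidence value would keep both estimates closer to the prior for longer.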

Cited by

Xie X, Kurumada C. From first encounters to longitudinal exposure: a repeated exposure-test paradigm for monitoring speech adaptation. Front Psychol. 2024 May 30;15:1383904. doi: 10.3389/fpsyg.2024.1383904. PMID: 38873525.
Nenadić F, Tucker BV, Ten Bosch L. Computational Modeling of an Auditory Lexical Decision Experiment Using DIANA. Lang Speech. 2023 Sep;66(3):564-605. doi: 10.1177/00238309221111752. PMID: 36000386.
Luthra S. The Role of the Right Hemisphere in Processing Phonetic Variability Between Talkers. Neurobiol Lang (Camb). 2021 Feb 1;2(1):138-151. doi: 10.1162/nol_a_00028. PMID: 37213418. Review.
Tzeng CY, Nygaard LC, Theodore RM. A second chance for a first impression: Sensitivity to cumulative input statistics for lexically guided perceptual learning. Psychon Bull Rev. 2021 Jun;28(3):1003-1014. doi: 10.3758/s13423-020-01840-6. PMID: 33443706.
Nenadić F, Bujandrić K, Kelley MC, Tucker BV. SingleMALD: Investigating practice effects in auditory lexical decision. Behav Res Methods. 2025 Apr 2;57(5):136. doi: 10.3758/s13428-025-02628-z. PMID: 40175775.
