Atten Percept Psychophys. 2021 Jul;83(5):2217-2228.
doi: 10.3758/s13414-021-02261-w. Epub 2021 Mar 22.

Perceptual learning of multiple talkers requires additional exposure

Sahil Luthra et al. Atten Percept Psychophys. 2021 Jul.

Abstract

Because different talkers produce their speech sounds differently, listeners benefit from maintaining distinct generative models (sets of beliefs) about the correspondence between acoustic information and phonetic categories for different talkers. A robust literature on phonetic recalibration indicates that when listeners encounter a talker who produces their speech sounds idiosyncratically (e.g., a talker who produces their /s/ sound atypically), they can update their generative model for that talker. Such recalibration has been shown to occur in a relatively talker-specific way. Because listeners in ecological situations often meet several new talkers at once, the present study considered how the process of simultaneously updating two distinct generative models compares to updating one model at a time. Listeners were exposed to two talkers, one who produced /s/ atypically and one who produced /ʃ/ atypically. Critically, these talkers only produced these sounds in contexts where lexical information disambiguated the phoneme's identity (e.g., epi_ode, flouri_ing). When initial exposure to the two talkers was blocked by voice (Experiment 1), listeners recalibrated to these talkers after relatively little exposure to each talker (32 instances per talker, of which 16 contained ambiguous fricatives). However, when the talkers were intermixed during learning (Experiment 2), listeners required more exposure trials before they were able to adapt to the idiosyncratic productions of these talkers (64 instances per talker, of which 32 contained ambiguous fricatives). Results suggest that there is a perceptual cost to simultaneously updating multiple distinct generative models, potentially because listeners must first select which generative model to update.

Keywords: Perceptual learning; Speech perception.

Figures

Figure 1.
(A) General schematic for experiment task structure. For Experiment 1 (blocked), listeners were exposed to one talker and then immediately completed a phonetic categorization task for the same talker; this process was then repeated for the second talker. For Experiment 2 (mixed), listeners were exposed to the two talkers in an intermixed fashion. This exposure phase was followed by two blocks of phonetic categorization (grouped by talker). (B) Exposure task schematic. Participants first listened to the entire word and were then allotted 4000 ms to complete the talker decision (indicating whether the word was spoken by the male or female talker). If feedback was part of the experiment (Experiments 2B and 2D), it followed immediately after the key press. After the key press and/or feedback display, there was an inter-stimulus interval (ISI) of 1000 ms. (C) Phonetic categorization task schematic. Participants listened to a token from a 7-step continuum from sign to shine. They then indicated via key press whether the word sounded more like sign (sign image) or more like shine (sun image). There was no feedback. After the key press, there was an ISI of 1000 ms.
Figure 2.
Results from the phonetic categorization task from Experiments 1 and 2A–D. Purple lines indicate categorization of the /ʃ/-biased talker, while red lines indicate categorization of the /s/-biased talker. The x-axis lists each step along a continuum from a clear “sign” production (step 1) to a clear “shine” production (step 7). Percent “shine” responses are indicated along the y-axis, from low to high. (A) Blocked talker exposure (Exp 1). (B) Experiment 2. Upper left: mixed talker exposure, low-exposure, no feedback. Upper right: mixed talker exposure, low-exposure, with feedback. Lower left: mixed talker exposure, high-exposure, no feedback. Lower right: mixed talker exposure, high-exposure, with feedback. Error bars represent a 95% confidence interval.
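
The difference between the blocked and intermixed exposure orders described in the abstract and in Figure 1 can be illustrated with a minimal sketch. This is not the authors' experiment code. The function name build_exposure, the talker labels, and the randomization scheme are assumptions made for illustration; only the trial counts (half of each talker's exposure items containing an ambiguous fricative) come from the abstract. The sketch covers exposure order only and omits the phonetic categorization blocks.

```python
import random

def build_exposure(n_per_talker, mixed, seed=0):
    """Return a list of (talker, is_ambiguous) exposure trials.

    Half of each talker's trials are marked as containing an ambiguous
    fricative, matching the 16-of-32 and 32-of-64 counts in the abstract.
    Talker labels and shuffling are illustrative assumptions.
    """
    rng = random.Random(seed)
    talkers = ("s_biased", "sh_biased")  # hypothetical labels
    per_talker = {}
    for talker in talkers:
        half = n_per_talker // 2
        trials = [(talker, True)] * half + [(talker, False)] * (n_per_talker - half)
        rng.shuffle(trials)  # random order within a talker (assumed)
        per_talker[talker] = trials

    sequence = per_talker["s_biased"] + per_talker["sh_biased"]
    if mixed:
        # Experiment 2 style: both talkers interleaved in one exposure phase.
        rng.shuffle(sequence)
    # Otherwise Experiment 1 style: all trials from one talker, then the other.
    return sequence

blocked = build_exposure(32, mixed=False)  # low exposure, blocked by voice
mixed = build_exposure(64, mixed=True)     # high exposure, intermixed
```

In the blocked sequence the first 32 trials all come from one talker, whereas in the mixed sequence the listener cannot predict which talker comes next, which is the condition the abstract associates with a need for more exposure.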
