Mice can learn phonetic categories

Jonny L Saunders et al. J Acoust Soc Am. 2019 Mar;145(3):1168. doi: 10.1121/1.5091776

Abstract

Speech is perceived as a series of relatively invariant phonemes despite extreme variability in the acoustic signal. To be perceived as nearly identical phonemes, speech sounds that vary continuously over a range of acoustic parameters must be perceptually discretized by the auditory system. Such many-to-one mappings of undifferentiated sensory information onto a finite number of discrete categories are ubiquitous in perception. Although many mechanistic models of phonetic perception have been proposed, they remain largely unconstrained by neurobiological data, which current human neurophysiological methods lack the spatiotemporal resolution to provide: speech is too fast, and the neural circuitry involved is too small. This study demonstrates that mice are capable of learning generalizable phonetic categories and can thus serve as a model for phonetic perception. Mice learned to discriminate consonants and generalized consonant identity across novel vowel contexts and speakers, consistent with true category learning. Given the powerful genetic and electrophysiological tools available in mice for probing neural circuits, a mouse model has the potential to powerfully augment a mechanistic understanding of phonetic perception.


Figures

FIG. 1.
Stimuli and task design. (a) Spectrograms of stimuli. Left: Example of an original recording of an isolated CV token (/gI/). Center: The same token pitch-shifted upwards by 10× (3.3 octaves) into the mouse hearing range. Right: Recording of the pitch-shifted token presented in the behavior box. Stimuli retained their overall acoustic structure below 34 kHz (the upper limit of the speaker frequency response). For spectrograms of all 161 tokens, see Supplemental Information. (b) Power spectra (dB, Welch's method) of the tokens in (a). Black: original (left frequency axis); red: pitch-shifted (right frequency axis); blue: box recording (right frequency axis). (c) Mice initiated a trial by licking in a center port and responded by licking one of two side ports. Correct responses were rewarded with water and incorrect responses were punished with a mildly aversive white noise burst. (d) The difficulty of the task was gradually expanded by adding more tokens (squares), vowels (labels), and speakers (rows) before the mice were tested on novel tokens in a generalization task. (e) Mice (colored lines) varied widely in the duration of training required to reach the generalization phase. Mice were returned to previous levels if they remained at chance performance after reaching a new stage.
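
The pitch shifting in (a) and the spectra in (b) can be approximated in a few lines of Python. This is a minimal illustration, not the paper's code: the phase-vocoder pitch shifter (librosa) and the file name are assumptions, and the recording's sample rate must be high enough to represent the shifted frequencies.

```python
# Minimal sketch (assumptions noted in comments; not the paper's code):
# shift a CV token up 10x (log2(10) ~= 3.32 octaves) into the mouse
# hearing range, then compute a Welch power spectrum as in panel (b).
import numpy as np
import librosa
from scipy.signal import welch

y, sr = librosa.load("token_gI.wav", sr=None)  # hypothetical file name

# 3.32 octaves = ~39.9 semitones; librosa uses a phase vocoder, which
# preserves duration. The sample rate must exceed twice the highest
# shifted frequency, or the shift will alias.
y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=12 * np.log2(10))

# Power spectral density in dB via Welch's method.
f, pxx = welch(y_shifted, fs=sr, nperseg=2048)
pxx_db = 10 * np.log10(pxx)
```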
FIG. 2.
Generalization accuracy by novelty class. Mice generalized stop consonant discrimination to novel CV recordings. (a) Four types of novelty are possible with our stimuli: novel tokens from the speakers and vowels used in the training set (red), novel vowels (blue), novel speakers (purple), and novel speakers + novel vowels (orange). Tokens in the training set are indicated in black. Colors are the same throughout. (b) Mice that performed better on the training set were better at generalization. Each point shows the performance of a single mouse on a given novelty class, plotted against that mouse's performance on training tokens presented during the generalization phase (both averaged across the entire generalization phase). Lines show linear regression for each novelty class. (c) Mean accuracy for each novelty class (gray lines indicate individual mice; the thick black line is the mean of all mice). (d) Mean accuracy for individual mice (colored bars indicate each novelty class). Error bars in (d) are 95% binomial confidence intervals. Mice were assigned one of two sets of training tokens, indicated by black and white boxes in (d).
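
The 95% binomial confidence intervals in (d) can be computed with, for example, statsmodels. A minimal sketch with illustrative counts; the paper does not state which interval method it used, so Wilson is an assumption:

```python
# Sketch: 95% binomial confidence interval on one mouse's accuracy,
# as in FIG. 2(d). Counts are illustrative placeholders; the interval
# method (Wilson) is an assumption.
from statsmodels.stats.proportion import proportion_confint

n_correct, n_trials = 412, 530
ci_low, ci_high = proportion_confint(n_correct, n_trials,
                                     alpha=0.05, method="wilson")
print(f"accuracy = {n_correct / n_trials:.3f}, "
      f"95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```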
FIG. 3.
Learning curve for novel tokens. Performance for both novel and training set tokens dropped transiently and recovered similarly after the transition to the generalization stage. Presentation 0 corresponds to the transition to the generalization stage. The final ten trials before the transition are shown in the gray dashed box. Mean accuracy and 95% binomial confidence intervals are collapsed across mice for novel (red, all novelty classes combined) or learned (black) tokens, by number of presentations in the generalization task. Logistic regression of binomial correct/incorrect responses fit to log-transformed presentation number (lines, shading is smoothed standard error).
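
The fitted curves are a logistic regression of trial outcome on log-transformed presentation number. A minimal sketch with simulated data (the data and coefficients below are illustrative, not the paper's):

```python
# Sketch: logistic regression of correct/incorrect responses on
# log-transformed presentation number, as in FIG. 3. Data simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
presentation = np.arange(1, 201)                       # 1-based index
p_true = 1 / (1 + np.exp(-(0.4 * np.log(presentation) - 0.5)))
correct = rng.binomial(1, p_true)                      # simulated outcomes

X = sm.add_constant(np.log(presentation))              # intercept + slope
fit = sm.Logit(correct, X).fit(disp=0)
print(fit.params)
```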
FIG. 4.
Patterns of individual and group variation. (a) Mean accuracy (color, scale at top) for each mouse (columns) on tokens grouped by consonant, speaker, and vowel (rows). The different training sets (cells outlined with black boxes) led to different patterns of accuracy on the generalization set. (b) Ward clustering dendrogram, colored by cluster. (c) Training set cohorts differed in bias but not mean accuracy.
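
The dendrogram in (b) is standard Ward-linkage hierarchical clustering, available in SciPy. A sketch on a random accuracy matrix; the matrix shape and the choice to cluster mice by their accuracy profiles are assumptions, since the caption does not specify them:

```python
# Sketch: Ward-linkage hierarchical clustering and dendrogram, as in
# FIG. 4(b). The accuracy matrix here is random and illustrative.
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
accuracy = rng.uniform(0.4, 1.0, size=(12, 30))  # mice x token groups

Z = linkage(accuracy, method="ward")  # Euclidean distance between rows
dendrogram(Z)
plt.show()
```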
FIG. 5.
Acoustic-behavior correlates. F2 onset-vowel transitions do not explain the observed response patterns. (a) Locus equations relating F2 at burst onset and at the vowel steady state (sustained) for each token (points), split by consonant [colors, same as (b)]. (b) As the difference between a token's distances from the ideal /g/ and /b/ locus equation lines increased (x axis: greater distance from the /g/ line, smaller distance from the /b/ line), /b/ tokens obeyed the predicted categorization, while /g/ tokens did not (slopes of colored lines).
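
A locus equation is a linear regression of F2 at consonant burst onset on F2 at the vowel steady state, fit separately per consonant. A minimal sketch of one such fit, with illustrative formant values:

```python
# Sketch: fit a locus equation (F2 at burst onset vs. F2 at vowel
# steady state) for one consonant, as in FIG. 5(a). Values illustrative.
import numpy as np

f2_vowel = np.array([900.0, 1200.0, 1800.0, 2300.0])   # steady state (Hz)
f2_onset = np.array([1400.0, 1550.0, 1900.0, 2150.0])  # burst onset (Hz)

slope, intercept = np.polyfit(f2_vowel, f2_onset, deg=1)
print(f"F2_onset ~ {slope:.2f} * F2_vowel + {intercept:.1f} Hz")
```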
