Atten Percept Psychophys. 2019 May;81(4):1034-1046.
doi: 10.3758/s13414-018-01644-w.

Nonlinear auditory models yield new insights into representations of vowels

Laurel H Carney et al. Atten Percept Psychophys. 2019 May.

Abstract

Studies of vowel systems regularly appeal to the need to understand how the auditory system encodes and processes the information in the acoustic signal. The goal of this study is to present computational models to address this need, and to use the models to illustrate responses to vowels at two levels of the auditory pathway. Many of the models previously used to study auditory representations of speech are based on linear filter banks simulating the tuning of the inner ear. These models do not incorporate key nonlinear response properties of the inner ear that influence responses at conversational-speech sound levels. These nonlinear properties shape neural representations in ways that are important for understanding responses in the central nervous system. The model for auditory-nerve (AN) fibers used here incorporates realistic nonlinear properties associated with the basilar membrane, inner hair cells (IHCs), and the IHC-AN synapse. These nonlinearities set up profiles of f0-related fluctuations that vary in amplitude across the population of frequency-tuned AN fibers. Amplitude fluctuations in AN responses are smallest near formant peaks and largest at frequencies between formants. These f0-related fluctuations strongly excite or suppress neurons in the auditory midbrain, the first level of the auditory pathway where tuning for low-frequency fluctuations in sounds occurs. Formant-related amplitude fluctuations provide representations of the vowel spectrum in discharge rates of midbrain neurons. These representations in the midbrain are robust across a wide range of sound levels, including the entire range of conversational-speech levels, and in the presence of realistic background noise levels.
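The fluctuation-profile mechanism described in the abstract can be illustrated with a deliberately simplified sketch: a linear bandpass filter bank followed by a saturating nonlinearity, standing in for the full nonlinear AN model. The filter shapes, the tanh saturation, and every parameter value here are invented for illustration and are not taken from the published model:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 20000                      # sample rate (Hz)
t = np.arange(0, 0.2, 1 / fs)   # 200-ms stimulus

# Vowel-like harmonic complex: f0 = 100 Hz with one spectral peak
# (a stand-in "formant") near 700 Hz, built by weighting harmonics.
f0, formant = 100.0, 700.0
stim = np.zeros_like(t)
for h in range(1, 40):
    f = h * f0
    gain = 1.0 / (1.0 + ((f - formant) / 200.0) ** 2)  # peak near formant
    stim += gain * np.sin(2 * np.pi * f * t)

# Crude periphery: half-octave bandpass filters followed by a saturating
# nonlinearity (tanh), a toy substitute for basilar-membrane/IHC compression.
cfs = np.logspace(np.log2(200), np.log2(3000), 30, base=2)
depths = []
for cf in cfs:
    lo, hi = cf / 2 ** 0.25, cf * 2 ** 0.25
    sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
    chan = np.tanh(3.0 * sosfiltfilt(sos, stim))   # saturation
    env = np.abs(hilbert(chan))                    # envelope
    depths.append(env.std() / env.mean())          # fluctuation depth
```

In this toy version, channels driven hard near the formant are dominated by one strong harmonic and saturate, so their relative envelope fluctuations tend to be smaller than those of off-formant channels, mirroring the fluctuation profile described above.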

Keywords: Audition; Physiological psychology; Speech perception.

Figures

Figure 1.
Saturating input/output function describing transduction from input pressure to output voltage in the IHC, as implemented by Zhang et al. (2001) and used in Zilany et al. (2014).
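The shape of such a saturating, asymmetric input/output function can be sketched as follows. This is a generic tanh-based stand-in: the functional form and the parameter values are illustrative, not those of the Zhang et al. implementation:

```python
import numpy as np

def ihc_transduction(p, slope=0.3, asym=3.0):
    """Illustrative saturating IHC input/output function.

    Maps instantaneous pressure `p` to a receptor-potential-like output
    that compresses large inputs and responds asymmetrically to positive
    (depolarizing) vs. negative pressure. Parameters are made up.
    """
    pos = np.tanh(slope * np.maximum(p, 0.0))          # saturates at +1
    neg = np.tanh(slope * np.minimum(p, 0.0)) / asym   # saturates at -1/asym
    return pos + neg

x = np.linspace(-40, 40, 9)
y = ihc_transduction(x)
# Doubling a large input barely changes the output: the function saturates.
```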
Figure 2.
Phase-locking to temporal fine-structure and envelope features of vowels, and “capture” of neural timing by harmonics. The two forms of temporal information carried by AN fibers in response to a vowel are illustrated by peri-stimulus-time (PST) histograms (A) and by the dominant components, or strongest periodicities in the AN responses, shown as a function of AN fiber characteristic frequency (CF) (B). Phase-locking to temporal fine structure near each fiber’s CF (A) appears as frequency components in the AN responses at harmonic frequencies near formants (B, horizontal dashed lines) or near the fiber’s CF (B, dashed curve). Temporal phase-locking to the vowel pitch, f0, results from the envelope fluctuations created by beating between two or more harmonics, observed in the AN responses as a strong periodicity locked to each pitch period (compare PST histograms in A to the vowel waveform in C). Phase-locking to f0 (B, highlighted rectangle) is reduced in fibers tuned near formant peaks (B, vertical dashed lines) due to synchrony capture, i.e., dominance by a single harmonic near the spectral peak (see text). Synchrony capture is apparent on the left in the responses of fibers tuned near formants, which are dominated by a single harmonic and show reduced phase-locking to the pitch period. A) Modified from Delgutte (1987); B) modified from Delgutte & Kiang (1984).
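The beating mechanism in this caption is easy to verify numerically: two neighboring harmonics summed together produce an envelope that fluctuates at their difference frequency, f0, whereas a single captured harmonic has an essentially flat envelope. A minimal sketch, with the 9th and 10th harmonics chosen arbitrarily:

```python
import numpy as np
from scipy.signal import hilbert

fs = 20000
t = np.arange(0, 0.05, 1 / fs)
f0 = 128.0  # voice pitch, as for the male talker in Fig. 5

# Two neighboring harmonics falling within one cochlear filter beat at
# their difference frequency, which is exactly f0.
two = np.sin(2 * np.pi * 9 * f0 * t) + np.sin(2 * np.pi * 10 * f0 * t)

# A single dominant harmonic (synchrony capture) has a flat envelope.
one = np.sin(2 * np.pi * 9 * f0 * t)

env_two = np.abs(hilbert(two))  # Hilbert envelope
env_one = np.abs(hilbert(one))

mid = slice(len(t) // 4, 3 * len(t) // 4)  # avoid edge effects
fluct_two = env_two[mid].std() / env_two[mid].mean()  # deep f0-rate beating
fluct_one = env_one[mid].std() / env_one[mid].mean()  # nearly flat
```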
Figure 3.
Modulation transfer functions (MTFs) of midbrain neurons in cat illustrate the sensitivity of these cells’ average discharge rates to amplitude fluctuations in a tone stimulus over a range of modulation frequencies (fm). Stimuli were tones at each cell’s characteristic frequency (CF), sinusoidally modulated across a range of low frequencies. Percentages of several different MTF types from one physiological study are shown (bandpass, BP; lowpass, LP; highpass, HP; band-reject, BR; allpass, AP). Figure adapted from Nelson & Carney (2007).
Figure 4.
Schematic diagram of midbrain neuron models and modulation transfer functions. A) The SFIE model (blue, Nelson & Carney, 2004) is a simple combination of excitatory and inhibitory inputs, first at the level of the cochlear nucleus (CN) in the brainstem and again at the midbrain level. The band-suppressed model neuron (red) receives inhibition (white terminals) from the band-enhanced neuron, and excitation (black terminals) from the brainstem (Carney et al., 2015). B) Most shapes of modulation transfer functions (MTFs) observed in the IC can be explained by these two simple models. The blue curves are band-enhanced MTFs; different best modulation frequencies (BMFs) result from different durations of the excitatory and inhibitory potentials in the model. The red curves illustrate different types of band-suppressed MTFs; these curves are suppressed with respect to the response to an unmodulated tone over some range of modulation frequencies. (After Carney et al., 2015).
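The excitation-minus-delayed-inhibition idea behind the SFIE model can be sketched in a few lines: each stage convolves its input rate with a fast excitatory kernel and a slower, delayed inhibitory kernel, subtracts, and half-wave rectifies. The kernel shapes and every parameter value below are illustrative, not the fitted values of Nelson & Carney (2004):

```python
import numpy as np

fs = 10000
dt = 1 / fs

def alpha_kernel(tau, dur=0.05):
    """Alpha-function synaptic kernel, normalized to unit area."""
    tk = np.arange(0, dur, dt)
    k = tk * np.exp(-tk / tau)
    return k / k.sum()

def ei_stage(rate_in, tau_ex, tau_inh, delay, inh_strength):
    """One excitation-minus-delayed-inhibition stage, SFIE-style."""
    ex = np.convolve(rate_in, alpha_kernel(tau_ex))[: len(rate_in)]
    inh = np.convolve(rate_in, alpha_kernel(tau_inh))[: len(rate_in)]
    d = int(round(delay * fs))
    inh = np.concatenate([np.zeros(d), inh[: len(inh) - d]])  # delay inhibition
    return np.maximum(ex - inh_strength * inh, 0.0)           # rates >= 0

# Rate MTF: drive the two stages with a sinusoidally amplitude-modulated
# "AN rate" and record the average output rate at each modulation frequency.
t = np.arange(0, 0.5, dt)
fms = [4, 16, 64, 256, 1024]
mtf = []
for fm in fms:
    an_rate = 100.0 * (1.0 + np.sin(2 * np.pi * fm * t))  # 100% modulation
    cn = ei_stage(an_rate, 0.0005, 0.002, 0.001, 0.6)     # brainstem stage
    ic = ei_stage(cn, 0.001, 0.002, 0.002, 0.9)           # midbrain stage
    mtf.append(float(ic.mean()))
```

Sweeping the modulation frequency and averaging the output rate yields a rate MTF; with delayed same-frequency inhibition of this kind, the result is typically band-enhanced (bandpass), as in the blue curves of panel B.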
Figure 5.
A) Schematic diagram showing the stimulus waveform, the AN population model, and the brainstem/midbrain population models. The stimulus is /ɑ/ from Hillenbrand et al. (1995), spoken by a male with average f0 = 128 Hz, presented at 65 dB SPL. Formant frequencies: F1 = 748 Hz, F2 = 1293 Hz, F3 = 2446 Hz, F4 = 3383 Hz. B) Time-frequency population responses. All AN model fibers are high-spontaneous-rate fibers; 50 BF channels from 150 to 4000 Hz. Midbrain responses are for band-suppressed neurons created by band-enhanced cells with BMF = 128 Hz (see Fig. 4). C) Discharge rates averaged over time, for each BF channel, plotted on logarithmic frequency axes; model AN responses (left) and midbrain responses (right).
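For reference, the 50 BF channels described in panel B can be generated directly, assuming simple logarithmic spacing (which the caption implies but does not specify exactly):

```python
import numpy as np

# 50 best-frequency channels spaced logarithmically from 150 to 4000 Hz,
# matching the channel layout stated in the caption (spacing assumed).
bfs = np.logspace(np.log10(150.0), np.log10(4000.0), 50)
```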
Figure 6.
Model AN and midbrain responses to four vowels (/i/, /e/, /æ/, and /u/, from top to bottom), spoken by 8 speakers from the Hillenbrand et al. (1995) database. All vowels were scaled to 70 dB SPL. The left column shows population average discharge rate profiles for model AN responses, and the right column shows band-suppressed (BS) model midbrain responses. Each midbrain model had a BMF matched to the average f0 of that speaker across the vowels in the database. Average rate responses across the eight speakers are shown as thick black lines in each panel.
Figure 7.
Model AN and midbrain band-suppressed rate profiles for vowels at +5 dB SNR in a background of LTASS noise. (Vowels were presented at 70 dB SPL, and the added LTASS noise was at 65 dB SPL.) Note that at this moderate noise level the AN responses are largely saturated, but the model midbrain response profiles (bold black lines in the right-hand column) still have peaks at many of the formant frequencies.
Figure 8.
Model response profiles for vowels in quiet (A, B) and in +5 dB SNR added noise (C, D). Model AN (A, C) and band-suppressed midbrain (B, D) profiles for the four vowels in Figs. 6 and 7, averaged across the eight speakers.
