Atten Percept Psychophys. 2019 May;81(4):1034-1046.
doi: 10.3758/s13414-018-01644-w.

Nonlinear auditory models yield new insights into representations of vowels

Laurel H Carney et al. Atten Percept Psychophys. 2019 May.

Abstract

Studies of vowel systems regularly appeal to the need to understand how the auditory system encodes and processes the information in the acoustic signal. The goal of this study is to present computational models to address this need, and to use the models to illustrate responses to vowels at two levels of the auditory pathway. Many of the models previously used to study auditory representations of speech are based on linear filter banks simulating the tuning of the inner ear. These models do not incorporate key nonlinear response properties of the inner ear that influence responses at conversational-speech sound levels. These nonlinear properties shape neural representations in ways that are important for understanding responses in the central nervous system. The model for auditory-nerve (AN) fibers used here incorporates realistic nonlinear properties associated with the basilar membrane, inner hair cells (IHCs), and the IHC-AN synapse. These nonlinearities set up profiles of f0-related fluctuations that vary in amplitude across the population of frequency-tuned AN fibers. Amplitude fluctuations in AN responses are smallest near formant peaks and largest at frequencies between formants. These f0-related fluctuations strongly excite or suppress neurons in the auditory midbrain, the first level of the auditory pathway where tuning for low-frequency fluctuations in sounds occurs. Formant-related amplitude fluctuations provide representations of the vowel spectrum in discharge rates of midbrain neurons. These representations in the midbrain are robust across a wide range of sound levels, including the entire range of conversational-speech levels, and in the presence of realistic background noise levels.
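The fluctuation-profile mechanism described in the abstract can be illustrated with a deliberately simplified sketch: a linear bandpass filter bank followed by a saturating nonlinearity, standing in for the full nonlinear AN model. The filter shapes, the tanh saturation, and every parameter value here are invented for illustration and are not taken from the published model:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 20000                      # sample rate (Hz)
t = np.arange(0, 0.2, 1 / fs)   # 200-ms stimulus

# Vowel-like harmonic complex: f0 = 100 Hz with one spectral peak
# (a stand-in "formant") near 700 Hz, built by weighting harmonics.
f0, formant = 100.0, 700.0
stim = np.zeros_like(t)
for h in range(1, 40):
    f = h * f0
    gain = 1.0 / (1.0 + ((f - formant) / 200.0) ** 2)  # peak near formant
    stim += gain * np.sin(2 * np.pi * f * t)

# Crude periphery: half-octave bandpass filters followed by a saturating
# nonlinearity (tanh), a toy substitute for basilar-membrane/IHC compression.
cfs = np.logspace(np.log2(200), np.log2(3000), 30, base=2)
depths = []
for cf in cfs:
    lo, hi = cf / 2 ** 0.25, cf * 2 ** 0.25
    sos = butter(2, [lo, hi], btype="band", fs=fs, output="sos")
    chan = np.tanh(3.0 * sosfiltfilt(sos, stim))   # saturation
    env = np.abs(hilbert(chan))                    # envelope
    depths.append(env.std() / env.mean())          # fluctuation depth
```

In this toy version, channels driven hard near the formant are dominated by one strong harmonic and saturate, so their relative envelope fluctuations tend to be smaller than those of off-formant channels, mirroring the fluctuation profile described above.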

Keywords: Audition; Physiological psychology; Speech perception.

Figures

Figure 1.
Saturating input/output function describing transduction from input pressure to output voltage in the IHC, as implemented by Zhang et al. (2001) and used in Zilany et al. (2014).
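The shape of such a saturating, asymmetric input/output function can be sketched as follows. This is a generic tanh-based stand-in: the functional form and the parameter values are illustrative, not those of the Zhang et al. implementation:

```python
import numpy as np

def ihc_transduction(p, slope=0.3, asym=3.0):
    """Illustrative saturating IHC input/output function.

    Maps instantaneous pressure `p` to a receptor-potential-like output
    that compresses large inputs and responds asymmetrically to positive
    (depolarizing) vs. negative pressure. Parameters are made up.
    """
    pos = np.tanh(slope * np.maximum(p, 0.0))          # saturates at +1
    neg = np.tanh(slope * np.minimum(p, 0.0)) / asym   # saturates at -1/asym
    return pos + neg

x = np.linspace(-40, 40, 9)
y = ihc_transduction(x)
# Doubling a large input barely changes the output: the function saturates.
```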
Figure 2.
Phase-locking to temporal fine-structure and envelope features of vowels, and “capture” of neural timing by harmonics. The two forms of temporal information carried by AN fibers in response to a vowel are illustrated by peri-stimulus-time (PST) histograms (A) and by the dominant components, or strongest periodicities in the AN responses, shown as a function of AN fiber characteristic frequency (CF) (B). Phase-locking to temporal fine structure near each fiber’s CF (A) appears as frequency components in the AN responses at harmonic frequencies near formants (B, horizontal dashed lines) or near the fiber’s CF (B, dashed curve). Temporal phase-locking to the vowel pitch, f0, results from the envelope fluctuations created by beating between two or more harmonics, observed in the AN responses as a strong periodicity locked to each pitch period (compare PST histograms in A to the vowel waveform in C). Phase-locking to f0 (B, highlighted rectangle) is reduced in fibers tuned near formant peaks (B, vertical dashed lines) due to synchrony capture, i.e., dominance by a single harmonic near the spectral peak (see text). Synchrony capture is apparent on the left in the responses of fibers tuned near formants, which are dominated by a single harmonic and show reduced phase-locking to the pitch period. A) Modified from Delgutte (1987); B) modified from Delgutte & Kiang (1984).
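The beating mechanism in this caption is easy to verify numerically: two neighboring harmonics summed together produce an envelope that fluctuates at their difference frequency, f0, whereas a single captured harmonic has an essentially flat envelope. A minimal sketch, with the 9th and 10th harmonics chosen arbitrarily:

```python
import numpy as np
from scipy.signal import hilbert

fs = 20000
t = np.arange(0, 0.05, 1 / fs)
f0 = 128.0  # voice pitch, as for the male talker in Fig. 5

# Two neighboring harmonics falling within one cochlear filter beat at
# their difference frequency, which is exactly f0.
two = np.sin(2 * np.pi * 9 * f0 * t) + np.sin(2 * np.pi * 10 * f0 * t)

# A single dominant harmonic (synchrony capture) has a flat envelope.
one = np.sin(2 * np.pi * 9 * f0 * t)

env_two = np.abs(hilbert(two))  # Hilbert envelope
env_one = np.abs(hilbert(one))

mid = slice(len(t) // 4, 3 * len(t) // 4)  # avoid edge effects
fluct_two = env_two[mid].std() / env_two[mid].mean()  # deep f0-rate beating
fluct_one = env_one[mid].std() / env_one[mid].mean()  # nearly flat
```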
Figure 3.
Modulation transfer functions (MTFs) of midbrain neurons in cat illustrate the sensitivity of these cells’ average discharge rates to amplitude fluctuations in a tone stimulus over a range of modulation frequencies (fm). Stimuli were tones at each cell’s characteristic frequency (CF), sinusoidally modulated across a range of low frequencies. Percentages of several different MTF types from one physiological study are shown (bandpass, BP; lowpass, LP; highpass, HP; band-reject, BR; allpass, AP). Figure adapted from Nelson & Carney (2007).
Figure 4.
Schematic diagram of midbrain neuron models and modulation transfer functions. A) The SFIE model (blue, Nelson & Carney, 2004) is a simple combination of excitatory and inhibitory inputs, first at the level of the cochlear nucleus (CN) in the brainstem and again at the midbrain level. The band-suppressed model neuron (red) receives inhibition (white terminals) from the band-enhanced neuron, and excitation (black terminals) from the brainstem (Carney et al., 2015). B) Most shapes of modulation transfer functions (MTFs) observed in the IC can be explained by these two simple models. The blue curves are band-enhanced MTFs; different best modulation frequencies (BMFs) result from different durations of the excitatory and inhibitory potentials in the model. The red curves illustrate different types of band-suppressed MTFs; these curves are suppressed with respect to the response to an unmodulated tone over some range of modulation frequencies. (After Carney et al., 2015).
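The excitation-minus-delayed-inhibition idea behind the SFIE model can be sketched in a few lines: each stage convolves its input rate with a fast excitatory kernel and a slower, delayed inhibitory kernel, subtracts, and half-wave rectifies. The kernel shapes and every parameter value below are illustrative, not the fitted values of Nelson & Carney (2004):

```python
import numpy as np

fs = 10000
dt = 1 / fs

def alpha_kernel(tau, dur=0.05):
    """Alpha-function synaptic kernel, normalized to unit area."""
    tk = np.arange(0, dur, dt)
    k = tk * np.exp(-tk / tau)
    return k / k.sum()

def ei_stage(rate_in, tau_ex, tau_inh, delay, inh_strength):
    """One excitation-minus-delayed-inhibition stage, SFIE-style."""
    ex = np.convolve(rate_in, alpha_kernel(tau_ex))[: len(rate_in)]
    inh = np.convolve(rate_in, alpha_kernel(tau_inh))[: len(rate_in)]
    d = int(round(delay * fs))
    inh = np.concatenate([np.zeros(d), inh[: len(inh) - d]])  # delay inhibition
    return np.maximum(ex - inh_strength * inh, 0.0)           # rates >= 0

# Rate MTF: drive the two stages with a sinusoidally amplitude-modulated
# "AN rate" and record the average output rate at each modulation frequency.
t = np.arange(0, 0.5, dt)
fms = [4, 16, 64, 256, 1024]
mtf = []
for fm in fms:
    an_rate = 100.0 * (1.0 + np.sin(2 * np.pi * fm * t))  # 100% modulation
    cn = ei_stage(an_rate, 0.0005, 0.002, 0.001, 0.6)     # brainstem stage
    ic = ei_stage(cn, 0.001, 0.002, 0.002, 0.9)           # midbrain stage
    mtf.append(float(ic.mean()))
```

Sweeping the modulation frequency and averaging the output rate yields a rate MTF; with delayed same-frequency inhibition of this kind, the result is typically band-enhanced (bandpass), as in the blue curves of panel B.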
Figure 5.
A) Schematic diagram showing the stimulus waveform, the AN population model, and the brainstem/midbrain population models. The stimulus is /ɑ/ from Hillenbrand et al. (1995), spoken by a male with average f0 = 128 Hz, presented at 65 dB SPL. Formant frequencies: F1 = 748 Hz, F2 = 1293 Hz, F3 = 2446 Hz, F4 = 3383 Hz. B) Time-frequency population responses. All AN model fibers are high-spontaneous-rate fibers; 50 BF channels from 150 to 4000 Hz. Midbrain responses are for band-suppressed neurons created by band-enhanced cells with BMF = 128 Hz (see Fig. 4). C) Discharge rates averaged over time, for each BF channel, plotted on logarithmic frequency axes; model AN responses (left) and midbrain responses (right).
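For reference, the 50 BF channels described in panel B can be generated directly, assuming simple logarithmic spacing (which the caption implies but does not specify exactly):

```python
import numpy as np

# 50 best-frequency channels spaced logarithmically from 150 to 4000 Hz,
# matching the channel layout stated in the caption (spacing assumed).
bfs = np.logspace(np.log10(150.0), np.log10(4000.0), 50)
```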
Figure 6.
Model AN and midbrain responses to four vowels (/i/, /e/, /æ/, and /u/, from top to bottom), spoken by 8 speakers from the Hillenbrand et al. (1995) database. All vowels were scaled to 70 dB SPL. The left column shows population average discharge rate profiles for model AN responses, and the right column shows band-suppressed (BS) model midbrain responses. Each midbrain model had a BMF matched to the average f0 of that speaker across the vowels in the database. Average rate responses across the eight speakers are shown as thick black lines in each panel.
Figure 7.
Model AN and midbrain band-suppressed rate profiles for vowels at +5 dB SNR in a background of LTASS noise. (Vowels were presented at 70 dB SPL, and the added LTASS noise was at 65 dB SPL.) Note that at this moderate noise level the AN responses are largely saturated, but the model midbrain response profiles (bold black lines in the right-hand column) still have peaks at many of the formant frequencies.
Figure 8.
Model response profiles for vowels in quiet (A, B) and in +5 dB SNR added noise (C, D). Model AN (A, C) and band-suppressed midbrain (B, D) profiles for the four vowels in Figs. 6 and 7, averaged across the eight speakers.
