Review. 2014 Oct 30;8:348. doi: 10.3389/fnins.2014.00348. eCollection 2014.

Why do I hear but not understand? Stochastic undersampling as a model of degraded neural encoding of speech


Enrique A Lopez-Poveda. Front Neurosci.

Abstract

Hearing impairment is a serious disease with increasing prevalence. It is defined based on increased audiometric thresholds, but increased thresholds are only partly responsible for the greater difficulty understanding speech in noisy environments experienced by some older listeners or by hearing-impaired listeners. Identifying the additional factors and mechanisms that impair intelligibility is fundamental to understanding hearing impairment, but these factors remain uncertain. Traditionally, these additional factors have been sought in the way the speech spectrum is encoded in the pattern of impaired mechanical cochlear responses. Recent studies, however, are steering the focus toward impaired encoding of the speech waveform in the auditory nerve. In our recent work, we provided evidence that a significant factor might be the loss of afferent auditory nerve fibers, a pathology that comes with aging or noise overexposure. Our approach was based on a signal-processing analogy whereby the auditory nerve may be regarded as a stochastic sampler of the sound waveform and deafferentation may be described in terms of waveform undersampling. We showed that stochastic undersampling simultaneously degrades the encoding of soft and rapid waveform features, and that this degrades speech intelligibility more in noise than in quiet, without significant increases in audiometric thresholds. Here, we review our recent work in a broader context and argue that the stochastic undersampling analogy may be extended to study the perceptual consequences of various hearing pathologies and their treatment.

Keywords: aging; auditory deafferentation; auditory encoding; hearing impairment; hearing loss; speech intelligibility; speech processing; stochastic sampling.


Figures

Figure 1
A schematic illustration of the effects of stochastic undersampling on speech intelligibility in noise and in quiet. Consider a speech intelligibility task (e.g., the identification of sentences) in different amounts of background noise. The blue trace depicts a hypothetical psychometric function showing performance (the percentage of correctly identified sentences) as a function of the amount of noise, with the latter expressed as the speech-to-noise ratio (SNR) in dB. The speech reception threshold (SRT) is, by definition, the SNR at which the listener correctly identifies 50% of the sentences. Consider now that stochastic undersampling reduces the effective SNR by a fixed amount, depicted by the red arrow. For a speech-in-quiet condition, such an SNR reduction barely degrades performance. By contrast, for a more challenging condition of speech in noise, the same SNR reduction degrades performance significantly.
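The argument in this caption can be sketched numerically. The logistic shape, the SRT, the slope, and the 3 dB effective-SNR loss below are illustrative assumptions, not values from the paper:

```python
import math

def psychometric(snr_db, srt_db=-6.0, slope=0.5):
    """Hypothetical logistic psychometric function: proportion of
    sentences identified correctly as a function of SNR in dB.
    At snr_db == srt_db the function returns 0.5 by construction."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))

effective_snr_loss = 3.0  # fixed SNR reduction from undersampling (the red arrow)

# Speech in quiet (very high SNR): the loss barely moves performance
quiet_drop = psychometric(30.0) - psychometric(30.0 - effective_snr_loss)

# Speech in noise, near the SRT: the same loss costs far more
noise_drop = psychometric(-5.0) - psychometric(-5.0 - effective_snr_loss)

print(f"performance drop in quiet: {quiet_drop:.4f}")
print(f"performance drop in noise: {noise_drop:.4f}")
```

Because the psychometric function is flat at high SNRs and steep near the SRT, the same fixed SNR reduction produces a negligible drop in quiet but a large one in noise.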
Figure 2
An example simulation of stochastic undersampling by deafferentation and its consequences for the waveform representation in quiet. Consider a sound waveform (blue traces in A,C,D) and its full-wave rectified (FWR) version (green trace in A). Consider also four auditory nerve fibers, each of which can fire along the sound waveform following a simple principle: the probability of firing is proportional to the instantaneous sound pressure in the FWR waveform. Since spikes are stochastic events, spike trains are different for the four fibers (B). The green traces in (C,D) illustrate neural representations of the sound waveform that result from time-wise summation of only the upper two (C) or all four (D) spike trains, respectively. Clearly, the sound waveform is better represented in (D) than in (C). To illustrate this more clearly, acoustical-waveform equivalents of the aggregated spike trains are shown as red traces in (C,D). These were obtained by time-wise multiplication of the original waveform with an aggregated spike train obtained using a time-wise logical OR function (black spike trains in C,D). Clearly, the waveform reconstructed using four fibers more closely resembles the original waveform than that reconstructed using only two fibers (compare the red and blue traces in C,D). In other words, a reduction in the number of fibers degrades the neural representation of the sound waveform. For further details, see Lopez-Poveda and Barrios (2013).
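The sampling-and-reconstruction scheme in this caption can be sketched in a few lines. The test waveform, the firing-probability scale factor, and the fiber counts below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A short tone burst as the sound waveform, and its full-wave
# rectified (FWR) version
t = np.linspace(0.0, 0.02, 400)
waveform = np.sin(2 * np.pi * 500 * t)
fwr = np.abs(waveform)

def spike_trains(fwr, n_fibers, rng):
    """Each fiber fires stochastically at each sample with probability
    proportional to the instantaneous FWR pressure (the 0.3 scale
    factor is an arbitrary choice for this sketch)."""
    p = 0.3 * fwr / fwr.max()
    return rng.random((n_fibers, fwr.size)) < p

def reconstruct(waveform, trains):
    """Acoustical-waveform equivalent of the aggregated spike trains:
    time-wise logical OR across fibers, then time-wise multiplication
    with the original waveform (red traces in C,D)."""
    aggregated = trains.any(axis=0)
    return waveform * aggregated

def mse(recon):
    return float(np.mean((waveform - recon) ** 2))

two_fibers = reconstruct(waveform, spike_trains(fwr, 2, rng))
four_fibers = reconstruct(waveform, spike_trains(fwr, 4, rng))

# More fibers -> denser stochastic sampling -> closer to the original
print(f"MSE with 2 fibers: {mse(two_fibers):.4f}")
print(f"MSE with 4 fibers: {mse(four_fibers):.4f}")
```

The reconstruction error shrinks as fibers are added, mirroring the caption's point that deafferentation (fewer fibers) degrades the neural representation of the waveform.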
Figure 3
A visual example to illustrate the consequences of stochastic undersampling of a signal in quiet and in noise. We used the stochastic sampling principles illustrated in Figure 2 (Lopez-Poveda and Barrios, 2013), whereby the probability of firing is proportional to intensity, or pixel darkness in this example. (A,B) The signal in quiet and in noise, respectively. The signal deliberately contains darker and lighter features that would correspond to intense and soft features in speech, respectively. It also contains thick and thin features that would correspond to low- and high-frequency features in speech, respectively. (C,D) Stochastically sampled images using 10 samplers per pixel. This number of samplers is sufficient to make the signal intelligible both in quiet (C) and in noise (D). (E,F) Stochastically sampled images using one stochastic sampler per pixel. Now the signal is still detectable and intelligible in quiet (E) but less so in noise (F). Particularly degraded are the low-intensity (lighter gray) and high-frequency (thinner lines) features of the signal, such as the “lo” portion of the upper “hello” word.
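A minimal numerical analogue of this pixel-sampling demonstration, assuming a toy 8×8 "image" in place of the handwritten words (the pixel values and sampler counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy image of darkness values in [0, 1]: a dark "intense" stroke and
# a lighter "soft" stroke on a blank background
image = np.zeros((8, 8))
image[3:5, 1:7] = 0.9   # dark feature (intense speech component)
image[6, 1:7] = 0.3     # light feature (soft speech component)

def sample(image, samplers_per_pixel, rng):
    """Each sampler 'fires' at a pixel with probability equal to the
    pixel's darkness; the sampled image is the fraction of samplers
    that fired at each pixel."""
    fired = rng.random((samplers_per_pixel,) + image.shape) < image
    return fired.mean(axis=0)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

dense = sample(image, 10, rng)   # 10 samplers per pixel (panels C,D)
sparse = sample(image, 1, rng)   # 1 sampler per pixel (panels E,F)

print(f"MSE, 10 samplers per pixel: {mse(image, dense):.4f}")
print(f"MSE, 1 sampler per pixel:  {mse(image, sparse):.4f}")
```

Averaging over more samplers per pixel shrinks the sampling variance (roughly as 1/N for N samplers), which is why ten samplers recover the image well while a single sampler leaves it noisy, with the soft, light features suffering most.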
