eNeuro. 2021 Apr 13;8(2):ENEURO.0428-20.2021.
doi: 10.1523/ENEURO.0428-20.2021. Print 2021 Mar-Apr.

Otoacoustic Emissions Evoked by the Time-Varying Harmonic Structure of Speech

Marina Saiz-Alía et al. eNeuro.

Abstract

The human auditory system is exceptional at comprehending an individual speaker even in complex acoustic environments. Because the inner ear, or cochlea, possesses an active mechanism that can be controlled by subsequent neural processing centers through descending nerve fibers, it may already contribute to speech processing. The cochlear activity can be assessed by recording otoacoustic emissions (OAEs), but employing these emissions to assess speech processing in the cochlea is obstructed by the complexity of natural speech. Here, we develop a novel methodology to measure OAEs that are related to the time-varying harmonic structure of speech [speech-distortion-product OAEs (DPOAEs)]. We then employ the method to investigate the effect of selective attention on the speech-DPOAEs. We provide tentative evidence that the speech-DPOAEs are larger when the corresponding speech signal is attended than when it is ignored. Our development of speech-DPOAEs opens up a path to further investigations of the contribution of the cochlea to the processing of complex real-world signals.

Figures

Figure 1.
The waveforms used to elicit and detect speech-DPOAEs. A, B, The spectrogram of the voiced parts of a male speech signal (A) or a female speech signal (B) shows the harmonic structure, with a fundamental frequency and many higher harmonics (note that the colormap represents lower power as dark and higher power as white). C, D, The waveforms used to elicit and to detect the speech-DPOAEs to the male voice (C) and to the female voice (D). A, C, We measure speech-DPOAEs related to the male voice by constructing waveforms w9(t) (red line) and w11(t) (purple line) that oscillate at the 9th and 11th harmonics of the fundamental frequency of the speech signal, respectively. The lower-sideband speech-DPOAE then emerges at the 7th harmonic and is measured through cross-correlation with the corresponding waveform w7(t) (dashed red line). B, D, The speech-DPOAEs related to the female voice are elicited by waveforms w6(t) (red line) and w8(t) (purple line) that correspond to the 6th and 8th harmonics. The speech-DPOAE is found at the 4th harmonic and is measured through the waveform w4(t) (dashed red line). E, In our experiment, we presented subjects with speech stimuli to the left ear. The speech stimuli were either a single voice or two competing voices, a male and a female one. Two waveforms wn(t) and wm(t) that were derived from one of the speech stimuli were presented to the right ear. The microphone signal r(t) was recorded from the right ear as well, and the speech-DPOAE was derived from this recording.
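The relationship between the eliciting harmonics and the lower-sideband distortion can be illustrated with a minimal sketch. It assumes that a harmonic waveform is obtained by integrating a fundamental-frequency track into a phase and multiplying that phase by the harmonic number; the caption does not spell out the construction, so `harmonic_waveform`, `lower_sideband`, the sampling rate, and the constant 100 Hz fundamental are hypothetical choices for illustration only.

```python
import numpy as np

def harmonic_waveform(f0_hz, fs_hz, n):
    """Complex waveform oscillating at the n-th harmonic of a time-varying f0.

    Assumed construction (not confirmed by the source): integrate the
    fundamental frequency to a phase, then scale the phase by n.
    """
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / fs_hz
    return np.exp(1j * n * phase)

def lower_sideband(n_low, n_high):
    """Harmonic number of the cubic lower-sideband distortion 2*n_low - n_high."""
    return 2 * n_low - n_high

fs_hz = 8000
f0_hz = np.full(fs_hz, 100.0)   # toy input: 1 s of a constant 100 Hz fundamental
w9 = harmonic_waveform(f0_hz, fs_hz, 9)

# Locate the dominant frequency of w9; it should sit at 9 x 100 Hz.
spectrum = np.abs(np.fft.fft(w9))
freqs_hz = np.fft.fftfreq(w9.size, d=1.0 / fs_hz)
peak_hz = freqs_hz[np.argmax(spectrum)]

male_dp = lower_sideband(9, 11)    # harmonics used for the male voice
female_dp = lower_sideband(6, 8)   # harmonics used for the female voice
print(peak_hz, male_dp, female_dp)
```

With harmonics 9 and 11 the lower sideband falls on harmonic 7, and with harmonics 6 and 8 it falls on harmonic 4, which matches the detection waveforms w7(t) and w4(t) named in the caption.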
Figure 2.
Measurement of speech-DPOAEs. A–D, Complex cross-correlations of the microphone recording of a representative subject with the stimulating waveforms for the male voice [w9(t) and w11(t)], when the probe is placed inside the ear canal (A, C, respectively) and when it is held outside the ear (B, D, respectively). The data from the representative subject show that the complex cross-correlation of each stimulation waveform with the microphone recording peaks at 0 ms (blue: real part, red: imaginary part, black: amplitude). These peaks occur both when the probe is placed inside the ear canal (A, C) and when it is held outside (B, D). E, An OAE is measured by computing the complex cross-correlation between the microphone recording and the waveform w7(t) that corresponds to the lower-sideband distortion. We refer to this emission as a speech-DPOAE. The amplitude peaks at a latency of 2.2 ms (dashed line). F, The speech-DPOAE measured outside the ear canal. When the probe is placed outside the ear canal, the cross-correlation does not show a significant peak, demonstrating that no emission could be detected. G, Individual peak values of speech-DPOAEs for male and female speech in isolation. In most subjects the amplitude of the speech-DPOAE (darker bar) was significantly above the noise floor (lighter superimposed bar). The population average of the speech-DPOAE related to the male voice was significantly larger than that related to the female voice.
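The idea of reading a latency off a complex cross-correlation can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the recording is simulated as a noise-free copy of the real part of a reference waveform delayed by 22 samples, the fundamental-frequency track is a made-up sinusoidal modulation, and the correlation is circular for simplicity. The names `complex_xcorr`, `true_delay`, and all parameter values are hypothetical.

```python
import numpy as np

fs_hz = 10_000
t = np.arange(2 * fs_hz) / fs_hz                       # 2 s of samples
f0_hz = 150.0 + 50.0 * np.sin(2 * np.pi * 3.0 * t)     # toy time-varying fundamental
phase = 2 * np.pi * np.cumsum(f0_hz) / fs_hz
w7 = np.exp(1j * 7 * phase)                            # reference at the 7th harmonic

# Hypothetical recording: real part of w7, circularly delayed by 22 samples (2.2 ms).
true_delay = 22
recording = np.roll(w7.real, true_delay)

def complex_xcorr(recording, waveform, lags):
    """Circular cross-correlation of a real recording with a complex waveform.

    For each lag, the recording is advanced by `lag` samples and averaged
    against the complex conjugate of the waveform.
    """
    return np.array([np.mean(np.roll(recording, -lag) * np.conj(waveform))
                     for lag in lags])

lags = np.arange(-50, 51)                              # lags in samples
xcorr = complex_xcorr(recording, w7, lags)
amplitude = np.abs(xcorr)                              # real and imaginary parts combined

peak_latency_ms = 1000.0 * lags[np.argmax(amplitude)] / fs_hz
peak_amplitude = amplitude.max()
print(peak_latency_ms, peak_amplitude)
```

Because the reference frequency varies over time, the amplitude of the cross-correlation has a single maximum near the imposed delay; with a constant-frequency reference the amplitude would be nearly flat across lags and no latency could be read off.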
Figure 3.
Relation of speech-DPOAEs to pure-tone DPOAEs. A, The power spectrum of the microphone recording in response to pure tones of a representative subject. Pure-tone DPOAEs were measured in response to the two primary frequencies f1 = 1 kHz and f2 = 1.2 kHz, and emerged at the cubic distortion frequencies 2f1 − f2 and 2f2 − f1. B, The cross-correlation of the lower-sideband 2f1 − f2 with the microphone recording of the same subject shows an amplitude of about 5e-4 (upper panel), significantly higher than that obtained when the probe is placed outside the ear canal (lower panel). C, Comparison between the lower-sideband pure-tone DPOAEs analyzed through the two methods presented in A, B. The amplitude of the pure-tone DPOAEs when analyzed through the cross-correlation method (ordinates) strongly correlated with the amplitude obtained from the power spectrum across subjects (abscissas). D, Comparison between the pure-tone DPOAEs obtained through the power spectrum and the speech-DPOAEs peak responses. The amplitude of the speech-DPOAEs was strongly correlated, across subjects, with the amplitude of the lower-sideband DPOAE 2f1 − f2 as well.
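For the primaries given in the caption, the cubic distortion frequencies are 2f1 − f2 = 800 Hz and 2f2 − f1 = 1400 Hz. The sketch below shows how such components appear in a spectrum when two tones pass through a cubic nonlinearity. The memoryless cubic term and its 0.1 weight are a toy assumption and not a model of the cochlea; the sampling rate and the helper `amplitude_at` are likewise hypothetical.

```python
import numpy as np

fs_hz = 16_000
n = fs_hz                                    # 1 s of samples, so FFT bins are 1 Hz apart
t = np.arange(n) / fs_hz
f1_hz, f2_hz = 1000.0, 1200.0                # primaries from the caption

primaries = np.cos(2 * np.pi * f1_hz * t) + np.cos(2 * np.pi * f2_hz * t)
# Toy memoryless cubic nonlinearity (an assumption, not a cochlear model).
response = primaries + 0.1 * primaries**3

# Single-sided amplitude spectrum of the distorted signal.
amplitude = 2.0 * np.abs(np.fft.rfft(response)) / n
freqs_hz = np.fft.rfftfreq(n, d=1.0 / fs_hz)

def amplitude_at(freq_hz):
    """Spectral amplitude at the FFT bin closest to freq_hz."""
    return amplitude[np.argmin(np.abs(freqs_hz - freq_hz))]

lower_hz = 2 * f1_hz - f2_hz                 # lower-sideband distortion frequency
upper_hz = 2 * f2_hz - f1_hz                 # upper-sideband distortion frequency
print(lower_hz, amplitude_at(lower_hz))
print(upper_hz, amplitude_at(upper_hz))
```

Expanding the cube of the two-tone sum gives a term of amplitude 0.75 at each of 2f1 − f2 and 2f2 − f1, so with the 0.1 weight both distortion lines have amplitude 0.075 in this toy model, while bins between the lines stay empty.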
Figure 4.
Attentional modulation of speech-DPOAEs. Individual attentional modulations of speech-DPOAEs to male and female voices (A; the diamond markers represent outliers) and bootstrap distributions of the mean relative attentional modulation of the amplitude for the male voice (B) and the female voice (C). A, The relative attentional modulation of the speech-DPOAEs related to the male voice is not significantly different from zero. Speech-DPOAEs related to the female voice are, however, significantly larger when the female voice is attended than when it is ignored. B, C, The bootstrapping procedure confirms that the results are stable, and that the attentional modulation related to the female voice has a large intersubject variability (C).
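A bootstrap distribution of a mean modulation can be sketched as below. The caption does not define the relative attentional modulation, so the normalized difference (attended − ignored) / (attended + ignored) is an assumption, as are the function names, the 16 synthetic subjects, the number of resamples, and the percentile interval. The amplitudes are randomly generated placeholders and carry no information about the study's data.

```python
import numpy as np

def relative_modulation(attended, ignored):
    """Assumed definition: normalized difference of attended and ignored amplitudes."""
    return (attended - ignored) / (attended + ignored)

def bootstrap_mean(values, n_boot=5000, seed=0):
    """Means of n_boot resamples of `values`, drawn with replacement across subjects."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    return values[idx].mean(axis=1)

# Synthetic placeholder amplitudes for 16 hypothetical subjects.
rng = np.random.default_rng(1)
ignored = rng.uniform(0.5, 1.5, size=16)
attended = ignored * rng.normal(1.1, 0.2, size=16)

modulation = relative_modulation(attended, ignored)
boot_means = bootstrap_mean(modulation)

# Percentile interval of the bootstrap distribution of the mean.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(modulation.mean(), ci_low, ci_high)
```

Resampling subjects rather than individual trials keeps the intersubject variability in the distribution, which is the quantity the caption highlights for the female voice.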
