J Acoust Soc Am. 2013 Sep;134(3):2205-12. doi: 10.1121/1.4816413.

Role and relative contribution of temporal envelope and fine structure cues in sentence recognition by normal-hearing listeners

Frédéric Apoux et al.

Abstract

The present study investigated the role and relative contribution of envelope and temporal fine structure (TFS) to sentence recognition in noise. Target and masker stimuli were added at five different signal-to-noise ratios (SNRs) and filtered into 30 contiguous frequency bands. The envelope and TFS were extracted from each band by Hilbert decomposition. The final stimuli consisted of the envelope of the target/masker sound mixture at x dB SNR and the TFS of the same sound mixture at y dB SNR. A first experiment showed a very limited contribution of TFS cues, indicating that sentence recognition in noise relies almost exclusively on temporal envelope cues. A second experiment showed that replacing the carrier of a sound mixture with noise (vocoder processing) cannot be considered equivalent to disrupting the TFS of the target signal by adding a background noise. Accordingly, a re-evaluation of the vocoder approach as a model to further understand the role of TFS cues in noisy situations may be necessary. Overall, these data are consistent with the view that speech information is primarily extracted from the envelope while TFS cues are primarily used to detect glimpses of the target.
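The stimulus processing described above can be made concrete with a short sketch. The filter type and order, the level-matching method, and the helper names band_filter, mix_at_snr, and envelope_tfs_chimera are illustrative assumptions, not the authors' implementation; only the general scheme (band-pass analysis into contiguous bands, Hilbert envelope/TFS extraction, and pairing the envelope of one mixture with the TFS of another) follows the abstract.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def band_filter(x, lo, hi, fs, order=4):
        """Zero-phase band-pass filter for one analysis band (Butterworth here;
        the exact filter shape used in the study is not given in the abstract)."""
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)

    def mix_at_snr(target, masker, snr_db):
        """Scale the masker so that target + masker sits at snr_db (broadband RMS)."""
        rms = lambda s: np.sqrt(np.mean(s ** 2))
        gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
        return target + gain * masker

    def envelope_tfs_chimera(target, masker, snr_env, snr_tfs, fs, band_edges):
        """Within each band, pair the Hilbert envelope of the mixture made at
        snr_env with the Hilbert fine structure of the mixture made at snr_tfs,
        then sum the recombined bands."""
        mix_env = mix_at_snr(target, masker, snr_env)
        mix_tfs = mix_at_snr(target, masker, snr_tfs)
        out = np.zeros_like(target, dtype=float)
        for lo, hi in band_edges:
            env = np.abs(hilbert(band_filter(mix_env, lo, hi, fs)))            # temporal envelope
            tfs = np.cos(np.angle(hilbert(band_filter(mix_tfs, lo, hi, fs))))  # fine structure
            out += band_filter(env * tfs, lo, hi, fs)                          # limit spectral splatter
        return out

For the 30 contiguous bands of the study, band_edges would be a list of 30 (lo, hi) pairs spanning the speech frequency range; the actual edges are not specified in the abstract and would need to be taken from the paper.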


Figures

Figure 1
Schematic of the processing used to create the stimuli. The grayed area illustrates the processing within each of the 30 analysis filters.
Figure 2
Correlation coefficients between the original speech and the chimeric sound's envelopes (filled symbols) and TFS (open symbols) as a function of the SNR. The circles correspond to the SSN conditions while the squares correspond to the SPE conditions.
Figure 3
Average sentence recognition scores in speech-shaped noise (SSN) as a function of the SNR of the envelope (left panel) and as a function of the SNR of the TFS (right panel), with SNRtfs and SNRenv as the parameter, respectively. In each panel, a bold line (REF) connects the data points for which SNRenv and SNRtfs were equal.
Figure 4
The same as Fig. 3 but for the speech masker (SPE).
Figure 5
Average sentence recognition scores as a function of the SNR of the envelope (SNRenv). The parameter is the SNR of the TFS (SNRtfs). The left and right panels show scores in speech-shaped noise (SSN) and speech (SPE), respectively. In each panel, the filled symbols correspond to the data from Exp. 2, while the open symbol corresponds to selected data from Exp. 1.
Figure 6
Average sentence recognition scores as a function of the SNR of the envelope (SNRenv). The left and right panels show scores in speech-shaped noise (SSN) and speech (SPE), respectively. In each panel, the filled symbols correspond to the data for three different SNRtfs values while the black and white symbol corresponds to the noise-carrier data (i.e., vocoder).
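Experiment 2 and Figure 6 contrast these envelope/TFS exchange conditions with a noise-carrier (vocoder) version of the same mixture. A minimal sketch of that manipulation for a single band is given below; the filter details and the helper name noise_vocoded_band are assumptions, and only the idea of keeping the band envelope while replacing its carrier with band-limited noise is taken from the text.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocoded_band(mixture, fs, lo, hi, order=4, seed=0):
        """One analysis band of a noise vocoder: keep the band's Hilbert envelope,
        discard its original carrier, and remodulate band-limited Gaussian noise."""
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, mixture)
        env = np.abs(hilbert(band))                                            # band envelope
        noise = sosfiltfilt(sos, np.random.default_rng(seed).standard_normal(len(mixture)))
        noise /= np.sqrt(np.mean(noise ** 2))                                  # unit-RMS noise carrier
        return sosfiltfilt(sos, env * noise)                                   # re-filter after modulation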

