Ear Hear. 2024 Nov-Dec;45(6):1444-1460.
doi: 10.1097/AUD.0000000000001532. Epub 2024 May 31.

The Optimal Speech-to-Background Ratio for Balancing Speech Recognition With Environmental Sound Recognition

Eric M Johnson et al. Ear Hear. 2024 Nov-Dec.

Abstract

Objectives: This study aimed to determine the speech-to-background ratios (SBRs) at which normal-hearing (NH) and hearing-impaired (HI) listeners can recognize both speech and environmental sounds when the two types of signals are mixed. Also examined were the effect of individual sounds on speech recognition and environmental sound recognition (ESR), and the impact of divided versus selective attention on these tasks.

Design: In Experiment 1 (divided attention), 11 NH and 10 HI listeners heard sentences mixed with environmental sounds at various SBRs and performed speech recognition and ESR tasks concurrently in each trial. In Experiment 2 (selective attention), 20 NH listeners performed these tasks in separate trials. Psychometric functions were generated for each task, listener group, and environmental sound. The range over which speech recognition and ESR were both high was determined, as was the optimal SBR for balancing speech recognition with ESR, defined as the point of intersection between each pair of normalized psychometric functions.
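The intersection criterion described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each normalized psychometric function is a two-parameter logistic in SBR; the abstract does not state the functional form or fitting procedure the authors used. The function names, midpoints, and slopes below are hypothetical and are not values from the study.

```python
import math

def logistic(x, midpoint, slope):
    """Normalized (0 to 1) psychometric function of SBR in dB.
    A positive slope rises with SBR; a negative slope falls with SBR."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

def crossing_sbr(speech, esr, lo=-40.0, hi=40.0, tol=1e-6):
    """Find the SBR where the two functions intersect, by bisection on
    speech(x) - esr(x). Assumes exactly one sign change within [lo, hi]."""
    diff = lambda x: speech(x) - esr(x)
    if diff(lo) * diff(hi) > 0:
        raise ValueError("no sign change in the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if diff(lo) * diff(mid) <= 0:
            hi = mid  # crossing lies in the lower half
        else:
            lo = mid  # crossing lies in the upper half
    return (lo + hi) / 2.0

# Hypothetical parameters (assumptions, not data from the study):
# speech recognition improves as SBR increases, ESR declines.
speech = lambda x: logistic(x, midpoint=-4.0, slope=0.4)
esr = lambda x: logistic(x, midpoint=20.0, slope=-0.6)

optimal = crossing_sbr(speech, esr)
print(f"Crossing point: {optimal:.2f} dB SBR")
```

For two logistic functions the crossing also has a closed form, (k_s m_s + k_e m_e) / (k_s + k_e) with slope magnitudes k and midpoints m. Bisection is shown because it would apply equally to other monotonic function shapes.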

Results: The NH listeners achieved greater than 95% accuracy on concurrent speech recognition and ESR over an SBR range of approximately 20 dB or greater. The optimal SBR for maximizing both speech recognition and ESR for NH listeners was approximately +12 dB. For the HI listeners, the range over which 95% performance was observed on both tasks was far smaller (span of 1 dB), with an optimal value of +5 dB. Acoustic analyses indicated that the speech and environmental sound stimuli were similarly audible, regardless of the hearing status of the listener, but that the speech fluctuated more than the environmental sounds. Divided versus selective attention conditions produced differences in performance that were statistically significant yet only modest in magnitude. In all conditions and for both listener groups, recognition was higher for environmental sounds than for speech when presented at equal intensities (i.e., 0 dB SBR), indicating that the environmental sounds were more effective maskers of speech than the converse. Each of the 25 environmental sounds used in this study (with one exception) had a span of SBRs over which speech recognition and ESR were both higher than 95%. These ranges tended to overlap substantially.
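The idea of an SBR range over which both tasks exceed 95% can be sketched as follows. The sketch again assumes logistic psychometric functions, which the abstract does not confirm, and uses hypothetical parameters rather than any fitted values from the study. The lower bound is the SBR at which the rising speech function reaches the criterion, and the upper bound is the SBR at which the falling ESR function drops to it.

```python
import math

def sbr_at_proportion(midpoint, slope, proportion):
    """Invert p = 1 / (1 + exp(-slope * (x - midpoint))) for x (dB SBR)."""
    return midpoint + math.log(proportion / (1.0 - proportion)) / slope

def joint_range(speech_params, esr_params, criterion=0.95):
    """Return (lower, upper) SBR bounds where both functions are at or above
    the criterion, or None if no such range exists.
    speech_params: (midpoint, slope) with slope > 0 (rises with SBR).
    esr_params:    (midpoint, slope) with slope < 0 (falls with SBR)."""
    speech_mid, speech_slope = speech_params
    esr_mid, esr_slope = esr_params
    lower = sbr_at_proportion(speech_mid, speech_slope, criterion)
    upper = sbr_at_proportion(esr_mid, esr_slope, criterion)
    if lower > upper:
        return None  # the two criterion regions do not overlap
    return lower, upper

# Hypothetical parameters (assumptions, not data from the study).
window = joint_range((-4.0, 0.4), (20.0, -0.6))
if window is not None:
    print(f"Both tasks at or above 95% from {window[0]:.1f} to {window[1]:.1f} dB SBR")
```

Under these assumptions, a narrow range such as the one reported for the HI listeners would correspond to speech and ESR criterion thresholds that lie close together.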

Conclusions: A range of SBRs exists over which speech and environmental sounds can be simultaneously recognized with high accuracy by NH and HI listeners, but this range is larger for NH listeners. The single optimal SBR for jointly maximizing speech recognition and ESR also differs between NH and HI listeners. The greater masking effectiveness of the environmental sounds relative to the speech may be related to the lower degree of fluctuation present in the environmental sounds and possibly to task differences between speech recognition and ESR (open versus closed set). The observed differences between the NH and HI results may be related to the HI listeners' smaller fluctuating masker benefit. As noise-reduction systems become increasingly effective, the current results could guide the design of future systems that provide listeners with highly intelligible speech without depriving them of access to important environmental sounds.

Conflict of interest statement

The authors have no conflicts of interest to disclose.

Figures

Fig. 1.
Pure-tone air-conduction audiometric thresholds for the hearing-impaired listeners. Listeners are numbered in order of increasing degree of hearing loss. Right ears are represented by circles and left ears are represented by X’s. The limit of normal hearing (20 dB HL) is represented by a dotted horizontal line in each panel. Subject numbers, ages in years, and sexes are also provided.
Fig. 2.
The graphical user interface used to record listener responses for environmental sounds. The 25 environmental sounds were arranged in a 5 × 5 grid in alphabetical order. Pictures were provided to assist in locating the appropriate response button.
Fig. 3.
Normalized psychometric functions for speech recognition and ESR in conditions of divided attention based on the pooled performance of 11 normal-hearing listeners. Filled circles denote normalized percent words correct speech recognition, and open circles correspond to normalized percent correct ESR. Larger circles represent normalized group-mean performance, and smaller circles, which have been jittered for clarity, represent normalized mean performance values for individual participants at each speech-to-background ratio. The solid black line is the fitted psychometric function for speech recognition, and the dashed line is the fitted function for ESR. ESR indicates environmental sound recognition.
Fig. 4.
As Fig. 3, but for the 10 hearing-impaired listeners.
Fig. 5.
Level frequency diagrams for the speech (left panel) and environmental-sound (right panel) signals, based on short-time level percentile analysis (see text for details). Shown are long-term average spectra (black lines), 30th percentiles (lower border of blue boxes), 65th percentiles (border between red and blue boxes), and 99th percentiles (upper border of red boxes). The dynamic range within each frequency band is denoted by the difference between the 99th and 30th percentile (the combined height of a blue and red box).
Fig. 6.
As Fig. 3, but for the 20 normal-hearing subjects who performed the selective attention task in Experiment 2.
Fig. 7.
Psychometric functions for speech recognition and environmental sound recognition for 25 environmental background sounds. Each panel represents data from a different environmental sound, indicated by the label in the top right corner of the panel. As in other figures, solid black circles and lines denote speech recognition whereas open circles and dashed lines represent environmental sound recognition. Error bars indicate the standard error of the mean.
Fig. 8.
Range plot of normalized 95%-correct thresholds for 25 environmental background sounds. Thresholds for speech recognition and environmental sound recognition are denoted by filled and open circles, respectively. The shaded area indicates the range of speech-to-background ratios where performance is at least 95% correct on both tasks for all sounds (except for “waves”).
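
The percentile quantities named in the Fig. 5 caption can be illustrated for a single frequency band with a short sketch. It assumes the short-time levels (in dB) for that band have already been computed; the frame length, filter bank, and exact percentile method used in the study are described in the full text and are not reproduced here. The helper names and the input values are hypothetical.

```python
def percentile(values, p):
    """Linearly interpolated p-th percentile (0 to 100) of a list of numbers."""
    ordered = sorted(values)
    rank = p * (len(ordered) - 1) / 100.0
    below = int(rank)                          # index at or below the rank
    above = min(below + 1, len(ordered) - 1)   # next index, clamped to the end
    return ordered[below] + (rank - below) * (ordered[above] - ordered[below])

def band_summary(short_time_levels_db):
    """30th, 65th, and 99th level percentiles for one band, plus the
    dynamic range defined in the caption as 99th minus 30th percentile."""
    p30 = percentile(short_time_levels_db, 30)
    p65 = percentile(short_time_levels_db, 65)
    p99 = percentile(short_time_levels_db, 99)
    return {"p30": p30, "p65": p65, "p99": p99, "dynamic_range_db": p99 - p30}

# Hypothetical short-time levels for one band: 40, 41, ..., 80 dB.
levels = [float(v) for v in range(40, 81)]
summary = band_summary(levels)
print(summary)
```

A signal with larger level fluctuations would show a larger dynamic range in this summary, which is the sense in which the abstract describes the speech as fluctuating more than the environmental sounds.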
