Ear Hear. 2006 Oct;27(5):480-92.
doi: 10.1097/01.aud.0000233891.86809.df.

Determination of the potential benefit of time-frequency gain manipulation

Michael C Anzalone et al. Ear Hear. 2006 Oct.

Abstract

Objective: The purpose of this study was to determine the maximum benefit provided by a time-frequency gain-manipulation algorithm for noise-reduction (NR) based on an ideal detector of speech energy. The amount of detected energy necessary to show benefit using this type of NR algorithm was examined, as well as the necessary speed and frequency resolution of the gain manipulation.

Design: NR was performed using time-frequency gain manipulation, wherein the gains of individual frequency bands depended on the absence or presence of speech energy within each band. Three different experiments were performed: (1) NR using ideal detectors, (2) NR with nonideal detectors, and (3) NR with ideal detectors and different processing speeds and frequency resolutions. All experiments were performed using the Hearing-in-Noise test (HINT). A total of 6 listeners with normal hearing and 14 listeners with hearing loss were tested.

Results: HINT thresholds improved for all listeners with NR based on the ideal detectors used in Experiment I. The nonideal detectors of Experiment II required detection of at least 90% of the speech energy before an improvement was seen in HINT thresholds. The results of Experiment III demonstrated that relatively high temporal resolution (<100 msec) was required by the NR algorithm to improve HINT thresholds.

Conclusions: The results indicated that a single-microphone NR system based on time-frequency gain manipulation improved the HINT thresholds of listeners. However, to obtain benefit in speech intelligibility, the detectors used in such a strategy were required to detect an unrealistically high percentage of the speech energy and to perform the gain manipulations on a fast temporal basis.

Figures

Fig. 1
Ideal binary mask generation. (A), Spectrogram of the speech in quiet (“Her shoes were very dirty”) produced by filtering the speech with a filter bank with center frequencies matched to the analysis filter bank of the NR algorithm, with one-half-ERB bandwidths to reduce overlap. (B), The global criterion used to detect 99% of the speech energy (dashed line) is illustrated on a plot of energy versus time for the 414-Hz center frequency band (arrow in A). The gain in the binary mask (B) for this frequency band was set to 1.0 when the energy exceeded the criterion and to 0.2 when the energy was below the criterion. (C), The ensemble of gains (the ideal binary mask) shown in a manner similar to the spectrogram. Dark areas represent time periods and frequency bands for which the gain is 1.0. To create the mask, the filter output (A) was squared and filtered with a 300-Hz low-pass filter.
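The gain rule in this caption can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes the energy envelopes (squared and low-pass-filtered filter outputs) are already available as an array of bands by time samples, and it assumes that "detecting 99% of the speech energy" means choosing the single threshold at which the samples at or above it hold that fraction of the total energy. The function names `energy_criterion` and `binary_mask` are hypothetical.

```python
import numpy as np

def energy_criterion(energy, fraction):
    """Return a global threshold such that samples at or above it
    hold at least `fraction` of the total energy (assumed definition)."""
    flat = np.sort(np.asarray(energy, dtype=float).ravel())[::-1]  # descending
    cumulative = np.cumsum(flat)
    target = fraction * cumulative[-1]
    # First position where the running total reaches the target.
    idx = min(int(np.searchsorted(cumulative, target)), flat.size - 1)
    return flat[idx]

def binary_mask(energy, fraction=0.99, high_gain=1.0, low_gain=0.2):
    """Gain of 1.0 where energy meets the criterion, 0.2 elsewhere
    (the two gain values are taken from the caption)."""
    energy = np.asarray(energy, dtype=float)
    criterion = energy_criterion(energy, fraction)
    return np.where(energy >= criterion, high_gain, low_gain)

# Toy envelopes: 2 bands x 3 time samples (made-up values).
energy = np.array([[10.0, 1.0, 0.1],
                   [5.0, 0.5, 0.05]])
mask_99 = binary_mask(energy, fraction=0.99)
mask_90 = binary_mask(energy, fraction=0.90)
```

Lowering `fraction` raises the threshold, so fewer time-frequency cells receive the full gain; this is the kind of degradation illustrated in Figure 3.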
Fig. 2
Application of the ideal binary mask. (A), Spectrogram for the sentence of Figure 1 (“Her shoes were very dirty”) for speech-spectrum noise added at an SNR of 0 dB. (B), Spectrogram of the sentence after application of the ideal binary mask shown in Figure 1B. Note that the noise between speech components is attenuated by application of the ideal binary mask. Because the binary mask was applied on a sample-by-sample basis (with 50 μsec sampling time), the reduction occurred both between words and within the words themselves.
Fig. 3
Degradation of the ideal binary mask. (A), Ideal binary mask for the example sentence. (B), Energy versus time for the 414-Hz frequency band (arrow in A) with lines showing the criteria used to detect 99% (dotted line), 85% (dot-dashed line), and 75% (dashed line) of the speech energy. (C), Binary mask based on 85% of speech energy. (D), Binary mask based on 75% of speech energy.
Fig. 4
Audiograms for the listeners with hearing loss in Experiments I, II, and III are shown with solid, dashed, and dotted lines, respectively. All thresholds of listeners with normal hearing were less than 15 dB HL (heavy dashed line).
Fig. 5
Individual and average HINT thresholds for each processing condition for listeners with hearing loss. A more negative threshold signifies better performance. Pure-tone average (PTA) thresholds for 500-, 1000-, and 2000-Hz tones are shown in each panel.
Fig. 6
HINT thresholds for all conditions for listeners with normal hearing. All thresholds for unprocessed speech fell within the norms for the HINT (Nilsson et al., 1994). Arrows indicate that the subjects hit the floor of the processed SNRs, for which only the temporal envelope cues were available (see text).
Fig. 7
Effect of degrading the ideal binary mask on listeners with hearing loss. HINT thresholds as a function of the percent of speech energy used to create the binary mask (see Fig. 3). All subjects showed a decreasing trend in HINT threshold as more speech energy was used to create the ideal binary mask. Performance for the ideal mask (the rightmost point of each line) matched the results in Experiment I.
Fig. 8
Changes in HINT threshold for listeners with hearing loss as a function of the percent of speech energy used to create the binary mask. Solid line represents mean (±1 SD) change for all listeners with hearing loss. Listeners with hearing loss improved when the binary mask was based on greater than 90% of the speech energy. For binary masks based on less energy, there was either no change from the unprocessed condition, or a slight increase in HINT threshold. Asterisks represent statistically significant differences from the unprocessed condition (p < 0.05).
Fig. 9
HINT thresholds for listeners with normal hearing improved as the binary mask approached the ideal binary mask. For binary masks based on higher percentages of speech energy, HINT tracks for listeners with normal hearing reached the minimum SNR used in the study (-10 dB).
Fig. 10
Change in HINT threshold for listeners with normal hearing as a function of the amount of speech energy used to create the binary mask. Solid line represents mean (±1 SD) change for all listeners with normal hearing. Only the 99% condition was statistically different from the unprocessed condition (paired t-test, p < 0.05, asterisk).
Fig. 11
HINT thresholds for two subjects with hearing loss for changes in the frequency-resolution and temporal smearing of the binary mask. Both subjects had improved thresholds for both frequency resolutions tested. When the binary mask was temporally smeared with a 15-msec rectangular window, one subject showed an improvement similar to that seen in Experiment I, whereas the other showed a variable response. When the binary mask was smeared with a 100-msec rectangular window, both subjects’ HINT thresholds matched those for unprocessed speech.
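One plausible reading of the temporal smearing described in this caption is a moving average of each band's gain track over a rectangular window. The sketch below makes that assumption explicit; the page does not say whether the smeared gains were re-binarized or how edges were handled, so both choices here are illustrative. The function name `smear_mask` is hypothetical, and the 20-kHz default follows from the 50-μsec sampling time mentioned in the Figure 2 caption.

```python
import numpy as np

def smear_mask(gains, window_ms, sample_rate_hz=20_000):
    """Moving average of a 1-D gain track over a rectangular window.

    Assumption: "smearing" is taken to mean averaging; the edges are
    padded with the nearest gain value so the output length is unchanged.
    """
    gains = np.asarray(gains, dtype=float)
    n = max(1, int(round(window_ms * 1e-3 * sample_rate_hz)))  # window length in samples
    window = np.ones(n) / n
    padded = np.pad(gains, (n // 2, n - 1 - n // 2), mode="edge")
    return np.convolve(padded, window, mode="valid")

# Toy gain track: low gain (0.2) followed by full gain (1.0).
track = np.array([0.2] * 10 + [1.0] * 10)
# 4-ms window at a 1-kHz toy sampling rate, i.e. 4 samples.
smeared = smear_mask(track, window_ms=4, sample_rate_hz=1000)
```

With a longer window, the sharp 0.2-to-1.0 transitions spread over more samples, so gain changes can no longer follow rapid fluctuations in speech energy.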

References

    1. ANSI S3.5. American National Standard Methods for the Calculation of the Articulation Index. ANSI; New York: 1969. ANSI S3.5.
    2. ANSI S3.6. American National Standard Specification for Audiometers. ANSI; New York: 1989. ANSI S3.6-1989.
    3. ANSI S3.5. Methods for Calculation of the Speech Intelligibility Index. ANSI; New York: 1997.
    4. Bacon SP, Gleitman RM. Modulation detection in subjects with relatively flat hearing losses. Journal of Speech and Hearing Research. 1992;35:642–653.
    5. Berouti M, Schwartz R, Makhoul J. Enhancement of speech corrupted by acoustic noise. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. 1979;4:208–211.
