Review. Trends Amplif. 2008 Dec;12(4):332-53.
doi: 10.1177/1084713808326455. Epub 2008 Oct 30.

Time-frequency masking for speech separation and its potential for hearing aid design


DeLiang Wang. Trends Amplif. 2008 Dec.

Abstract

A new approach to the separation of speech from speech-in-noise mixtures is the use of time-frequency (T-F) masking. Originating in the field of computational auditory scene analysis, T-F masking performs separation in the time-frequency domain. This article introduces the T-F masking concept and reviews T-F masking algorithms that separate target speech from monaural or binaural mixtures, as well as from microphone-array recordings. The review emphasizes techniques that are promising for hearing aid design. This article also surveys recent studies that evaluate the perceptual effects of T-F masking techniques, particularly their effectiveness in improving human speech recognition in noise. The potential benefits of T-F masking methods for the hearing impaired are assessed in light of the processing constraints of hearing aids. Finally, several issues pertinent to T-F masking are discussed.
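The abstract does not spell out how a mask is computed. As a minimal sketch, assuming the commonly used definition of an ideal binary mask (a T-F unit is labeled 1 when its local target-to-interference ratio exceeds a threshold in dB, and 0 otherwise), the idea can be illustrated on toy energy grids. The function names, the `lc_db` threshold parameter, and the toy values are hypothetical and are not taken from the article; real systems work on cochleagram or spectrogram energies and must estimate the mask without access to the premixed signals.

```python
import math


def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0):
    """Return a 0/1 grid: 1 where the local SNR in dB exceeds lc_db.

    Both inputs are lists of rows (time frames) of non-negative energies.
    lc_db is an assumed threshold parameter, set to 0 dB here.
    """
    mask = []
    for t_row, i_row in zip(target_energy, interference_energy):
        row = []
        for t, i in zip(t_row, i_row):
            if t <= 0.0:
                row.append(0)  # no target energy in this unit
            elif i <= 0.0:
                row.append(1)  # target present, no interference
            else:
                snr_db = 10.0 * math.log10(t / i)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture_energy, mask):
    """Keep mixture units where the mask is 1 and zero out the rest."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_energy)
    ]


# Toy 2-frame by 3-channel energies (made-up numbers for illustration).
target = [[4.0, 0.1, 2.0], [0.0, 3.0, 0.5]]
interference = [[1.0, 1.0, 2.0], [1.0, 0.0, 5.0]]
# Simplifying assumption: mixture energy is the sum of the two energies.
mixture = [
    [t + i for t, i in zip(t_row, i_row)]
    for t_row, i_row in zip(target, interference)
]

mask = ideal_binary_mask(target, interference)
masked = apply_mask(mixture, mask)
```

With a 0 dB threshold, a unit where target and interference energies are equal is labeled 0, because the comparison is strict. Whether ties belong to the target is a design choice in this sketch, not something the abstract specifies.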


Figures

Figure 1. Block diagram of a typical time-frequency (T-F) masking system for speech separation.
Figure 2. Binary time-frequency mask. (A) Cochleagram of a mixture of speech and trill telephone. (B) Target binary mask as segregation output, where white pixels denote 1 and black pixels denote 0.
Figure 3. Ideal binary mask. Top left: Cochleagram of a target utterance (“Primitive tribes have an upbeat attitude”). Top right: Cochleagram of an interfering utterance (“Only the best players enjoy popularity”). Middle left: Cochleagram of the mixture. Middle right: Ideal binary mask. Bottom left: Masked mixture using the ideal binary mask.
Figure 4. Two-dimensional smoothed histogram. The histogram is generated from two 6-source mixtures, where α indicates amplitude difference and δ indicates time difference.
Figure 5. Diagram of the Roman et al. (2006) system. An adaptive filter is applied for target cancellation in the first stage. The second stage computes a binary time-frequency mask by comparing the mixture signal and the adaptive filter output (DFT = discrete Fourier transform).
Figure 6. Two back-to-back cardioid responses. The front direction corresponds to θ = 0°.
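
The back-to-back cardioid responses of Figure 6 can be sketched numerically. The snippet below assumes the textbook first-order cardioid pattern, normalized to unit gain on axis, with the front-facing response peaking at θ = 0° and the back-facing response peaking at θ = 180°. The function names and the normalization are assumptions for illustration; the caption does not give the exact responses used in the article.

```python
import math


def front_cardioid(theta_deg):
    """Gain of a front-facing cardioid: 1 at 0 degrees, 0 at 180 degrees."""
    return 0.5 * (1.0 + math.cos(math.radians(theta_deg)))


def back_cardioid(theta_deg):
    """Gain of a back-facing cardioid: 0 at 0 degrees, 1 at 180 degrees."""
    return 0.5 * (1.0 - math.cos(math.radians(theta_deg)))


# Tabulate both responses every 45 degrees around the circle.
angles = list(range(0, 360, 45))
table = [(a, front_cardioid(a), back_cardioid(a)) for a in angles]
for angle, front, back in table:
    print(f"{angle:3d} deg  front={front:.3f}  back={back:.3f}")
```

Under this assumed pattern the two gains sum to 1 at every angle, so a source in the front half-plane always yields a larger front than back response. A comparison of that kind is one plausible cue for labeling T-F units, though the caption alone does not say how the responses are used.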


