Time-frequency masking for speech separation and its potential for hearing aid design

DeLiang Wang¹

PMID: 18974204
PMCID: PMC4111459
DOI: 10.1177/1084713808326455

Review

DeLiang Wang. Trends Amplif. 2008 Dec;12(4):332-53. doi: 10.1177/1084713808326455. Epub 2008 Oct 30.

Affiliation

¹ Department of Computer Science & Engineering, Center for Cognitive Science, The Ohio State University, Columbus, OH 43210, USA. dwang@cse.ohio

Abstract

A new approach to the separation of speech from speech-in-noise mixtures is the use of time-frequency (T-F) masking. Originating in the field of computational auditory scene analysis, T-F masking performs separation in the time-frequency domain. This article introduces the T-F masking concept and reviews T-F masking algorithms that separate target speech from either monaural or binaural mixtures, as well as microphone-array recordings. The review emphasizes techniques that are promising for hearing aid design. This article also surveys recent studies that evaluate the perceptual effects of T-F masking techniques, particularly their effectiveness in improving human speech recognition in noise. An assessment is made of the potential benefits of T-F masking methods for the hearing impaired in light of the processing constraints of hearing aids. Finally, several issues pertinent to T-F masking are discussed.

Figures

**Figure 1.**
Block diagram of a typical time-frequency (T-F) masking system for speech separation.

**Figure 2.**
Binary time-frequency mask. (A) Cochleagram of a mixture of speech and trill telephone. (B) Target binary mask as segregation output, where white pixels denote 1 and black pixels denote 0.

**Figure 3.**
Ideal binary mask. Top left: Cochleagram of a target utterance (“Primitive tribes have an upbeat attitude”). Top right: Cochleagram of an interfering utterance (“Only the best players enjoy popularity”). Middle left: Cochleagram of the mixture. Middle right: Ideal binary mask. Bottom left: Masked mixture using the ideal binary mask.

**Figure 4.**
Two-dimensional smoothed histogram. The histogram is generated from two 6-source mixtures, where α indicates amplitude difference and δ indicates time difference.

**Figure 5.**
Diagram of the Roman et al. (2006) system. An adaptive filter is applied for target cancellation in the first stage. The second stage computes a binary time-frequency mask by comparing the mixture signal and the adaptive filter output (DFT = discrete Fourier transform).

**Figure 6.**
Two back-to-back cardioid responses. The front direction corresponds to θ = 0°.
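
The ideal binary mask shown in Figure 3 can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from this record: a short-time Fourier transform stands in for the cochleagram, the mask is set to 1 wherever the local target-to-interference ratio exceeds a 0 dB criterion, and two synthetic tones stand in for the target and interfering utterances. The function names (`stft_power`, `ideal_binary_mask`) and all parameter values are hypothetical.

```python
import numpy as np

def stft_power(x, frame_len=256, hop=128):
    """Power spectrogram from a Hann-windowed STFT (assumed stand-in for a cochleagram)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2  # shape: (frames, bins)

def ideal_binary_mask(target, interference, lc_db=0.0, eps=1e-12):
    """Return 1 in T-F units where the local SNR exceeds lc_db, else 0.

    lc_db (local criterion) defaults to 0 dB, which is an assumption here.
    """
    t_pow = stft_power(target)
    i_pow = stft_power(interference)
    local_snr_db = 10.0 * np.log10((t_pow + eps) / (i_pow + eps))
    return (local_snr_db > lc_db).astype(np.uint8)

# Synthetic signals, purely illustrative: a 500 Hz "target" and a 2 kHz "interferer".
fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 500 * t)
interference = 0.5 * np.sin(2 * np.pi * 2000 * t)

mask = ideal_binary_mask(target, interference)
# Apply the mask to the mixture's power spectrogram (the "masked mixture").
masked_power = mask * stft_power(target + interference)
```

Computing this mask requires the premixed target and interference, which is why it is called ideal; a practical system has to estimate the mask from the mixture alone.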
