Review

Trends Amplif. 2008 Dec;12(4):332-53.

doi: 10.1177/1084713808326455. Epub 2008 Oct 30.

Time-frequency masking for speech separation and its potential for hearing aid design

DeLiang Wang¹

PMID: 18974204
PMCID: PMC4111459
DOI: 10.1177/1084713808326455

DeLiang Wang. Trends Amplif. 2008 Dec.

Affiliation

¹ Department of Computer Science & Engineering, Center for Cognitive Science, The Ohio State University, Columbus, OH 43210, USA. dwang@cse.ohio

Abstract

A new approach to the separation of speech from speech-in-noise mixtures is the use of time-frequency (T-F) masking. Originating in the field of computational auditory scene analysis, T-F masking performs separation in the time-frequency domain. This article introduces the T-F masking concept and reviews T-F masking algorithms that separate target speech from either monaural or binaural mixtures, as well as microphone-array recordings. The review emphasizes techniques that are promising for hearing aid design. This article also surveys recent studies that evaluate the perceptual effects of T-F masking techniques, particularly their effectiveness in improving human speech recognition in noise. An assessment is made of the potential benefits of T-F masking methods for the hearing impaired in light of the processing constraints of hearing aids. Finally, several issues pertinent to T-F masking are discussed.
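
As a rough illustration of the T-F masking idea described in the abstract, the sketch below builds a binary mask over a small grid of T-F units and applies it to a mixture. It assumes the commonly used ideal-binary-mask rule, in which a unit is kept when the target energy exceeds the interference energy by a local criterion (0 dB here). That rule needs the premixed signals and is not spelled out in this record. The function names `ideal_binary_mask` and `apply_mask` and the toy energy values are hypothetical.

```python
import math


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR (in dB) exceeds lc_db.

    Both inputs are lists of rows, one energy value per T-F unit.
    The 0 dB default for lc_db is an assumption, not taken from the article.
    """
    mask = []
    for t_row, n_row in zip(target_energy, noise_energy):
        row = []
        for t, n in zip(t_row, n_row):
            if t <= 0.0:
                snr_db = -math.inf      # no target energy in this unit
            elif n <= 0.0:
                snr_db = math.inf       # no interference in this unit
            else:
                snr_db = 10.0 * math.log10(t / n)
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Multiply each mixture T-F unit by its mask value (keep or zero)."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture)]


# Toy 2x2 grid of T-F unit energies (hypothetical values).
target = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 4.0], [1.0, 0.0]]
mixture = [[5.0, 5.0], [1.0, 9.0]]

mask = ideal_binary_mask(target, noise)
masked = apply_mask(mixture, mask)
```

In a practical system the premixed target and interference are unavailable, so the mask would have to be estimated from the mixture. The sketch only shows what a binary mask does once it is known.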

Figures

**Figure 1.**
Block diagram of a typical time-frequency (T-F) masking system for speech separation.

**Figure 2.**
Binary time-frequency mask. (A) Cochleagram of a mixture of speech and trill telephone. (B) Target binary mask as segregation output, where white pixels denote 1 and black pixels denote 0.

**Figure 3.**
Ideal binary mask. Top left: Cochleagram of a target utterance (“Primitive tribes have an upbeat attitude”). Top right: Cochleagram of an interfering utterance (“Only the best players enjoy popularity”). Middle left: Cochleagram of the mixture. Middle right: Ideal binary mask. Bottom left: Masked mixture using the ideal binary mask.

**Figure 4.**
Two-dimensional smoothed histogram. The histogram is generated from two 6-source mixtures, where α indicates amplitude difference and δ indicates time difference.

**Figure 5.**
Diagram of the Roman et al. (2006) system. An adaptive filter is applied for target cancellation in the first stage. The second stage computes a binary time-frequency mask by comparing the mixture signal and the adaptive filter output (DFT = discrete Fourier transform).

**Figure 6.**
Two back-to-back cardioid responses. The front direction corresponds to θ = 0°.
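
For readers unfamiliar with the pattern named in the Figure 6 caption, the following minimal sketch evaluates two back-to-back cardioid responses. It assumes ideal first-order cardioids, (1 + cos θ)/2 facing the front and (1 − cos θ)/2 facing the back; the responses plotted in the article may differ. The function names are hypothetical.

```python
import math


def front_cardioid(theta_deg):
    """Ideal first-order cardioid pointing at theta = 0 degrees (front)."""
    return 0.5 * (1.0 + math.cos(math.radians(theta_deg)))


def back_cardioid(theta_deg):
    """Ideal first-order cardioid pointing at theta = 180 degrees (back)."""
    return 0.5 * (1.0 - math.cos(math.radians(theta_deg)))


# Print both responses at a few arrival angles.
for angle in (0, 45, 90, 135, 180):
    print(angle, round(front_cardioid(angle), 3), round(back_cardioid(angle), 3))
```

Under this assumption the front-facing response has unit gain at θ = 0° and a null at θ = 180°, the back-facing response is its mirror image, and the two sum to 1 at every angle.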

Cited by

Parameter tuning of time-frequency masking algorithms for reverberant artifact removal within the cochlear implant stimulus.
Shahidi LK, Collins LM, Mainsah BO. Cochlear Implants Int. 2022 Nov;23(6):309-316. doi: 10.1080/14670100.2022.2096182. Epub 2022 Jul 23. PMID: 35875863. Free PMC article.
Reconstruction techniques for improving the perceptual quality of binary masked speech.
Williamson DS, Wang Y, Wang D. J Acoust Soc Am. 2014 Aug;136(2):892-902. doi: 10.1121/1.4884759. PMID: 25096123. Free PMC article.
A Competing Voices Test for Hearing-Impaired Listeners Applied to Spatial Separation and Ideal Time-Frequency Masks.
Bramsløw L, Vatti M, Rossing R, Naithani G, Henrik Pontoppidan N. Trends Hear. 2019 Jan-Dec;23:2331216519848288. doi: 10.1177/2331216519848288. PMID: 31104580. Free PMC article.
Hearing impairment, cognition and speech understanding: exploratory factor analyses of a comprehensive test battery for a group of hearing aid users, the n200 study.
Rönnberg J, Lunner T, Ng EH, Lidestam B, Zekveld AA, Sörqvist P, Lyxell B, Träff U, Yumba W, Classon E, Hällgren M, Larsby B, Signoret C, Pichora-Fuller MK, Rudner M, Danielsson H, Stenfelt S. Int J Audiol. 2016 Nov;55(11):623-42. doi: 10.1080/14992027.2016.1219775. Epub 2016 Sep 2. PMID: 27589015. Free PMC article.
Harmonic Cancellation-A Fundamental of Auditory Scene Analysis.
de Cheveigné A. Trends Hear. 2021 Jan-Dec;25:23312165211041422. doi: 10.1177/23312165211041422. PMID: 34698574. Free PMC article.
