Noisy speech recognition using de-noised multiresolution analysis acoustic features
- PMID: 11757946
- DOI: 10.1121/1.1398054
Abstract
This paper describes a novel application of multiresolution analysis (MRA) to extracting acoustic features that possess de-noising capability for robust speech recognition. The MRA algorithm is used to construct a mel-scaled wavelet packet filter bank, from which subband powers are computed as the feature parameters for speech recognition. Wiener filtering is applied to a few selected subbands at intermediate stages of the decomposition. For high-frequency bands, the Wiener filters are designed using a reduced fraction of the estimated noise power, which makes the consonant features much more prominent and contrastive. The proposed method is evaluated in phone recognition experiments on the TIMIT database. In the presence of stationary white noise at 10-dB SNR, the de-noised MRA features attain a phone recognition rate of 32%, a noticeable improvement over the 29% and 20% attained by the commonly used mel-frequency cepstral coefficients (MFCC) with and without cepstral mean normalization (CMN), respectively. The effectiveness of the MRA features is further supported by the fact that they exhibit smaller distortion from clean speech.
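The processing chain summarized in the abstract (wavelet packet decomposition, subband powers, Wiener filtering with a reduced noise fraction in the high bands) can be illustrated with a rough sketch. This is not the authors' implementation: it assumes a plain Haar wavelet and a full, uniform packet tree instead of the mel-scaled filter bank, applies a scalar Wiener-style gain to the final subband powers rather than filtering at intermediate decomposition stages, and all function names and parameters (`noise_fraction`, `hf_fraction`, `hf_start`) are hypothetical.

```python
import math


def haar_split(x):
    """One Haar analysis step: return (low-pass, high-pass) half-rate bands."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet(x, depth):
    """Full wavelet packet tree; returns 2**depth bands in natural order.

    Natural order is not strictly increasing in frequency, but the second
    half of the list always descends from the first high-pass split.
    """
    bands = [list(x)]
    for _ in range(depth):
        nxt = []
        for b in bands:
            low, high = haar_split(b)
            nxt.extend([low, high])
        bands = nxt
    return bands


def power(band):
    """Mean squared value of a band (0.0 for an empty band)."""
    return sum(v * v for v in band) / len(band) if band else 0.0


def wiener_gain(noisy_power, noise_power, noise_fraction=1.0):
    """Wiener-style gain: estimated clean power over noisy power.

    noise_fraction < 1 subtracts only part of the noise estimate
    (an assumed reading of the 'reduced fraction' idea).
    """
    if noisy_power <= 0.0:
        return 0.0
    clean_est = max(noisy_power - noise_fraction * noise_power, 0.0)
    return clean_est / noisy_power


def denoised_log_powers(frame, noise_powers, depth=3,
                        hf_start=None, hf_fraction=0.5, floor=1e-10):
    """Log subband powers after applying a per-band Wiener-style gain."""
    bands = wavelet_packet(frame, depth)
    if hf_start is None:
        hf_start = len(bands) // 2  # bands from the first high-pass split
    feats = []
    for k, band in enumerate(bands):
        frac = hf_fraction if k >= hf_start else 1.0
        p = power(band)
        g = wiener_gain(p, noise_powers[k], frac)
        # The gain scales amplitude, so the band power scales by g squared.
        feats.append(math.log(max(g * g * p, floor)))
    return feats
```

Calling `denoised_log_powers(frame, noise_powers)` on a frame of samples, with one noise power estimate per subband, returns one log-power feature per band. Lowering `hf_fraction` subtracts less of the noise estimate in the upper bands, so weak high-frequency energy is attenuated less than it would be under full subtraction.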
Similar articles
- A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception. J Acoust Soc Am. 2016 May;139(5):2708. doi: 10.1121/1.4948772. PMID: 27250164
- Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition. J Acoust Soc Am. 2012 May;131(5):4134-51. doi: 10.1121/1.3699200. PMID: 22559385
- Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition. J Acoust Soc Am. 2004 Sep;116(3):1774-80. doi: 10.1121/1.1777872. PMID: 15478444
- Analysis of acoustic parameters for consonant voicing classification in clean and telephone speech. J Acoust Soc Am. 2012 Mar;131(3):EL197-202. doi: 10.1121/1.3678667. PMID: 22423808
- Matrix sentence intelligibility prediction using an automatic speech recognition system. Int J Audiol. 2015;54 Suppl 2:100-7. doi: 10.3109/14992027.2015.1061708. Epub 2015 Sep 18. PMID: 26383042