Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition
- PMID: 15478444
- DOI: 10.1121/1.1777872
Abstract
Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noise robustness of MFCC over alternative feature sets such as linear prediction-based coefficients. The authors have recently introduced human factor cepstral coefficients (HFCC), a modification of MFCC that uses the known relationship between center frequency and critical bandwidth from human psychoacoustics to decouple filter bandwidth from filter spacing. In this work, the authors introduce a variation of HFCC called HFCC-E in which filter bandwidth is linearly scaled in order to investigate the effects of wider filter bandwidth on noise robustness. Experimental results show an increase in signal-to-noise ratio of 7 dB over traditional MFCC algorithms when filter bandwidth increases in HFCC-E. An important attribute of both HFCC and HFCC-E is that the algorithms only differ from MFCC in the filter bank coefficients: increased noise robustness using wider filters is achieved with no additional computational cost.
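The decoupling described in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the mel mapping is the common `2595 * log10(1 + f/700)` variant, the critical bandwidth is approximated with the Moore and Glasberg equivalent rectangular bandwidth (ERB) polynomial, and the names `erb_hz`, `e_factor`, `coupled_bandwidths`, and `decoupled_bandwidths` are hypothetical. The filter count, frequency range, and scale factor are arbitrary example values, and the sketch only compares bandwidths rather than building the actual filter bank.

```python
import math

def hz_to_mel(f_hz):
    # Common mel mapping (assumed; the abstract does not specify a variant).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def erb_hz(fc_hz):
    # Moore and Glasberg ERB polynomial, center frequency in kHz, result in Hz.
    # Assumed stand-in for the "known relationship between center frequency
    # and critical bandwidth" mentioned in the abstract.
    f_khz = fc_hz / 1000.0
    return 6.23 * f_khz ** 2 + 93.39 * f_khz + 28.52

def mel_spaced_points(n_filters, f_lo, f_hi):
    # n_filters + 2 points equally spaced in mel: lower edge, centers, upper edge.
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]

def coupled_bandwidths(points):
    # MFCC-style: each triangle spans from the previous to the next center,
    # so its width (half the base, used here as a rough measure) follows
    # the filter spacing.
    return [(points[i + 1] - points[i - 1]) / 2.0
            for i in range(1, len(points) - 1)]

def decoupled_bandwidths(centers, e_factor=1.0):
    # HFCC-style: width depends only on the center frequency.
    # e_factor > 1 mimics the linear bandwidth scaling of HFCC-E.
    return [e_factor * erb_hz(fc) for fc in centers]

# Illustrative values only, not taken from the paper.
points = mel_spaced_points(n_filters=20, f_lo=0.0, f_hi=4000.0)
centers = points[1:-1]
mfcc_bw = coupled_bandwidths(points)
hfcc_bw = decoupled_bandwidths(centers)
hfcc_e_bw = decoupled_bandwidths(centers, e_factor=3.0)

for fc, a, b, c in zip(centers, mfcc_bw, hfcc_bw, hfcc_e_bw):
    print(f"fc={fc:7.1f} Hz  coupled={a:6.1f}  ERB={b:6.1f}  scaled={c:6.1f}")
```

Changing `n_filters` alters every value returned by `coupled_bandwidths`, while `erb_hz` at a given center frequency stays the same. Under these assumptions, only the filter bank weights would differ between the variants, which is consistent with the abstract's statement that wider filters add no computational cost.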
Similar articles
- Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures. J Acoust Soc Am. 2008 Dec;124(6):3989-4000. doi: 10.1121/1.2997436. PMID: 19206822
- Automatic recognition of pathological phoneme production. Folia Phoniatr Logop. 2008;60(6):323-31. doi: 10.1159/000170083. Epub 2008 Nov 11. PMID: 19011305
- Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction. J Acoust Soc Am. 2005 Aug;118(2):1134-43. doi: 10.1121/1.1953269. PMID: 16158667
- Noisy speech recognition using de-noised multiresolution analysis acoustic features. J Acoust Soc Am. 2001 Nov;110(5 Pt 1):2567-74. doi: 10.1121/1.1398054. PMID: 11757946
- Automatic detection of laryngeal pathologies in records of sustained vowels by means of mel-frequency cepstral coefficient parameters and differentiation of patients by sex. Folia Phoniatr Logop. 2009;61(3):146-52. doi: 10.1159/000219950. Epub 2009 Jul 1. PMID: 19571549 Review.
Cited by
- Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space. Front Neurosci. 2018 Apr 3;12:194. doi: 10.3389/fnins.2018.00194. eCollection 2018. PMID: 29666568 Free PMC article.
- Talker age estimation using machine learning. Proc Meet Acoust. 2017 Jun;30(1):040014. doi: 10.1121/2.0000921. Epub 2018 Oct 25. PMID: 31666913 Free PMC article.
- Swallow Detection with Acoustics and Accelerometric-Based Wearable Technology: A Scoping Review. Int J Environ Res Public Health. 2022 Dec 22;20(1):170. doi: 10.3390/ijerph20010170. PMID: 36612490 Free PMC article.