Analysis and prediction of acoustic speech features from mel-frequency cepstral coefficients in distributed speech recognition architectures
- PMID: 19206822
- DOI: 10.1121/1.2997436
Abstract
The aim of this work is to develop methods that enable acoustic speech features to be predicted from mel-frequency cepstral coefficient (MFCC) vectors as may be encountered in distributed speech recognition architectures. The work begins with a detailed analysis of the multiple correlation between acoustic speech features and MFCC vectors. This confirms the existence of correlation, which is found to be higher when measured within specific phonemes rather than globally across all speech sounds. The correlation analysis leads to the development of a statistical method of predicting acoustic speech features from MFCC vectors that utilizes a network of hidden Markov models (HMMs) to localize prediction to specific phonemes. Within each HMM, the joint density of acoustic features and MFCC vectors is modeled and used to make a maximum a posteriori prediction. Experimental results are presented across a range of conditions, such as with speaker-dependent, gender-dependent, and gender-independent constraints, and these show that acoustic speech features can be predicted from MFCC vectors with good accuracy. A comparison is also made against an alternative scheme that substitutes the higher-order MFCCs with acoustic features for transmission. This delivers accurate acoustic features but at the expense of a significant reduction in speech recognition accuracy.
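The two statistical ideas in the abstract, multiple correlation and maximum a posteriori (MAP) prediction from a joint density, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: a single joint Gaussian stands in for the density inside one HMM state, the data are synthetic, and the names `fit_joint_gaussian`, `map_predict`, and `multiple_correlation` are hypothetical. Under a single Gaussian the posterior of the acoustic feature given the MFCC vector is itself Gaussian, so the MAP estimate coincides with the conditional mean.

```python
import numpy as np


def fit_joint_gaussian(mfcc, feat):
    """Estimate mean and covariance of the joint vector z = [mfcc, feat]."""
    feat = feat.reshape(len(feat), -1)          # accept (n,) or (n, k)
    z = np.column_stack([mfcc, feat])
    return z.mean(axis=0), np.cov(z, rowvar=False)


def map_predict(mfcc, mean, cov):
    """MAP estimate of the acoustic feature(s) given MFCC vectors.

    For a joint Gaussian this is the conditional mean
    mu_f + S_fx S_xx^{-1} (x - mu_x).
    """
    d = mfcc.shape[1]
    mu_x, mu_f = mean[:d], mean[d:]
    s_xx, s_xf = cov[:d, :d], cov[:d, d:]
    gain = np.linalg.solve(s_xx, s_xf)          # S_xx^{-1} S_xf, shape (d, k)
    return mu_f + (mfcc - mu_x) @ gain


def multiple_correlation(mfcc, feat):
    """Multiple correlation R between a scalar feature and the MFCC vector,
    taken as the square root of R^2 from a least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(mfcc)), mfcc])
    coef, *_ = np.linalg.lstsq(design, feat, rcond=None)
    resid = feat - design @ coef
    r2 = 1.0 - resid.var() / feat.var()
    return float(np.sqrt(max(r2, 0.0)))


# Synthetic stand-in for the frames of one phoneme class (assumption):
# 13 MFCCs per frame and one scalar acoustic feature that depends
# linearly on them, plus a small amount of noise.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 13))
weights = rng.normal(size=13)
feature = mfcc @ weights + 0.1 * rng.normal(size=500)

# Fit on the first 400 frames, predict the remaining 100.
mean, cov = fit_joint_gaussian(mfcc[:400], feature[:400])
predicted = map_predict(mfcc[400:], mean, cov)[:, 0]

rmse = float(np.sqrt(np.mean((predicted - feature[400:]) ** 2)))
r = multiple_correlation(mfcc[:400], feature[:400])
print(f"multiple correlation R = {r:.3f}, held-out RMSE = {rmse:.3f}")
```

In a phoneme-localized scheme of the kind the abstract describes, one such model would be fitted per class or state, with a decoded state sequence selecting which model predicts each frame. That selection step is omitted here.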