Statistical modeling of speech Poincaré sections in combination of frequency analysis to improve speech recognition performance
- PMID: 20887046
- DOI: 10.1063/1.3463722
Abstract
This paper introduces a combined feature extraction approach to improve speech recognition systems. The main idea is to benefit simultaneously from features obtained from Poincaré sections of the speech reconstructed phase space (RPS) and from conventional Mel-frequency cepstral coefficients (MFCCs), which have a proven role in speech recognition. With an appropriate embedding dimension, the reconstructed phase space of a speech signal is assured to be topologically equivalent to the dynamics of the speech production system, and could therefore include information that may be absent from linear analysis approaches. Moreover, complicated systems such as the speech production system can exhibit cyclic and oscillatory patterns, and Poincaré sections can be an effective tool for analyzing such trajectories. In this research, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to Poincaré sections of the speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the GMM parameters and the MFCC-based features. A hidden Markov model-based speech recognition system and the TIMIT speech database are used to evaluate the proposed feature set in isolated and continuous speech recognition experiments. With the proposed feature set, a 5.7% absolute improvement in isolated phoneme recognition is obtained over MFCC-only features.
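The two geometric steps named in the abstract, reconstructing a phase space and taking a Poincaré section of it, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `delay_embed` and `poincare_section`, the embedding dimension of 3, the lag of 10 samples, the section plane, and the synthetic sine input are all assumptions chosen for illustration, and the abstract does not state the values used in the paper.

```python
import math


def delay_embed(signal, dim, lag):
    """Time-delay embedding: point n is (x[n], x[n+lag], ..., x[n+(dim-1)*lag])."""
    n_points = len(signal) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dim and lag")
    return [tuple(signal[n + k * lag] for k in range(dim)) for n in range(n_points)]


def poincare_section(trajectory, normal, offset=0.0):
    """Return the points where the trajectory crosses the hyperplane
    normal . x = offset from the negative to the non-negative side.
    Each crossing is located by linear interpolation between samples."""
    def side(point):
        return sum(a * b for a, b in zip(normal, point)) - offset

    crossings = []
    for p, q in zip(trajectory, trajectory[1:]):
        sp, sq = side(p), side(q)
        if sp < 0.0 <= sq:
            t = sp / (sp - sq)  # fraction of the step from p to q at the crossing
            crossings.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return crossings


# Assumed toy input: a sine wave with a 40-sample period stands in for a
# voiced speech frame; real speech would need a dim and lag chosen for it.
signal = [math.sin(2 * math.pi * n / 40 + 0.3) for n in range(400)]
trajectory = delay_embed(signal, dim=3, lag=10)

# Assumed section plane: first coordinate equal to zero.
section = poincare_section(trajectory, normal=(1.0, 0.0, 0.0))
for point in section:
    print(tuple(round(c, 3) for c in point))
```

For a purely periodic input the section points cluster tightly, whereas a speech frame would scatter them. The abstract describes modeling that scatter with a GMM; a fit of that kind (for example with scikit-learn's `GaussianMixture`) is left out of this sketch.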