A bio-inspired feature extraction for robust speech recognition

Springerplus. 2014 Nov 4:3:651. doi: 10.1186/2193-1801-3-651. eCollection 2014.

Authors

Youssef Zouhir¹, Kaïs Ouni¹

Affiliation

¹ Research Unit: Signals and Mechatronic Systems, SMS, Higher School of Technology and Computer Science (ESTI), University of Carthage, Carthage, Tunisia.

PMID: 25485194
PMCID: PMC4230714
DOI: 10.1186/2193-1801-3-651

Abstract

This paper proposes a feature extraction method for robust speech recognition in noisy environments. The method is motivated by a biologically inspired auditory model that simulates outer/middle ear filtering with a low-pass filter and the spectral behaviour of the cochlea with the Gammachirp auditory filterbank (GcFB). Its recognition performance is tested on speech signals corrupted by real-world noises. The evaluation results show that the proposed method gives better recognition rates than classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system used is based on Hidden Markov Models with continuous Gaussian mixture densities (HMM-GM).

Keywords: Auditory filter model; Feature extraction; Hidden Markov Models; Noisy speech recognition.
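
As a rough illustration of the cochlear stage described in the abstract, the following Python sketch builds a gammachirp impulse response of the commonly cited form g(t) = t^(n-1) exp(-2π b ERB(f_r) t) cos(2π f_r t + c ln t + φ) and applies a small bank of such filters to a signal. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the parameter values (n, b, c), and the centre frequencies are illustrative choices that are not taken from the paper, and the outer/middle ear low-pass stage is omitted. The 16 kHz sampling rate and 25 ms segment length follow the Figure 4 caption.

```python
import numpy as np


def erb(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 + 0.108 * f_hz


def gammachirp_ir(fr_hz, fs_hz=16000, duration_s=0.025,
                  n=4, b=1.019, c=-2.0, phase=0.0):
    """Sampled gammachirp impulse response, peak-normalised.

    n, b and c are illustrative defaults (assumptions), not values
    reported in the paper.
    """
    # Start one sample after zero because ln(t) is undefined at t = 0.
    num_samples = int(round(duration_s * fs_hz))
    t = np.arange(1, num_samples + 1) / fs_hz
    envelope = t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb(fr_hz) * t)
    # The c * ln(t) term is the "chirp" that makes the filter asymmetric.
    carrier = np.cos(2.0 * np.pi * fr_hz * t + c * np.log(t) + phase)
    g = envelope * carrier
    return g / np.max(np.abs(g))


def filterbank_outputs(signal, centre_freqs_hz, fs_hz=16000):
    """Filter `signal` with one gammachirp per centre frequency.

    Returns an array of shape (num_channels, len(signal)).
    """
    return np.stack([
        np.convolve(signal, gammachirp_ir(fc, fs_hz), mode="same")
        for fc in centre_freqs_hz
    ])
```

Calling `filterbank_outputs(x, [500.0, 1000.0, 2000.0])` on a speech frame `x` would give one band-limited output per channel. A feature extractor would typically go on to compute per-channel energies from these outputs before any further perceptual processing.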

Figures

**Figure 1**
**Automatic speech recognition system.**

**Figure 2**
**A simple Markov model with 5 states (Young et al. 2009).**

**Figure 3**
**Block diagram of the PLP technique (Hermansky 1990).**

**Figure 4**
**The top panel shows a 25 ms waveform segment of the word “Water” (sampling frequency = 16 kHz).** The bottom panel illustrates the simulation of BMM for the waveform segment.

**Figure 5**
**Block diagram of the proposed Perceptual linear predictive auditory Gammachirp (PLPaGc) method.**

**Figure 6**
**The temporal representations and the spectrograms of the used noises.**

References

1. Atal BS. Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. J Acoust Soc Am. 1974;55(6):1304–12. doi: 10.1121/1.1914702.
2. Atal BS, Hanauer SL. Speech analysis and synthesis by linear prediction of the speech wave. J Acoust Soc Am. 1971;50:637–55. doi: 10.1121/1.1912679.
3. Beigi H. Fundamentals of Speaker Recognition. New York: Springer; 2011.
4. Bleeck S, Ives T, Patterson RD. Aim-mat: the auditory image model in MATLAB. Acta Acustica United Ac. 2004;90(4):781–787.
5. Davis SB, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust, Speech, Signal Processing. 1980;28(4):357–66. doi: 10.1109/TASSP.1980.1163420.
