Springerplus. 2014 Nov 4;3:651.
doi: 10.1186/2193-1801-3-651. eCollection 2014.

A bio-inspired feature extraction for robust speech recognition

Youssef Zouhir et al. Springerplus.

Abstract

This paper proposes a feature extraction method for robust speech recognition in noisy environments. The method is motivated by a biologically inspired auditory model that simulates the outer/middle ear filtering with a low-pass filter and the spectral behaviour of the cochlea with the Gammachirp auditory filterbank (GcFB). The speech recognition performance of the method is tested on speech signals corrupted by real-world noises. The evaluation results show that the proposed method gives better recognition rates than classic techniques such as Perceptual Linear Prediction (PLP), Linear Predictive Coding (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Mel Frequency Cepstral Coefficients (MFCC). The recognition system is based on Hidden Markov Models with continuous Gaussian mixture densities (HMM-GM).

Keywords: Auditory filter model; Feature extraction; Hidden Markov Models; Noisy speech recognition.
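
To make the cochlear stage mentioned in the abstract more concrete, the following is a minimal sketch of an analytic gammachirp impulse response and a small filterbank built from it. It is not the authors' implementation: the function names, the filter order `n`, the bandwidth factor `b`, the chirp factor `c`, and the use of the Glasberg and Moore ERB approximation are all illustrative assumptions, and the outer/middle ear low-pass filter and the subsequent PLP-style processing are omitted. The 25 ms duration at 16 kHz matches the segment length quoted in the Figure 4 caption.

```python
import numpy as np

def erb(fc_hz):
    """ERB in Hz (Glasberg and Moore approximation; an assumption here)."""
    return 24.7 + 0.108 * fc_hz

def gammachirp_ir(fc_hz, fs_hz=16000, dur_s=0.025, n=4, b=1.019, c=-2.0):
    """Analytic gammachirp: t^(n-1) * exp(-2*pi*b*ERB*t) * cos(2*pi*fc*t + c*ln t).

    n, b and c are illustrative values, not taken from the paper.
    """
    n_samples = int(round(dur_s * fs_hz))
    # Start at t = 1/fs rather than 0 because ln(t) is undefined at t = 0.
    t = np.arange(1, n_samples + 1) / fs_hz
    envelope = t ** (n - 1) * np.exp(-2.0 * np.pi * b * erb(fc_hz) * t)
    carrier = np.cos(2.0 * np.pi * fc_hz * t + c * np.log(t))
    g = envelope * carrier
    return g / np.max(np.abs(g))  # normalise the peak amplitude to 1

def gammachirp_filterbank(signal, centre_freqs_hz, fs_hz=16000):
    """Filter `signal` with one gammachirp per centre frequency (one row each)."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for fc in centre_freqs_hz:
        h = gammachirp_ir(fc, fs_hz)
        # Causal FIR filtering: full convolution truncated to the input length.
        rows.append(np.convolve(signal, h)[: len(signal)])
    return np.stack(rows)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.1 * fs)) / fs
    tone = np.sin(2.0 * np.pi * 1000.0 * t)  # 100 ms, 1 kHz test tone
    out = gammachirp_filterbank(tone, [500.0, 1000.0, 4000.0], fs)
    print(out.shape, np.sum(out ** 2, axis=1))
```

Each row of the output is one channel of a crude cochlear representation; a channel centred near the frequency of the input tone carries far more energy than a distant one, which is the frequency selectivity the filterbank is meant to model.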

Figures

Figure 1. Automatic speech recognition system.
Figure 2. Simple Markov model with 5 states (Young et al. 2009).
Figure 3. Block diagram of the PLP technique (Hermansky 1990).
Figure 4. The top panel shows a 25 ms waveform segment of the word “Water” (sampling frequency = 16 kHz). The bottom panel shows the simulated BMM for the same segment.
Figure 5. Block diagram of the proposed Perceptual linear predictive auditory Gammachirp (PLPaGc) method.
Figure 6. Temporal representations and spectrograms of the noises used.

References

    1. Atal BS. Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. J Acoust Soc Am. 1974;55(6):1304–12. doi: 10.1121/1.1914702.
    2. Atal BS, Hanauer SL. Speech analysis and synthesis by linear prediction of the speech wave. J Acoust Soc Am. 1971;50:637–55. doi: 10.1121/1.1912679.
    3. Beigi H. Fundamentals of Speaker Recognition. New York: Springer; 2011.
    4. Bleeck S, Ives T, Patterson RD. Aim-mat: the auditory image model in MATLAB. Acta Acustica United Ac. 2004;90(4):781–7.
    5. Davis SB, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust, Speech, Signal Processing. 1980;28(4):357–66. doi: 10.1109/TASSP.1980.1163420.
