Comput Math Methods Med. 2015;2015:956249.
doi: 10.1155/2015/956249. Epub 2015 Nov 22.

Voice Disorder Classification Based on Multitaper Mel Frequency Cepstral Coefficients Features

Ömer Eskidere et al. Comput Math Methods Med. 2015.

Abstract

The Mel Frequency Cepstral Coefficients (MFCCs) are widely used to extract essential information from a voice signal and have become a popular feature extractor in audio processing. However, MFCC features are usually computed from a spectrum estimated with a single window (taper), and such single-taper estimates have large variance. This study investigates reducing this variance for the classification of two voice qualities (normal voice and disordered voice) using multitaper MFCC features. We also compare their performance with that of two newly proposed windowing techniques and the conventional single-taper technique. The results demonstrate that the adaptively weighted Thomson multitaper method distinguishes between normal and disordered voices better than the conventional single-taper (Hamming window) technique and the two newly proposed windowing methods. Multitaper MFCC features may therefore help identify voices at risk of a real pathology, which would have to be confirmed later.
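The variance argument above can be illustrated with a minimal, standard-library-only Python sketch; it is not the authors' implementation. It assumes sine tapers (one of the multitaper families shown in Figure 2) because they have a closed form, uniform weights, a plain O(N²) DFT, and a white-noise frame standing in for a voice frame. All function names are hypothetical. For white noise, the relative variance of the Hamming (single-taper) periodogram should come out near 1, and that of the K-taper average should be noticeably smaller, roughly 1/K.

```python
import cmath
import math
import random


def dft_power(frame):
    """|X(k)|^2 of a real frame for bins k = 0 .. N/2 (plain O(N^2) DFT)."""
    n_samples = len(frame)
    power = []
    for k in range(n_samples // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    return power


def hamming(n_samples):
    """Single taper: Hamming window scaled to unit energy."""
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
         for n in range(n_samples)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]


def sine_tapers(n_samples, n_tapers):
    """Orthonormal sine tapers h_k(n) = sqrt(2/(N+1)) * sin(pi*k*(n+1)/(N+1))."""
    scale = math.sqrt(2.0 / (n_samples + 1))
    return [[scale * math.sin(math.pi * k * (n + 1) / (n_samples + 1))
             for n in range(n_samples)]
            for k in range(1, n_tapers + 1)]


def multitaper_spectrum(frame, tapers, weights=None):
    """Weighted average of the tapered periodograms (one per taper)."""
    if weights is None:
        weights = [1.0] * len(tapers)  # uniform weights
    spectra = [dft_power([h * x for h, x in zip(taper, frame)]) for taper in tapers]
    total = sum(weights)
    return [sum(w * s[i] for w, s in zip(weights, spectra)) / total
            for i in range(len(spectra[0]))]


def relative_variance(values):
    """Variance divided by the squared mean (about 1 for a raw periodogram of noise)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values) / mean ** 2


random.seed(0)
N, K = 256, 8
noise = [random.gauss(0.0, 1.0) for _ in range(N)]  # stand-in for one voice frame
single = multitaper_spectrum(noise, [hamming(N)])   # one taper: conventional estimate
multi = multitaper_spectrum(noise, sine_tapers(N, K))
# Interior bins only: the DC and Nyquist bins have different statistics.
print(relative_variance(single[1:-1]), relative_variance(multi[1:-1]))
```

Thomson (DPSS) tapers would take the place of `sine_tapers` in the same averaging step; SciPy provides them as `scipy.signal.windows.dpss`, which is left out here so that the sketch has no dependencies.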

Figures

Figure 1
Block diagram of MFCC feature extraction based on single-taper and multitaper spectrum estimation.
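The final stages of the Figure 1 pipeline, shared by the single-taper and multitaper branches, apply a mel-spaced triangular filterbank to the estimated power spectrum, take the logarithm, and apply a DCT. The following is a minimal sketch under common default choices that are assumptions here, not settings reported in this record: the 2595·log10(1 + f/700) mel scale, 26 filters, 13 coefficients, and an unnormalised DCT-II. Function names and the toy spectrum are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """A common mel-scale mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to fs/2.

    n_bins is the number of one-sided spectrum bins (N/2 + 1).
    """
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    mel_points = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    # Fractional bin positions of the filter edges and centres.
    bin_points = [mel_to_hz(m) / nyquist * (n_bins - 1) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        row = []
        for b in range(n_bins):
            if left < b <= centre:
                row.append((b - left) / (centre - left))    # rising edge
            elif centre < b < right:
                row.append((right - b) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_power(power, sample_rate, n_filters=26, n_coeffs=13):
    """Mel filterbank energies -> natural log -> DCT-II; keep n_coeffs terms."""
    bank = mel_filterbank(n_filters, len(power), sample_rate)
    # Floor avoids log(0) for filters that receive no energy.
    energies = [max(sum(w * p for w, p in zip(row, power)), 1e-12) for row in bank]
    log_e = [math.log(e) for e in energies]
    return [sum(le * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, le in enumerate(log_e))
            for c in range(n_coeffs)]


fs = 16000
power = [1.0 + 0.5 * math.sin(0.05 * b) ** 2 for b in range(257)]  # toy 512-point spectrum
print([round(c, 3) for c in mfcc_from_power(power, fs)])
```

Because the logarithm turns a gain change into an additive constant, and the DCT of a constant vector is zero for every coefficient except c0, scaling the input spectrum changes only the first coefficient; this is one reason c0 is often handled separately.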
Figure 2
Single taper and different multitapers used for spectrum estimation: (a) Hamming window, (b) the sine tapers, (c) the multipeak tapers, and (d) the Thomson tapers. Window length is 480; m is the taper number.
Figure 3
(a) Normal voice; (b), (c), and (d) its spectrum estimated by the single-taper (Hamming) method and by the Thomson, multipeak, and SWCE multitaper methods with N = 3, N = 9, and N = 15 tapers, respectively.
Figure 4
(a) Pathological voice; (b), (c), and (d) its spectrum estimated by the single-taper (Hamming) method and by the Thomson, multipeak, and SWCE multitaper methods with N = 3, N = 9, and N = 15 tapers, respectively.
Figure 5
(a), (c), and (e) Sustained vowels /a/, /i/, and /u/ from normal subjects and (b), (d), and (f) their Thomson multitaper spectral estimates using uniform weights, eigenvalues as the weights, and adaptive weights.
Figure 6
(a), (c), and (e) Sustained vowels /a/, /i/, and /u/ from pathological subjects and (b), (d), and (f) their Thomson multitaper spectral estimates using uniform weights, eigenvalues as the weights, and adaptive weights.
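Figures 5 and 6 compare three ways of combining the K tapered spectra (eigenspectra) S_k(f) of the Thomson method: uniform weights, the DPSS eigenvalues λ_k as weights, and adaptive weights. The sketch below shows the three combinations, assuming the eigenspectra, the eigenvalues, and the frame variance σ² are already available (computing the DPSS tapers is omitted). The adaptive branch uses the standard iterative Thomson weighting, d_k(f)² = λ_k S(f)² / (λ_k S(f) + (1 - λ_k)σ²)², started from the mean of the first two eigenspectra; whether this matches the paper's adaptive variant exactly is an assumption. Function names and the example numbers are hypothetical.

```python
def weighted_spectrum(eigenspectra, weights):
    """Fixed-weight average of per-taper spectra (uniform or eigenvalue weights)."""
    total = sum(weights)
    return [sum(w * s[i] for w, s in zip(weights, eigenspectra)) / total
            for i in range(len(eigenspectra[0]))]


def adaptive_spectrum(eigenspectra, eigenvalues, variance, n_iter=20):
    """Iterative Thomson adaptive weighting (needs at least two tapers).

    Per bin, weight d_k^2 = lam_k S^2 / (lam_k S + (1 - lam_k) variance)^2,
    where S is the current estimate of the spectrum in that bin.
    """
    n_bins = len(eigenspectra[0])
    # Start from the mean of the two best-concentrated eigenspectra.
    estimate = [(eigenspectra[0][i] + eigenspectra[1][i]) / 2.0 for i in range(n_bins)]
    for _ in range(n_iter):
        updated = []
        for i in range(n_bins):
            s = estimate[i]
            d2 = [lam * s * s / (lam * s + (1.0 - lam) * variance) ** 2
                  for lam in eigenvalues]
            total = sum(d2)
            if total == 0.0:  # degenerate bin (S = 0): fall back to eigenvalue weights
                d2 = list(eigenvalues)
                total = sum(d2)
            updated.append(sum(w * sk[i] for w, sk in zip(d2, eigenspectra)) / total)
        estimate = updated
    return estimate


# Illustrative eigenspectra for K = 3 tapers over 3 frequency bins.
spectra = [[2.0, 1.0, 4.0], [1.0, 3.0, 2.0], [3.0, 2.0, 0.5]]
lambdas = [0.99, 0.9, 0.6]
print(weighted_spectrum(spectra, [1.0, 1.0, 1.0]))  # uniform weights
print(weighted_spectrum(spectra, lambdas))          # eigenvalues as weights
print(adaptive_spectrum(spectra, lambdas, variance=2.0))
```

When every λ_k equals 1 the adaptive weights reduce to uniform weights. A smaller λ_k lowers that taper's weight in bins where S(f) is small compared with (1 - λ_k)σ², which is where its broadband leakage would dominate.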
Figure 7
The two novel window functions and the Hamming window in the time domain.
Figure 8
Classification accuracies (%) using different numbers of tapers for (a) sustained vowel /a/, (b) sustained vowel /i/, and (c) sustained vowel /u/.
Figure 9
Voice quality classification accuracies (for /a/, /i/, and /u/) using the weighting schemes of the Thomson multitaper method and the Hamming window with (a) N = 8, (b) N = 12, and (c) N = 16.
Figure 10
Comparison of classification performance for the two different window functions and the Hamming window on the /a/, /i/, and /u/ vowels.
