2025 May 1;25(1):177.
doi: 10.1186/s12911-025-02978-w.

A hybrid approach for binary and multi-class classification of voice disorders using a pre-trained model and ensemble classifiers


Mehtab Ur Rahman et al. BMC Med Inform Decis Mak. .

Abstract

Recent advances in artificial intelligence-based audio and speech processing have increasingly focused on the binary and multi-class classification of voice disorders. Despite this progress, achieving high accuracy in multi-class classification remains challenging. This paper proposes a novel hybrid approach using a two-stage framework to enhance voice disorder classification performance and achieve state-of-the-art accuracy in multi-class classification. Our hybrid approach combines deep learning features with several powerful classifiers. In the first stage, high-level feature embeddings are extracted from voice-data spectrograms using a pre-trained VGGish model. In the second stage, these embeddings are fed to four different classifiers: a Support Vector Machine (SVM), Logistic Regression (LR), a Multi-Layer Perceptron (MLP), and an Ensemble Classifier (EC). Experiments are conducted on a subset of the Saarbruecken Voice Database (SVD) for male, female, and combined speakers. In binary classification, VGGish-SVM achieved the highest accuracy for male speakers (82.45% for healthy vs. disordered; 75.45% for hyperfunctional dysphonia vs. vocal fold paresis), while VGGish-EC performed best for female speakers (71.54% for healthy vs. disordered; 68.42% for hyperfunctional dysphonia vs. vocal fold paresis). In multi-class classification, VGGish-SVM outperformed the other models, achieving mean accuracies of 77.81% for male speakers, 63.11% for female speakers, and 70.53% for combined speakers. We conducted a comparative analysis against related work, including Mel-frequency cepstral coefficients (MFCCs), MFCC-glottal features, and features extracted by the wav2vec and HuBERT models with an SVM classifier. The results demonstrate that our hybrid approach consistently outperforms these models, especially in multi-class classification tasks.
The results show the feasibility of a hybrid framework for voice disorder classification, offering a foundation for automated tools that, with further validation, could support clinical assessment.
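The two-stage pipeline described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: stage 1 (VGGish embedding extraction) is stubbed with random 128-dimensional vectors, since the real pre-trained VGGish model maps each log-mel spectrogram frame of a recording to a 128-dimensional embedding; and the ensemble's composition (soft voting over the SVM, LR, and MLP) is an assumption, as the abstract does not specify the EC's members.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def extract_vggish_embeddings(n_samples: int) -> np.ndarray:
    """Placeholder for stage 1: in the paper, a pre-trained VGGish model
    turns voice-recording spectrograms into 128-dim embeddings."""
    return rng.normal(size=(n_samples, 128))

# Toy 3-class task mirroring the paper's labels: 0 = healthy,
# 1 = hyperfunctional dysphonia, 2 = vocal fold paresis.
X = extract_vggish_embeddings(300)
y = rng.integers(0, 3, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 2: the four classifiers compared in the paper.
svm = SVC(probability=True, random_state=0)
lr = LogisticRegression(max_iter=1000)
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
ec = VotingClassifier([("svm", svm), ("lr", lr), ("mlp", mlp)], voting="soft")

for name, clf in [("SVM", svm), ("LR", lr), ("MLP", mlp), ("EC", ec)]:
    clf.fit(X_tr, y_tr)
    print(f"VGGish-{name}: accuracy {clf.score(X_te, y_te):.2f}")
```

With real VGGish embeddings substituted for the stub, the same loop reproduces the paper's model comparison on a held-out split; accuracies on the random toy data are, of course, near chance.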

Keywords: Ensemble classifier; Multi-class classification; VGGish; Voice disorders.


Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Clinical trial number: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1 The proposed voice disorder classification system
Fig. 2 VGGish model architecture
Fig. 3 Ensemble classifier
Fig. 4 Normalized confusion matrix for healthy vs. disordered. Predicted classes are on the horizontal axis and true classes on the vertical axis. Class labels: 0 = healthy, 1 = disordered
Fig. 5 Normalized confusion matrix for hyperfunctional dysphonia vs. vocal fold paresis. Predicted classes are on the horizontal axis and true classes on the vertical axis. Class labels: 0 = hyperfunctional dysphonia, 1 = vocal fold paresis
Fig. 6 Normalized confusion matrix for multi-class classification. Predicted classes are on the horizontal axis and true classes on the vertical axis. Class labels: 0 = healthy, 1 = hyperfunctional dysphonia, 2 = vocal fold paresis
