J Med Internet Res. 2021 Jun 8;23(6):e25247.

doi: 10.2196/25247.

Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study

Hao-Chun Hu¹,²,³, Shyue-Yih Chang⁴, Chuen-Heng Wang⁵, Kai-Jun Li², Hsiao-Yun Cho²,⁶, Yi-Ting Chen⁵, Chang-Jung Lu⁴, Tzu-Pei Tsai⁴, Oscar Kuang-Sheng Lee¹,⁷,⁸,⁹

Affiliations

¹ Institute of Clinical Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.
² Department of Otorhinolaryngology-Head and Neck Surgery, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan.
³ School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan.
⁴ Voice Center, Department of Otolaryngology, Cheng Hsin General Hospital, Taipei, Taiwan.
⁵ Muen Biomedical and Optoelectronic Technologist Inc, Taipei, Taiwan.
⁶ Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan.
⁷ Department of Orthopedics, China Medical University Hospital, Taichung, Taiwan.
⁸ Stem Cell Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan.
⁹ Department of Medical Research, Taipei Veterans General Hospital, Taipei, Taiwan.

PMID: 34100770
PMCID: PMC8241431
DOI: 10.2196/25247

Hao-Chun Hu et al. J Med Internet Res. 2021.

Abstract

Background: Dysphonia impairs quality of life by interfering with communication. However, laryngoscopic examination is expensive and not readily accessible in primary care units, and experienced laryngologists are required to achieve an accurate diagnosis.

Objective: This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence.

Methods: We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists.
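As a quick check on the sample counts above, the following Python sketch tallies the reported class sizes and reproduces a 593/148 split. It assumes a plain random shuffle with a fixed seed; the abstract does not say how samples were assigned to each set, and the label strings and function names are illustrative only.

```python
import random

# Class counts as reported in the Methods paragraph.
CLASS_COUNTS = {
    "normal": 189,
    "vocal_atrophy": 224,
    "unilateral_vocal_paralysis": 50,
    "organic_vocal_fold_lesions": 248,
    "adductor_spasmodic_dysphonia": 30,
}


def build_labels(class_counts):
    """Expand per-class counts into one label per sample."""
    return [label for label, n in class_counts.items() for _ in range(n)]


def shuffle_split(labels, n_test, seed=0):
    """Randomly split labels into (train, test) with exactly n_test test items.

    Assumption: a plain random shuffle. The abstract does not describe
    the authors' actual assignment procedure.
    """
    rng = random.Random(seed)
    shuffled = labels[:]
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


labels = build_labels(CLASS_COUNTS)
train, test = shuffle_split(labels, n_test=148)
pathological = len(labels) - CLASS_COUNTS["normal"]

print(len(labels), pathological, len(train), len(test))  # 741 552 593 148
```

The testing set is 148/741, or roughly 20% of all samples. A stratified split that rounds 20% within each class would not necessarily land on exactly 148 test samples, which is why the sketch fixes the test size directly.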

Results: The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. By comparison, the overall accuracy rates of the human specialists were 60.1% and 56.1% for the 2 laryngologists and 51.4% and 43.2% for the 2 general ear, nose, and throat doctors.
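For readers unfamiliar with how a single sensitivity and specificity can be reported for a multiclass task, the sketch below computes one-vs-rest values per class from a confusion matrix and macro-averages them. The abstract does not state which aggregation was used, so macro-averaging is an assumption here, and the 3-class matrix is hypothetical toy data, not the study's results.

```python
def per_class_metrics(cm):
    """Return a list of (sensitivity, specificity), one pair per class.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Each class is scored one-vs-rest. Every class
    is assumed to have at least one true and one non-member sample.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp  # true other, predicted k
        tn = total - tp - fn - fp
        metrics.append((tp / (tp + fn), tn / (tn + fp)))
    return metrics


def summarize(cm):
    """Return (overall accuracy, macro sensitivity, macro specificity)."""
    metrics = per_class_metrics(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(len(cm))) / total
    macro_sens = sum(sens for sens, _ in metrics) / len(metrics)
    macro_spec = sum(spec for _, spec in metrics) / len(metrics)
    return accuracy, macro_sens, macro_spec


# Hypothetical 3-class confusion matrix, for illustration only.
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 2, 8],
]

accuracy, macro_sens, macro_spec = summarize(toy_cm)
print(round(accuracy, 3), round(macro_sens, 3), round(macro_spec, 3))
```

In a multiclass setting, one-vs-rest specificity tends to be much higher than sensitivity because each class's negatives pool all other classes, which is consistent with the pattern of the reported figures.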

Conclusions: Voice alone could be used for common vocal fold disease recognition through a deep learning approach after training with our Mandarin pathological voice database. This artificial intelligence approach could be clinically useful for screening general vocal fold disease using the voice, for example as part of a quick survey or a general health examination. It can be applied during telemedicine in areas where primary care units lack laryngoscopic capabilities. It could also support physicians in prescreening, so that invasive examinations are reserved for cases in which automatic recognition or listening indicates a problem, or in which professional analysis of other clinical examination results raises doubts about the presence of pathology.

Keywords: artificial intelligence; convolutional neural network; dysphonia; pathological voice; vocal fold disease; voice pathology identification.

©Hao-Chun Hu, Shyue-Yih Chang, Chuen-Heng Wang, Kai-Jun Li, Hsiao-Yun Cho, Yi-Ting Chen, Chang-Jung Lu, Tzu-Pei Tsai, Oscar Kuang-Sheng Lee. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 08.06.2021.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Changes in the loss function value over the training and validation sets.

**Figure 2**
Confusion matrices for the 2-, 3-, 4-, and 5-class classification tasks. AN = pathological voice; NC = normal voice; SD = adductor spasmodic dysphonia; PAATOL = unilateral vocal paralysis/vocal atrophy/organic vocal fold lesions; OL = organic vocal fold lesions; PAAT = unilateral vocal paralysis/vocal atrophy; PA = unilateral vocal paralysis; AT = vocal atrophy.

**Figure 3**
Receiver operating characteristic curves for the 2-, 3-, 4-, and 5-class classification tasks. NC = normal voice; SD = adductor spasmodic dysphonia; PAATOL = unilateral vocal paralysis/vocal atrophy/organic vocal fold lesions; OL = organic vocal fold lesions; PAAT = unilateral vocal paralysis/vocal atrophy; PA = unilateral vocal paralysis; AT = vocal atrophy.

**Figure 4**
Confusion matrices of the human specialists for the 5-class classification task. NC = normal voice; SD = adductor spasmodic dysphonia; OL = organic vocal fold lesions; PA = unilateral vocal paralysis; AT = vocal atrophy.



