Comparison of Convolutional Neural Network Models for Determination of Vocal Fold Normality in Laryngoscopic Images

Won Ki Cho¹, Seung-Ho Choi²

Affiliations

¹ Departments of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.
² Departments of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea. Electronic address: shchoi@amc.seoul.kr.

PMID: 32873430
DOI: 10.1016/j.jvoice.2020.08.003

Won Ki Cho et al. J Voice. 2022 Sep;36(5):590-598. doi: 10.1016/j.jvoice.2020.08.003. Epub 2020 Aug 30.

Abstract

Objectives: Deep learning with convolutional neural networks (CNNs) is widely used in medical imaging research. This study investigated whether vocal fold normality in laryngoscopic images can be determined by CNN-based deep learning, compared the accuracy of several CNN models, and explored the feasibility of applying deep learning to laryngoscopy.

Methods: Laryngoscopy videos were screen-captured and each image was cropped to include the abducted vocal fold regions. A total of 2216 images (899 normal, 1317 abnormal) were allocated to training, validation, and test sets. The training set was augmented and used to train a custom-built six-layer CNN model (CNN6) and the VGG16, Inception V3, and Xception models. The trained models were applied to the test set; for each model, receiver operating characteristic curves and cutoff values were obtained. Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were calculated. The best model was applied to video streams, and localization of features was attempted using Grad-CAM.
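
The evaluation step described in the Methods can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: that the cutoff is chosen by maximizing Youden's J (sensitivity + specificity - 1), that "abnormal" is treated as the positive class, and that an image is called abnormal when its predicted probability of normality falls below the cutoff. The function names and the toy scores are hypothetical and are not data from the study.

```python
from typing import Dict, List, Tuple


def confusion_counts(is_abnormal: List[int], p_normal: List[float],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), with 'abnormal' as the positive class (assumption).

    An image is predicted abnormal when its probability of normality
    is strictly below the cutoff.
    """
    tp = fp = tn = fn = 0
    for truth, p in zip(is_abnormal, p_normal):
        predicted_abnormal = p < cutoff
        if predicted_abnormal and truth:
            tp += 1
        elif predicted_abnormal and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the five measures named in the Methods from confusion counts."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


def youden_cutoff(is_abnormal: List[int], p_normal: List[float]) -> float:
    """Pick the cutoff maximizing Youden's J (an assumed selection criterion)."""
    best_cutoff, best_j = 0.0, float("-inf")
    for cutoff in sorted(set(p_normal)):
        m = diagnostic_metrics(*confusion_counts(is_abnormal, p_normal, cutoff))
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff


# Toy example (hypothetical scores): 5 abnormal images, 4 normal images.
toy_is_abnormal = [1, 1, 1, 1, 1, 0, 0, 0, 0]
toy_p_normal = [0.05, 0.10, 0.20, 0.30, 0.60, 0.50, 0.80, 0.90, 0.95]

toy_cutoff = youden_cutoff(toy_is_abnormal, toy_p_normal)
toy_counts = confusion_counts(toy_is_abnormal, toy_p_normal, toy_cutoff)
toy_metrics = diagnostic_metrics(*toy_counts)
print(toy_cutoff, toy_counts, toy_metrics)
```

In practice the probabilities would come from each trained model's output on the test set, and a library routine such as scikit-learn's `roc_curve` could replace the manual threshold scan.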

Results: All of the trained models showed a high area under the receiver operating characteristic curve. The most discriminative cutoff levels for the probability of normality were 35.6%, 61.8%, 13.5%, and 39.7% for the CNN6, VGG16, Inception V3, and Xception models, respectively. The accuracy of the models in classifying normal and abnormal vocal folds in the test set was 82.3%, 99.7%, 99.1%, and 83.8%, respectively.

Conclusion: All four models showed acceptable diagnostic accuracy. VGG16 and Inception V3 performed better than the simple CNN6 model and the more recently published Xception model. Real-time classification of a video stream using a combination of the VGG16 model, OpenCV, and Grad-CAM showed the potential for clinical application of deep learning in laryngoscopy.

Keywords: Computer; Computer-assisted; Deep learning; Diagnosis; Laryngoscopic images; Neural networks; Vocal cords.
