J Clin Med. 2020 Oct 25;9(11):3415.
doi: 10.3390/jcm9113415.

Convolutional Neural Network Classifies Pathological Voice Change in Laryngeal Cancer with High Accuracy

HyunBum Kim et al. J Clin Med.

Abstract

Voice changes may be the earliest signs of laryngeal cancer. We investigated whether automated voice signal analysis can distinguish patients with laryngeal cancer from healthy subjects. We extracted features using the software package for speech analysis in phonetics (PRAAT) and calculated the Mel-frequency cepstral coefficients (MFCCs) from voice samples of a sustained vowel sound /a:/. The proposed method was tested with six algorithms: support vector machine (SVM), extreme gradient boosting (XGBoost), light gradient boosted machine (LGBM), artificial neural network (ANN), one-dimensional convolutional neural network (1D-CNN) and two-dimensional convolutional neural network (2D-CNN). Their performances were evaluated in terms of accuracy, sensitivity, and specificity. The results were compared with human performance: a total of four volunteers, two of whom were trained laryngologists, rated the same files. The 1D-CNN showed the highest accuracy of 85%, with sensitivity and specificity of 78% and 93%, respectively. The two laryngologists achieved an accuracy of 69.9% but a sensitivity of only 44%. Automated analysis of voice signals could differentiate subjects with laryngeal cancer from healthy subjects with better diagnostic properties than those achieved by the four volunteers.

Keywords: deep learning; larynx cancer; machine learning; voice change; voice pathology classification.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Graphic presentation of the transformation from raw signal to a Mel-frequency cepstral coefficients (MFCCs) image, a step required to match the input shape of the two-dimensional convolutional neural network. (a) Plot of the signal downsampled to 16,000 Hz; (b) plot of the signal normalized between −1 and 1; (c) image of the signal after MFCC transformation.
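The first two preprocessing steps in Figure 1 can be sketched as follows. This is not the authors' code; it is a minimal illustration using NumPy, with a hypothetical 44,100 Hz sine tone standing in for a voice recording, and naive index decimation standing in for proper resampling (a production pipeline would low-pass filter before decimating).

```python
import numpy as np

def normalize(signal):
    """Scale a waveform to the range [-1, 1] (panel (b) of Figure 1)."""
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

# Hypothetical one-second 220 Hz tone recorded at 44,100 Hz.
orig_rate, target_rate = 44100, 16000
t = np.arange(orig_rate) / orig_rate
signal = 3.0 * np.sin(2 * np.pi * 220 * t)

# Naive decimation to ~16,000 Hz (panel (a) of Figure 1).
idx = np.round(np.arange(0, len(signal), orig_rate / target_rate)).astype(int)
downsampled = normalize(signal[idx[idx < len(signal)]])
```

After these two steps the waveform is ready for the MFCC transformation shown in panel (c).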
Figure 2
Flowchart of the Mel-frequency cepstral coefficients (MFCCs) transformation (a) and presentation of the Mel filter banks (b). The triangular filter banks are densely spaced in the low-frequency range, reflecting the distinctive character of the human voice in that range. Abbreviations: FFT, fast Fourier transform; DFT, discrete Fourier transform.
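The triangular filter banks in Figure 2b can be constructed with the standard Mel-scale formulas. The sketch below is not from the paper; the filter count, FFT size, and sample rate are illustrative assumptions, and the even spacing on the Mel scale is what produces the dense packing at low frequencies described in the caption.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters=26, n_fft=512, sample_rate=16000):
    """Triangular filters spaced evenly on the Mel scale, hence
    densely packed at low frequencies (Figure 2b)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    banks = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            banks[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            banks[i - 1, k] = (right - k) / max(right - center, 1)
    return banks
```

Multiplying a power spectrum by this matrix, taking the log, and applying a discrete cosine transform yields the MFCCs.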
Figure 3
Illustration of five-fold cross-validation. The data set is split into five folds, each of which serves once as the test set, a useful approach for making full use of a limited data set.
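The splitting scheme in Figure 3 can be sketched in a few lines of plain Python (this is an illustrative re-implementation, not the authors' code; libraries such as scikit-learn provide the same functionality via `KFold`).

```python
def five_fold_splits(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs. Each sample appears in exactly
    one test fold, so every sample is used for both training and testing."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for f in range(n_folds):
        start = f * fold_size
        end = start + fold_size if f < n_folds - 1 else n_samples
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test
```

Model performance is then reported as the average over the five test folds.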
Figure 4
Illustration of one-dimensional convolutional neural network model structure.
Figure 5
Illustration of two-dimensional convolutional neural network model structure.
Figure 6
Feature importance analysis of XGBoost. The plot shows the relative information gain of each feature for the classification task on male voice samples.
Figure 7
ROC (receiver operating characteristic) curve analysis of the different models for the classification of laryngeal cancer; ROC curves of the algorithms for the classification task on male voice samples only. Abbreviations: LGBM, LightGBM; XGB, XGBoost; SVM, support vector machine; ANN, artificial neural network; 1D-CNN, one-dimensional convolutional neural network; 2D-CNN, two-dimensional convolutional neural network; MFCCs, Mel-frequency cepstral coefficients; STFT, short time Fourier transform.
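The ROC curves in Figure 7 are built by sweeping the decision threshold over the model's output scores. The sketch below is a minimal, dependency-free illustration of that construction, with hypothetical scores and labels; it is not the authors' evaluation code.

```python
def roc_points(scores, labels):
    """Compute (FPR, TPR) pairs by sweeping the decision threshold
    over the observed scores, as in a standard ROC analysis."""
    thresholds = sorted(set(scores), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical classifier scores for four held-out samples
# (labels: 1 = laryngeal cancer, 0 = healthy).
scores = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
points = roc_points(scores, labels)
```

The area under this curve (AUC) summarizes each model's ability to separate the two classes across all thresholds.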
