J Clin Med. 2020 Oct 25;9(11):3415.
doi: 10.3390/jcm9113415.

Convolutional Neural Network Classifies Pathological Voice Change in Laryngeal Cancer with High Accuracy

HyunBum Kim et al. J Clin Med.

Abstract

Voice changes may be the earliest signs of laryngeal cancer. We investigated whether automated voice signal analysis can distinguish patients with laryngeal cancer from healthy subjects. We extracted features using the software package for speech analysis in phonetics (PRAAT) and calculated the Mel-frequency cepstral coefficients (MFCCs) from voice samples of a sustained vowel sound /a:/. The proposed method was tested with six algorithms: support vector machine (SVM), extreme gradient boosting (XGBoost), light gradient boosted machine (LGBM), artificial neural network (ANN), one-dimensional convolutional neural network (1D-CNN) and two-dimensional convolutional neural network (2D-CNN). Their performances were evaluated in terms of accuracy, sensitivity, and specificity. The results were compared with human performance: a total of four volunteers, two of whom were trained laryngologists, rated the same files. The 1D-CNN showed the highest accuracy of 85%, with sensitivity and specificity of 78% and 93%, respectively. The two laryngologists achieved an accuracy of 69.9% but a sensitivity of only 44%. Automated analysis of voice signals could differentiate subjects with laryngeal cancer from healthy subjects with better diagnostic properties than those achieved by the four volunteers.

Keywords: deep learning; larynx cancer; machine learning; voice change; voice pathology classification.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Graphic presentation of the transformation from raw signal to a Mel-frequency cepstral coefficients (MFCCs) image, a step required to match the input shape of the two-dimensional convolutional neural network. (a) Plot of the signal downsampled to 16,000 Hz; (b) plot of the signal normalized between −1 and 1; (c) image of the signal after MFCC transformation.
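The first two preprocessing steps in Figure 1 can be sketched as follows. This is not the authors' code; it is a minimal illustration using NumPy, with a hypothetical 44,100 Hz sine tone standing in for a voice recording, and naive index decimation standing in for proper resampling (a production pipeline would low-pass filter before decimating).

```python
import numpy as np

def normalize(signal):
    """Scale a waveform to the range [-1, 1] (panel (b) of Figure 1)."""
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

# Hypothetical one-second 220 Hz tone recorded at 44,100 Hz.
orig_rate, target_rate = 44100, 16000
t = np.arange(orig_rate) / orig_rate
signal = 3.0 * np.sin(2 * np.pi * 220 * t)

# Naive decimation to ~16,000 Hz (panel (a) of Figure 1).
idx = np.round(np.arange(0, len(signal), orig_rate / target_rate)).astype(int)
downsampled = normalize(signal[idx[idx < len(signal)]])
```

After these two steps the waveform is ready for the MFCC transformation shown in panel (c).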
Figure 2
Flowchart of the Mel-frequency cepstral coefficients (MFCCs) transformation (a) and presentation of the Mel filter banks (b). The triangular filter banks are densely spaced in the low-frequency range, reflecting the distinctive character of the human voice in that range. Abbreviations: FFT, fast Fourier transform; DFT, discrete Fourier transform.
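The triangular filter banks in Figure 2b can be constructed with the standard Mel-scale formulas. The sketch below is not from the paper; the filter count, FFT size, and sample rate are illustrative assumptions, and the even spacing on the Mel scale is what produces the dense packing at low frequencies described in the caption.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters=26, n_fft=512, sample_rate=16000):
    """Triangular filters spaced evenly on the Mel scale, hence
    densely packed at low frequencies (Figure 2b)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    banks = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            banks[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            banks[i - 1, k] = (right - k) / max(right - center, 1)
    return banks
```

Multiplying a power spectrum by this matrix, taking the log, and applying a discrete cosine transform yields the MFCCs.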
Figure 3
Illustration of five-fold cross-validation. The data set is split into five folds, each of which serves once as the test set, a useful approach for making full use of a limited data set.
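The splitting scheme in Figure 3 can be sketched in a few lines of plain Python (this is an illustrative re-implementation, not the authors' code; libraries such as scikit-learn provide the same functionality via `KFold`).

```python
def five_fold_splits(n_samples, n_folds=5):
    """Yield (train_idx, test_idx) pairs. Each sample appears in exactly
    one test fold, so every sample is used for both training and testing."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for f in range(n_folds):
        start = f * fold_size
        end = start + fold_size if f < n_folds - 1 else n_samples
        test = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, test
```

Model performance is then reported as the average over the five test folds.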
Figure 4
Illustration of one-dimensional convolutional neural network model structure.
Figure 5
Illustration of two-dimensional convolutional neural network model structure.
Figure 6
Feature importance analysis of XGBoost. The plot shows the relative information gain of each feature for the classification task on male voice samples.
Figure 7
ROC (receiver operating characteristic) curve analysis of the different models for the classification of laryngeal cancer; ROC curves of the algorithms for the classification task on male voice samples only. Abbreviations: LGBM, LightGBM; XGB, XGBoost; SVM, support vector machine; ANN, artificial neural network; 1D-CNN, one-dimensional convolutional neural network; 2D-CNN, two-dimensional convolutional neural network; MFCCs, Mel-frequency cepstral coefficients; STFT, short time Fourier transform.
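The ROC curves in Figure 7 are built by sweeping the decision threshold over the model's output scores. The sketch below is a minimal, dependency-free illustration of that construction, with hypothetical scores and labels; it is not the authors' evaluation code.

```python
def roc_points(scores, labels):
    """Compute (FPR, TPR) pairs by sweeping the decision threshold
    over the observed scores, as in a standard ROC analysis."""
    thresholds = sorted(set(scores), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical classifier scores for four held-out samples
# (labels: 1 = laryngeal cancer, 0 = healthy).
scores = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
points = roc_points(scores, labels)
```

The area under this curve (AUC) summarizes each model's ability to separate the two classes across all thresholds.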
