PLoS One. 2024 Dec 31;19(12):e0315452. doi: 10.1371/journal.pone.0315452. eCollection 2024.

Computing nasalance with MFCCs and Convolutional Neural Networks

Andrés Lozano et al.

Abstract

Nasalance is a valuable clinical biomarker for hypernasality. It is computed as the ratio of the acoustic energy emitted through the nose to the total energy emitted through the mouth and nose (eNasalance). A new approach is proposed that computes nasalance with Convolutional Neural Networks (CNNs) trained on Mel-Frequency Cepstral Coefficients (mfccNasalance). mfccNasalance is evaluated by examining its accuracy: 1) when the training and test data come from the same dialect or from different dialects; 2) with test data that differs in dynamicity (e.g., rapidly produced diadochokinetic syllables versus short words); and 3) across multiple CNN configurations (i.e., kernel shape and use of 1 × 1 pointwise convolution). Dual-channel Nasometer speech data were recorded from healthy speakers of different dialects: Costa Rica (more (+) nasal) and Spain and Chile (less (-) nasal). The input to the CNN models was a sequence of 39-dimensional MFCC vectors computed from 250 ms moving windows. The test data were recorded in Spain and included short words (-dynamic), sentences (+dynamic), and diadochokinetic syllables (+dynamic). The accuracy of a CNN model was defined as the Spearman correlation between its mfccNasalance scores and the perceptual nasality scores of human experts. In the same-dialect condition, mfccNasalance was more accurate than eNasalance regardless of the CNN configuration; using a 1 × 1 kernel increased accuracy for +dynamic utterances (p < .001), though not for -dynamic utterances, whereas kernel shape had a significant effect only for -dynamic utterances (p < .001). In the different-dialect condition, the scores were significantly less accurate than in the same-dialect condition, particularly for models trained on Costa Rican data. We conclude that mfccNasalance is a flexible and useful alternative to eNasalance. Future studies should explore how to optimize mfccNasalance by selecting the most adequate CNN model as a function of the dynamicity of the target speech data.
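The classic energy-based measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the synthetic channel signals, sampling rate, and window length are assumptions for the example. (The paper's mfccNasalance replaces this ratio with a CNN applied to sequences of 39-dimensional MFCC vectors from 250 ms windows.)

```python
import numpy as np

def e_nasalance(nasal: np.ndarray, oral: np.ndarray) -> float:
    """Energy-based nasalance: acoustic energy of the nasal channel
    divided by the total (nasal + oral) energy, as a percentage."""
    e_nasal = float(np.sum(nasal.astype(np.float64) ** 2))
    e_oral = float(np.sum(oral.astype(np.float64) ** 2))
    return 100.0 * e_nasal / (e_nasal + e_oral)

# Synthetic dual-channel frame (assumed 16 kHz, 250 ms -> 4000 samples);
# the oral channel carries roughly four times the nasal energy here,
# so the result should land near 20% (0.25 / 1.25 energy ratio).
rng = np.random.default_rng(0)
nasal = rng.normal(scale=0.5, size=4000)
oral = rng.normal(scale=1.0, size=4000)
print(f"{e_nasalance(nasal, oral):.1f}")
```

In practice the ratio is computed per analysis frame on the Nasometer's two microphone channels and then averaged over the utterance; the sketch collapses that to a single frame.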


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Overview of the proposed CNN nasalance model for hypernasality prediction.
Fig 2
Fig 2. Convolutional Neural Network model for hypernasality prediction.
Fig 3
Fig 3. Kernel shapes and phonetic information.
Fig 4
Fig 4. Train data annotation and classification.
Fig 5
Fig 5. Train data annotation and classification.
Fig 6
Fig 6. Correlation between eNasalance and perceptual scores (orange rectangle), and between mfccNasalance and perceptual scores (blue), in the same-dialect condition (Spain).
The left panel shows the results for all CNN configurations; the right panel shows the results for CNNs using the optimal configuration (Spain: Syllables k11 = True; Words kernels = Temporal; Sentences k11 = True).
Fig 7
Fig 7. Correlation between eNasalance and perceptual scores (orange rectangle), and between mfccNasalance and perceptual scores (blue), in the different-dialect condition.
The top row shows Costa Rica-trained models, the bottom row Chile-trained models. The left panels show the results for all CNN configurations; the right panels show the results for CNNs using the optimal configuration (Costa Rica: Syllables k11 = True; Words kernels = Temporal; Sentences k11 = True. Chile: Syllables kernels = Spectral; Words k11 = False; Sentences kernels = Spectral).
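The accuracy metric plotted in these figures is a rank correlation between model scores and listener judgments. A minimal sketch with SciPy follows; the utterance scores and ratings are invented purely for illustration.

```python
from scipy.stats import spearmanr

# Hypothetical nasalance scores for five utterances and matching
# perceptual nasality ratings from human experts (both made up here).
model_scores = [0.12, 0.35, 0.50, 0.22, 0.80]
perceptual_ratings = [1, 2, 4, 2, 5]

# Spearman's rho correlates the *ranks* of the two lists, so it is
# insensitive to the scale of the model output and handles tied
# perceptual ratings (the two 2s) via average ranks.
rho, p_value = spearmanr(model_scores, perceptual_ratings)
print(f"rho = {rho:.3f}")
```

A rho near 1 means the model orders utterances by nasality the same way the human experts do, which is the sense of "accuracy" used throughout the study.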
