J Voice. 2023 Jul 8:S0892-1997(23)00179-0.
doi: 10.1016/j.jvoice.2023.06.006. Online ahead of print.

A Machine-Learning Algorithm for the Automated Perceptual Evaluation of Dysphonia Severity


Benjamin van der Woerd et al. J Voice. 2023.

Abstract

Objectives: Auditory-perceptual assessments are the gold standard for assessing voice quality. This project aims to develop a machine-learning model for measuring perceptual dysphonia severity of audio samples consistent with assessments by expert raters.

Methods: Samples from the Perceptual Voice Qualities Database were used, including sustained vowels and Consensus Auditory-Perceptual Evaluation of Voice sentences, which had previously been rated by experts on a 0-100 scale. The OpenSMILE (audEERING GmbH, Gilching, Germany) toolkit was used to extract acoustic (Mel-Frequency Cepstral Coefficient-based, n = 1428) and prosodic (n = 152) features, pitch onsets, and recording duration. We used a support vector machine with these features (n = 1582) for automated assessment of dysphonia severity. Recordings were separated into vowels (V) and sentences (S), and features were extracted separately from each. Final voice quality predictions were made by combining the features extracted from the individual components with those from the whole audio (WA) sample (three file sets: S, V, WA).
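The feature bookkeeping described in the Methods can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the abstract does not say how the S, V, and WA feature sets were combined, so simple concatenation is assumed here, and `fake_features` and `combine_feature_sets` are hypothetical names, with random numbers standing in for real OpenSMILE output.

```python
import random

# Feature counts as reported in the abstract.
N_MFCC = 1428      # MFCC-based acoustic features
N_PROSODIC = 152   # prosodic features
N_OTHER = 2        # pitch onsets and recording duration
N_FEATURES = N_MFCC + N_PROSODIC + N_OTHER  # 1582 per file set


def fake_features(seed):
    """Hypothetical stand-in for a real extractor: N_FEATURES random floats."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(N_FEATURES)]


def combine_feature_sets(sentence, vowel, whole_audio):
    """Concatenate the S, V, and WA vectors (ASSUMED combination rule)."""
    for name, vec in (("S", sentence), ("V", vowel), ("WA", whole_audio)):
        if len(vec) != N_FEATURES:
            raise ValueError(f"{name} has {len(vec)} features, expected {N_FEATURES}")
    return list(sentence) + list(vowel) + list(whole_audio)


# One recording, three file sets, one combined vector for the regressor.
combined = combine_feature_sets(fake_features(0), fake_features(1), fake_features(2))
print(len(combined))
```

Under this assumption each recording yields a single vector three times the per-set length, which would then be passed to a support vector regressor trained against the expert 0-100 ratings.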

Results: The algorithm's predictions correlated highly (r = 0.847) with the estimates of expert raters. The root mean square error was 13.36 on the 0-100 rating scale. Increasing signal complexity improved the estimation of dysphonia severity: combining the features outperformed the WA, S, and V sets individually.
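The two agreement metrics reported above, Pearson's r and root mean square error, can be computed as in the minimal sketch below. The rating values are invented for illustration and are not data from the study; the function names are likewise illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error, in the same units as the ratings."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Made-up example ratings on the 0-100 scale (NOT study data).
expert = [10.0, 35.0, 60.0, 85.0]
predicted = [15.0, 30.0, 70.0, 80.0]

print(f"r = {pearson_r(expert, predicted):.3f}")
print(f"RMSE = {rmse(expert, predicted):.2f}")
```

The two metrics answer different questions: r measures whether predictions rise and fall with the expert ratings, while RMSE measures how far off they are in rating points, so a model can score well on one and poorly on the other.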

Conclusion: A novel machine-learning algorithm was able to produce perceptual estimates of dysphonia severity on a 100-point scale from standardized audio samples. Its estimates were highly correlated with those of expert raters. This suggests that machine-learning algorithms could offer an objective method for evaluating voice samples for dysphonia severity.

Keywords: Artificial intelligence; Automation; Machine learning; Perceptual voice evaluation; Voice evaluation.


Conflict of interest statement

Declaration of Competing Interest: The authors affirm that they have no conflicts of interest, financial or otherwise, that could be perceived as potentially influencing the objectivity or integrity of the research presented in this publication. We hereby declare that no competing interests exist, ensuring that this work has been conducted with complete transparency and in accordance with ethical guidelines.
