Front Bioeng Biotechnol. 2024 Aug 2:12:1433087.
doi: 10.3389/fbioe.2024.1433087. eCollection 2024.

A deep learning approach to dysphagia-aspiration detecting algorithm through pre- and post-swallowing voice changes

Jung-Min Kim et al. Front Bioeng Biotechnol.

Abstract

Introduction: This study aimed to identify differences in voice characteristics and voice changes between patients with dysphagia-aspiration and healthy individuals using a deep learning model, focusing on the under-researched area of pre- and post-swallowing voice changes in patients with dysphagia. We hypothesized that these variations may be due to weakened muscles and blocked airways in patients with dysphagia.

Methods: A prospective cohort study was conducted on 198 participants aged >40 years at the Seoul National University Bundang Hospital from October 2021 to February 2023. Pre- and post-swallowing voice data of the participants were converted to a 64-kbps mp3 format, and all voice data were trimmed to a length of 2 s. The data were divided for 10-fold cross-validation and stored in HDF5 format with anonymized IDs and labels for the normal and aspiration groups. During preprocessing, the data were converted to Mel spectrograms, and the EfficientAT model was modified using the final layer of MobileNetV3 to effectively detect voice changes and analyze pre- and post-swallowing voices. This enabled the model to probabilistically categorize new patient voices as belonging to the normal or aspiration group.
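The 10-fold division described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say whether the split was stratified by group, so stratification is an assumption here, and the function names `stratified_k_fold` and `train_test_indices` as well as the group sizes in the usage note are hypothetical. Indices stand for participants, so a participant's pre- and post-swallowing recordings stay in the same fold.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class ratios.

    Assumption: stratifying by group label; the abstract only states that
    the data were divided for 10-fold cross-validation.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(i + offset) % k].append(idx)
        offset += len(indices)
    return folds


def train_test_indices(folds, held_out):
    """Return (train, test) index lists with fold `held_out` used for testing."""
    test = sorted(folds[held_out])
    train = sorted(
        idx
        for fold_no, fold in enumerate(folds)
        if fold_no != held_out
        for idx in fold
    )
    return train, test
```

For example, with a hypothetical 120 normal and 78 aspiration participants, `stratified_k_fold(["normal"] * 120 + ["aspiration"] * 78)` yields ten folds of 19 or 20 participants each, and `train_test_indices(folds, 0)` gives the index lists for the first round of cross-validation.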

Results: For the aspiration detection model, area under the receiver operating characteristic curve (AUC) values were analyzed by sex under different configurations of learning rate and batch size. The average AUC values for males ranged from 0.8117 to 0.8319, with the best performance achieved at a learning rate of 3.00e-5 and a batch size of 16. The average AUC values for females ranged from 0.6975 to 0.7331, with the best performance observed at a learning rate of 5.00e-5 and a batch size of 32. As there were fewer female participants, a combined model was developed to maintain the sex balance. In the combined model, the average AUC values ranged from 0.7746 to 0.7997, and optimal performance was achieved at a learning rate of 3.00e-5 and a batch size of 16.
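To make the reported metric concrete, the sketch below computes an AUC for one fold and averages it over folds. It is an illustrative sketch under stated assumptions, not the study's evaluation code: it uses the rank-based definition of AUC (the probability that a randomly chosen aspiration case receives a higher score than a randomly chosen normal case, with ties counted as one half), and the names `roc_auc` and `mean_auc` are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the aspiration group, 0 for the normal group.
    scores: predicted probability of aspiration for each sample.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each group")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def mean_auc(fold_results):
    """Average AUC over folds; fold_results is a list of (labels, scores) pairs."""
    aucs = [roc_auc(labels, scores) for labels, scores in fold_results]
    return sum(aucs) / len(aucs)


# Toy example with made-up scores (not study data).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Repeating `mean_auc` for each (learning rate, batch size) pair and keeping the pair with the highest value mirrors the selection of the best configuration reported above.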

Conclusion: This study evaluated a voice analysis-based program to detect pre- and post-swallowing changes in patients with dysphagia, potentially aiding in real-time monitoring. Such a system can provide healthcare professionals with daily insights into the conditions of patients, allowing for personalized interventions.

Clinical trial registration: ClinicalTrials.gov, identifier NCT05149976.

Keywords: aspiration detection model; deep learning; dysphagia-aspiration; voice changes pre- and post-swallowing; voice-based non-face-to-face monitoring.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Study flow for the participant selection.
FIGURE 2
Voice data transformation and preprocessing.
FIGURE 3
Development of the voice-change detection model and inference windows.
FIGURE 4
ROC analysis results for all models. (A) Male Models. (B) Female Models. (C) Combined (Male + Female) Models. Each ROC curve shown is the average over the 10 folds under the parameter combination (learning rate, batch size) that yielded the highest AUC value for that model (male, female, combined). For the male model, the highest AUC value was 0.8319, achieved with a learning rate of 3.00e-5 and a batch size of 16. For the female model, the highest AUC value was 0.7331, achieved with a learning rate of 5.00e-5 and a batch size of 32. For the combined model, the highest AUC value was 0.7997, achieved with a learning rate of 3.00e-5 and a batch size of 16. Among the sex-specific models, the male model showed the highest overall AUC value, as it had the most even distribution of data across groups. Although the female model displayed an accuracy similar to that of the male model, a significant imbalance between the normal group and the aspiration group led to the lowest AUC value.
