Front Bioeng Biotechnol. 2024 Aug 2:12:1433087.

doi: 10.3389/fbioe.2024.1433087. eCollection 2024.

A deep learning approach to dysphagia-aspiration detecting algorithm through pre- and post-swallowing voice changes

Jung-Min Kim¹,², Min-Seop Kim³, Sun-Young Choi², Kyogu Lee⁴, Ju Seok Ryu²,⁵

Affiliations

¹ Department of Health Science and Technology, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea.
² Department of Rehabilitation Medicine, Seoul National University Bundang Hospital, Seongnam, Republic of Korea.
³ Department of Multimedia Engineering, Dongguk University, Seoul, Republic of Korea.
⁴ Music and Audio Research Group, Department of Intelligence and Information, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, Republic of Korea.
⁵ Seoul National University College of Medicine, Seoul, Republic of Korea.

PMID: 39157445
PMCID: PMC11327512
DOI: 10.3389/fbioe.2024.1433087

Jung-Min Kim et al. Front Bioeng Biotechnol. 2024.

Abstract

Introduction: This study aimed to identify differences in voice characteristics and changes between patients with dysphagia-aspiration and healthy individuals using a deep learning model, with a focus on under-researched areas of pre- and post-swallowing voice changes in patients with dysphagia. We hypothesized that these variations may be due to weakened muscles and blocked airways in patients with dysphagia.

Methods: A prospective cohort study was conducted on 198 participants aged >40 years at the Seoul National University Bundang Hospital from October 2021 to February 2023. Pre- and post-swallowing voice data of the participants were converted to a 64-kbps mp3 format, and all voice data were trimmed to a length of 2 s. The data were divided for 10-fold cross-validation and stored in HDF5 format with anonymized IDs and labels for the normal and aspiration groups. During preprocessing, the data were converted to Mel spectrograms, and the EfficientAT model was modified using the final layer of MobileNetV3 to effectively detect voice changes and analyze pre- and post-swallowing voices. This enabled the model to probabilistically categorize new patient voices as normal or aspirated.
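Two of the data-preparation steps described above, fixing every clip to 2 s and dividing the recordings for 10-fold cross-validation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the 16 kHz sample rate, the zero-padding of short clips, the stratification by label, and the function names `fix_length` and `stratified_folds` are all hypothetical. The Mel-spectrogram conversion and HDF5 storage would normally rely on audio and storage libraries and are omitted here.

```python
import random
from collections import defaultdict

SAMPLE_RATE = 16_000  # assumed; the abstract does not state a sample rate
CLIP_SECONDS = 2      # from the abstract: clips are trimmed to 2 s
N_FOLDS = 10          # from the abstract: 10-fold cross-validation


def fix_length(samples, sample_rate=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Return exactly `seconds` of audio: truncate long clips, zero-pad short ones.

    Zero-padding is an assumption; the abstract only mentions trimming.
    """
    target = sample_rate * seconds
    clipped = list(samples[:target])
    return clipped + [0.0] * (target - len(clipped))


def stratified_folds(labels, n_folds=N_FOLDS, seed=0):
    """Assign each recording index to a fold, keeping class ratios roughly equal.

    Stratification is an assumption; the abstract only says the data were
    divided for 10-fold cross-validation.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_folds
    return fold_of
```

In a real split, the pre- and post-swallowing recordings of one participant would presumably need to land in the same fold, so the assignment would be made per participant rather than per recording.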

Results: In a study of the machine-learning model for aspiration detection, area under the receiver operating characteristic curve (AUC) values were analyzed across sexes under different configurations. The average AUC values for males ranged from 0.8117 to 0.8319, with the best performance achieved at a learning rate of 3.00e-5 and a batch size of 16. The average AUC values for females improved from 0.6975 to 0.7331, with the best performance observed at a learning rate of 5.00e-5 and a batch size of 32. As there were fewer female participants, a combined model was developed to maintain the sex balance. In the combined model, the average AUC values ranged from 0.7746 to 0.7997, and optimal performance was achieved at a learning rate of 3.00e-5 and a batch size of 16.
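The averaged AUC values reported above can be read as the mean of one AUC per cross-validation fold. The sketch below shows one common way to compute that quantity, using the pairwise (Mann-Whitney) definition of AUC. It is an illustration only: the helper names `roc_auc` and `mean_auc` are hypothetical, the toy scores are invented, and the abstract does not say which implementation the authors used.

```python
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auc(folds):
    """Average the per-fold AUC over a list of (labels, scores) pairs."""
    return mean(roc_auc(labels, scores) for labels, scores in folds)


# Toy data for two folds (1 = aspiration, 0 = normal); invented for illustration.
toy_folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.7]),
]
print(mean_auc(toy_folds))  # prints 0.75
```

Because this definition compares scores only within pairs of one positive and one negative example, AUC does not depend on a decision threshold, which is one reason it is often reported alongside accuracy when the groups are imbalanced.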

Conclusion: This study evaluated a voice analysis-based program to detect pre- and post-swallowing changes in patients with dysphagia, potentially aiding in real-time monitoring. Such a system can provide healthcare professionals with daily insights into the conditions of patients, allowing for personalized interventions.

Clinical trial registration: ClinicalTrials.gov, identifier NCT05149976.

Keywords: aspiration detection model; deep learning; dysphagia-aspiration; voice changes pre- and post-swallowing; voice-based non-face-to-face monitoring.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Study flow for the participant selection.

**FIGURE 2**
Voice data transformation and preprocessing.

**FIGURE 3**
Development of the voice-change detection model and inference windows.

**FIGURE 4**
ROC analysis results for all models. **(A)** Male Models. **(B)** Female Models. **(C)** Combined (Male + Female) Models. Each ROC curve is the average over the 10 folds for the parameter combination (learning rate, batch size) that gave the highest AUC value for that model (male, female, combined). For the male model, the highest AUC value was 0.8319, achieved with a learning rate of 3.00e-5 and a batch size of 16. For the female model, the highest AUC value was 0.7331, achieved with a learning rate of 5.00e-5 and a batch size of 32. For the combined model, the highest AUC value was 0.7997, achieved with a learning rate of 3.00e-5 and a batch size of 16. The male model showed the highest overall AUC value, as it had the most even distribution of data across groups. Although the female model displayed an accuracy similar to that of the male model, a significant imbalance between the normal and aspiration groups led to the lowest AUC value of the three models.

Cited by

Artificial Intelligence for Diagnosis and Treatment of Dysphagia.
Jotz GP, Jotz AV, Arnold D, Borelli WV. Int Arch Otorhinolaryngol. 2025 Jan 23;29(1):1-2. doi: 10.1055/s-0044-1801781. PMID: 39850498.

A Machine Learning Pipeline for Automated Bolus Segmentation and Area Measurement in Swallowing Videofluoroscopy Images of an Infant Pig Model.
Sarmet M, Kaczmarek E, Fauveau A, Steer K, Velasco AA, Smith A, Kennedy M, Shideler H, Wallace S, Stroud T, Blilie M, Mayerl CJ. Dysphagia. 2025 Apr 28. doi: 10.1007/s00455-025-10829-z. Online ahead of print. PMID: 40293507.
