Machine-learning assisted swallowing assessment: a deep learning-based quality improvement tool to screen for post-stroke dysphagia

Rami Saab et al. Front Neurosci. 2023 Nov 24;17:1302132. doi: 10.3389/fnins.2023.1302132. eCollection 2023.

Abstract

Introduction: Post-stroke dysphagia is common and associated with significant morbidity and mortality, rendering bedside screening of significant clinical importance. Using voice as a biomarker coupled with deep learning has the potential to improve patient access to screening and mitigate the subjectivity associated with detecting voice change, a component of several validated screening protocols.

Methods: In this single-center study, we developed a proof-of-concept model for automated dysphagia screening and evaluated its performance on training and testing cohorts. Patients admitted to a comprehensive stroke center were recruited on a rolling basis if they were primary English speakers who could follow commands without significant aphasia. The primary outcome was classification as the equivalent of a pass or fail on a dysphagia screening test, which served as the label. Voice data were recorded from patients speaking a standardized set of vowels, words, and sentences from the National Institutes of Health Stroke Scale. Seventy patients were recruited and 68 were included in the analysis, with 40 and 28 in the training and testing cohorts, respectively. Patient speech was segmented into 1,579 audio clips, from which 6,655 Mel-spectrogram images were computed and used as inputs to deep learning models (DenseNet and ConvNext, separately and together). Clip-level and participant-level swallowing status predictions were obtained through a voting method.
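The clip-segmentation step described above (Figure 1 shows 0.5 s windows) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the sampling rate, non-overlapping windowing, and the decision to drop a short trailing remainder are all assumptions.

```python
import numpy as np

def segment_audio(waveform: np.ndarray, sr: int, window_s: float = 0.5) -> list:
    """Split a mono waveform into non-overlapping fixed-length windows,
    dropping any trailing remainder shorter than one window."""
    win = int(sr * window_s)          # samples per window
    n = len(waveform) // win          # number of complete windows
    return [waveform[i * win:(i + 1) * win] for i in range(n)]

# 3.2 s of synthetic audio at an assumed 16 kHz -> six 0.5 s clips
sr = 16_000
clips = segment_audio(np.random.randn(int(3.2 * sr)), sr)
```

Each resulting clip would then be converted to a Mel-spectrogram image before being fed to the CNN.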

Results: The models demonstrated clip-level dysphagia screening sensitivity of 71% and specificity of 77% (F1 = 0.73, AUC = 0.80 [95% CI: 0.78-0.82]). At the participant level, sensitivity and specificity were 89% and 79%, respectively (F1 = 0.81, AUC = 0.91 [95% CI: 0.77-1.05]).

Discussion: This study is the first to demonstrate the feasibility of applying deep learning to vocalization classification for the detection of post-stroke dysphagia. Our findings suggest potential for enhancing dysphagia screening in clinical settings. Code is available at https://github.com/UofTNeurology/masa-open-source.

Keywords: artificial intelligence; dysphagia; machine learning; neural technology; original research; quality improvement; stroke; swallowing.


Conflict of interest statement

HK was an associate editor for Frontiers in Neurology. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. An author was a member of a Frontiers editorial board at the time of submission; this had no impact on the peer review process or the final decision.

Figures

FIGURE 1
Training and testing deep learning classifiers to distinguish audio recordings based on dysphagia status. (A) Audio clips were recorded from each patient using a standardized assessment of vowels as well as words and sentences from the NIHSS language assessment, (B) and then segmented into 0.5 s windows. (C) Each clip from a given patient was then converted to Mel-spectrogram images using either the RGB (shown here) or three-channel approaches. Each Mel-spectrogram image was used as an input into the CNN (either DenseNet, ConvNext-Tiny or fusion networks) which generated an output class along with an output probability for each clip. (D) The average of all clip level output probabilities per patient were used to generate a final participant-level output class prediction.
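The participant-level voting step in panel (D), averaging clip-level output probabilities and thresholding the mean, can be sketched as below. The 0.5 decision threshold and the "pass"/"fail" label names are illustrative assumptions; the paper does not specify them here.

```python
import numpy as np

def participant_prediction(clip_probs, threshold: float = 0.5):
    """Average per-clip 'fail' probabilities for one participant and
    threshold the mean to produce a single participant-level class."""
    mean_p = float(np.mean(clip_probs))
    label = "fail" if mean_p >= threshold else "pass"
    return label, mean_p

# Four hypothetical clip probabilities from one participant
label, p = participant_prediction([0.9, 0.7, 0.4, 0.8])
```

Averaging before thresholding lets many weakly informative clips outvote a few noisy ones, which is the usual motivation for soft voting over hard per-clip votes.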
FIGURE 2
Mel-spectrogram processing methods, comparing data processing pipelines between the standard RGB Mel-spectrogram approach (top) and three-channel Mel-spectrogram (bottom) involving depth-wise concatenation of three separate Mel-spectrograms with different FFT lengths to produce a single composite image.
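The depth-wise concatenation in the three-channel approach can be sketched as follows. For self-containment this uses plain linear-frequency magnitude spectrograms rather than Mel-scaled ones, and the FFT lengths, hop size, and output shape are hypothetical, not the paper's parameters.

```python
import numpy as np

def stft_mag(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Magnitude spectrogram via a Hann-windowed FFT; returns (freq, time)."""
    w = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * w for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

def three_channel_image(x, fft_lengths=(256, 512, 1024), hop=128, out=(128, 64)):
    """One spectrogram per FFT length, cropped/zero-padded to a common
    (freq, time) shape, then stacked depth-wise into an H x W x 3 array."""
    h, wd = out
    chans = []
    for n_fft in fft_lengths:
        s = stft_mag(x, n_fft, hop)
        canvas = np.zeros(out)
        canvas[:min(h, s.shape[0]), :min(wd, s.shape[1])] = s[:h, :wd]
        chans.append(canvas)
    return np.stack(chans, axis=-1)

img = three_channel_image(np.random.randn(8000))
```

Varying the FFT length trades time resolution for frequency resolution, so stacking the three spectrograms gives the CNN multiple views of the same clip in one image.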
FIGURE 3
Training and validation accuracy and loss curves for ConvNext-Tiny (left) and DenseNet-121 (right).
FIGURE 4
Confusion matrices for fusion model applied at the clip-level on training set (left) and test set (right).
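The screening metrics reported in the abstract follow directly from confusion matrices like these. A minimal sketch, treating "fail" (dysphagia) as the positive class; the example counts are hypothetical and are not taken from the paper's matrices.

```python
def screening_metrics(tp: int, fn: int, fp: int, tn: int):
    """Sensitivity, specificity and F1 from a 2x2 confusion matrix,
    with 'fail' (screen-positive for dysphagia) as the positive class."""
    sensitivity = tp / (tp + fn)           # recall on true fails
    specificity = tn / (tn + fp)           # recall on true passes
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Hypothetical participant-level counts
sens, spec, f1 = screening_metrics(tp=8, fn=1, fp=4, tn=15)
```

For a screening tool, sensitivity is usually weighted over specificity, since a missed case of dysphagia (false negative) carries the aspiration risk.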

