NPJ Digit Med. 2025 Jan 7;8(1):14.

doi: 10.1038/s41746-024-01417-w.

Multimodal deep ensemble classification system with wearable vibration sensor for detecting throat-related events

Yonghun Song^#¹, Inyeol Yun^#², Sandra Giovanoli³, Chris Awai Easthope³, Yoonyoung Chung^{4,5,6}

Affiliations

¹ Department of Electrical Engineering, Pohang University of Science and Technology, Pohang, Korea.
² Future IT Innovation Laboratory, Pohang University of Science and Technology, Pohang, Korea.
³ Data Analytics & Rehabilitation Technology (DART), Lake Lucerne Institute (LLUI) & cereneo Center for Interdisciplinary Research (CEFIR), Vitznau, Switzerland.
⁴ Department of Electrical Engineering, Pohang University of Science and Technology, Pohang, Korea. ychung@postech.ac.kr.
⁵ Department of Semiconductor Engineering, Pohang University of Science and Technology, Pohang, Korea. ychung@postech.ac.kr.
⁶ Center for Semiconductor Technology Convergence, Pohang University of Science and Technology, Pohang, Korea. ychung@postech.ac.kr.

^# Contributed equally.

PMID: 39775108
PMCID: PMC11706958
DOI: 10.1038/s41746-024-01417-w

Yonghun Song et al. NPJ Digit Med. 2025.

Abstract

Dysphagia, a swallowing disorder, requires continuous monitoring of throat-related events to obtain comprehensive insights into the patient's pharyngeal and laryngeal functions. However, conventional assessments are performed by medical professionals in clinical settings, which limits continuous monitoring. We demonstrate the feasibility of a ubiquitous monitoring system for autonomously detecting throat-related events utilizing a soft skin-attachable throat vibration sensor (STVS). The STVS accurately records throat vibrations without interference from surrounding noise, enabling measurement of subtle sounds such as swallowing. Out of the continuous data stream, we automatically classify events of interest using an ensemble-based deep learning model. The proposed model integrates multiple deep neural networks based on multi-modal acoustic features of throat-related events to enhance robustness and accuracy of classification. Our model outperforms previous studies, achieving a classification accuracy of 95.96%. These results show the potential of wearable solutions for improving dysphagia management and patient outcomes outside of clinical environments.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. Overview of soft skin-attachable throat vibration sensor (STVS) system for classifying throat-related events: coughing, speaking, swallowing, and throat clearing.**
a Photo of a subject wearing the STVS above the laryngeal prominence position. b Magnified image of the STVS without polymer encapsulation. A sensing part of the STVS forms a conformal contact with the neck skin. Scale bar: 1 cm. c Image of the STVS with integrated components. d Experimental process for designing and applying an ensemble-based deep learning model to classify throat-related events. The training dataset consists of events obtained from subjects using English. In contrast, the test dataset comprises events from subjects with diverse linguistic backgrounds, such as English, French, German, Spanish, and Korean. e Comparison of the classification accuracy and the number of classifiable events with previous studies that detect major throat-related events using microphone devices.

**Fig. 2. Characteristics of the serpentine interconnect in soft skin-attachable throat vibration sensor.**
a Image of the experimental setup for the stretchability test. The serpentine interconnect was in a pristine state (left) and stretched to 100% (right). Scale bar: 1 cm. b Resistance variations in response to the applied strain. The relative resistance increased by 0.19% after a 100% stretch. c Resistance across two further points connected by the serpentine interconnect. The resistance value did not change even after 5000 cyclic loadings.

**Fig. 3. Experimental protocol and acquired signals from the soft skin-attachable throat vibration sensor (STVS).**
a Schematic of measurement protocol and data preprocessing steps for acquiring throat-related events using the STVS. Subjects repeated four distinct events—coughing, speaking, swallowing, and throat clearing—five times each. All subjects followed the same controlled protocol. The data was segmented based on the peak amplitude and utilized as inputs for network training. b Example waveforms and spectrograms for each event. Signals related to the vibration of the pharynx and larynx were captured around the neck. The STVS was configured with a sampling rate of 6400 Hz and a dynamic range of ±4 g. The spectrogram data was obtained using a short-time Fourier transform with a Hanning window frame width of 40 ms and an overlap of 75%.
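The spectrogram settings in this caption translate into concrete frame sizes. The following is a minimal sketch, not the authors' preprocessing code: it uses only the three values stated in the caption (6400 Hz, 40 ms, 75% overlap), while the function names, the periodic form of the Hann window, and the 400 Hz test tone are assumptions made for illustration.

```python
import cmath
import math

FS = 6400        # sampling rate in Hz (from the caption)
WIN_S = 0.040    # window width in seconds (from the caption)
OVERLAP = 0.75   # fractional overlap between frames (from the caption)


def stft_params(fs=FS, win_s=WIN_S, overlap=OVERLAP):
    """Return (frame length, hop size) in samples."""
    n_win = round(fs * win_s)
    hop = round(n_win * (1 - overlap))
    return n_win, hop


def hann(n):
    """Periodic Hann window of length n (the periodic form is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(x, n_win, hop):
    """Magnitude spectrogram: one list of n_win // 2 + 1 bins per frame."""
    w = hann(n_win)
    frames = []
    for start in range(0, len(x) - n_win + 1, hop):
        seg = [x[start + i] * w[i] for i in range(n_win)]
        spectrum = []
        for k in range(n_win // 2 + 1):
            acc = sum(
                seg[i] * cmath.exp(-2j * math.pi * k * i / n_win)
                for i in range(n_win)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


n_win, hop = stft_params()

# Hypothetical test input: 0.1 s of a 400 Hz tone, not sensor data.
x = [math.sin(2 * math.pi * 400 * t / FS) for t in range(640)]
spec = stft_magnitude(x, n_win, hop)

# Locate the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * FS / n_win
```

Under these assumptions each frame holds 256 samples, consecutive frames start 64 samples (10 ms) apart, and the frequency resolution is 6400 / 256 = 25 Hz per bin. A production pipeline would use an FFT routine instead of the direct DFT loop shown here.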

**Fig. 4. Visualization of activation regions in the image-based classification network.**
a Mel spectrograms and b their gradient-weighted class activation mapping (Grad-CAM) results from the EfficientNet for various events: coughing, speaking, swallowing, and throat clearing. Grad-CAM provides a visual representation of the crucial pixels in the input image as a heatmap. The heatmap for speaking highlights the harmonic components, while the swallowing event produces a prominent heatmap over several spike-shaped signals. The coughing and throat-clearing events show strong activations around 0.1 s.
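The Grad-CAM heatmaps described in this caption follow a simple recipe: average the gradient of the class score over each feature-map channel, use those averages as channel weights, sum the weighted feature maps, and keep only the positive part. The sketch below illustrates that recipe on tiny hand-written arrays. The `grad_cam` function, the toy activations and gradients, and the final scaling to [0, 1] for display are assumptions; in practice the activations and gradients would come from the last convolutional layer of the trained network.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations, gradients: nested lists shaped [channel][row][col],
    holding feature maps and the gradient of the class score with
    respect to those feature maps.
    """
    n_rows = len(activations[0])
    n_cols = len(activations[0][0])

    # Channel weights: global average of the gradients over space.
    weights = [
        sum(sum(row) for row in g) / (n_rows * n_cols) for g in gradients
    ]

    # Weighted sum of the feature maps.
    heat = [[0.0] * n_cols for _ in range(n_rows)]
    for w, a in zip(weights, activations):
        for r in range(n_rows):
            for c in range(n_cols):
                heat[r][c] += w * a[r][c]

    # ReLU keeps only regions that support the class.
    heat = [[max(v, 0.0) for v in row] for row in heat]

    # Scale to [0, 1] for display (a common convention, assumed here).
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


# Hypothetical 2-channel, 2x2 feature maps and gradients.
acts = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
grads = [
    [[1.0, 1.0], [1.0, 1.0]],      # channel weight  1.0
    [[-1.0, -1.0], [-1.0, -1.0]],  # channel weight -1.0
]
heatmap = grad_cam(acts, grads)
```

In this toy case the second channel has a negative weight, so the locations where it is active are suppressed by the ReLU and only the cells driven by the first channel remain highlighted.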

**Fig. 5. Ensemble-based deep learning model for event classification.**
Our deep learning architecture combines various deep neural networks into an ensemble method. After augmentation, a training dataset was pre-processed into time series (waveform and fundamental frequency) and images (spectrogram and mel spectrogram). WaveNet was trained on time-series data, capturing sequential patterns, while ResNet50 and EfficientNet were trained on image data, focusing on spatial features. The networks were trained using a fivefold cross-validation method, and each prediction from the validation dataset served as the input data for the ensemble model. The ensemble-based deep learning model, trained with various features, accurately classifies throat-related events.
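The caption states that predictions on each validation fold become the inputs of the ensemble model. One way to read this is as stacking with out-of-fold predictions, sketched below under that assumption. The nearest-centroid learner is only a stand-in for WaveNet, ResNet50, and EfficientNet, the two scalar "views" stand in for the different signal representations, and all names and data are hypothetical. The caption does not specify the form of the ensemble model itself, so the sketch stops at building its input features.

```python
def kfold_indices(n, k=5):
    """Split range(n) into k contiguous validation folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds


def fit_centroid(xs, ys, n_classes):
    """Stand-in base learner: nearest class centroid on a scalar feature."""
    cents = []
    for c in range(n_classes):
        vals = [x for x, y in zip(xs, ys) if y == c]
        cents.append(sum(vals) / len(vals))

    def predict(x):
        best = min(range(n_classes), key=lambda c: abs(x - cents[c]))
        return [1.0 if c == best else 0.0 for c in range(n_classes)]

    return predict


def out_of_fold_features(views, labels, n_classes, k=5):
    """Build ensemble inputs from predictions on held-out folds only."""
    n = len(labels)
    meta = [[] for _ in range(n)]
    for xs in views:  # one base learner per input view
        for val_idx in kfold_indices(n, k):
            held_out = set(val_idx)
            tr = [i for i in range(n) if i not in held_out]
            model = fit_centroid(
                [xs[i] for i in tr], [labels[i] for i in tr], n_classes
            )
            for i in val_idx:
                # Each sample is predicted by a model that never saw it.
                meta[i].extend(model(xs[i]))
    return meta


# Hypothetical toy data: 20 samples, 4 classes, 2 views of each sample.
labels = [i % 4 for i in range(20)]
view_a = [labels[i] + 0.1 * (i // 4 - 2) for i in range(20)]
view_b = [-2.0 * labels[i] + 0.1 * (i // 4 - 2) for i in range(20)]

meta = out_of_fold_features([view_a, view_b], labels, n_classes=4, k=5)
```

Each row of `meta` concatenates the class scores from every base learner, so a second-stage model can be fitted on these rows without having seen predictions made on a base learner's own training data.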

**Fig. 6. Performance metric of our ensemble-based deep learning model.**
a Normalized confusion matrix of the ensemble model on the test dataset. All event classification accuracies exceed 90%. b Accuracy and c macro-averaged receiver operating characteristic (ROC) curves of the single neural networks and the proposed ensemble model on the test dataset. The abbreviations denote distinct preprocessing techniques: ‘w’ for waveform, ‘f₀’ for fundamental frequency, ‘s’ for spectrogram, and ‘m’ for mel spectrogram. The ensemble model achieves the highest accuracy of 95.96% and an area under the ROC curve (AUC) value of 0.99 for classifying the four events.
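For readers unfamiliar with the metrics in this caption, the sketch below shows how a row-normalized confusion matrix and overall accuracy are typically computed for a four-class problem. The labels and predictions are invented toy values chosen for easy arithmetic; they are not the paper's data and do not reproduce its 95.96% result. The function names are likewise illustrative.

```python
EVENTS = ["coughing", "speaking", "swallowing", "throat clearing"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def normalize_rows(m):
    """Divide each row by its total so the diagonal holds per-class recall."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def accuracy(m):
    """Fraction of all samples on the diagonal."""
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / sum(sum(row) for row in m)


# Hypothetical toy labels: 5 samples per event, one cough misread
# as throat clearing.
y_true = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5
y_pred = [0, 0, 0, 0, 3] + [1] * 5 + [2] * 5 + [3] * 5

cm = confusion_matrix(y_true, y_pred, len(EVENTS))
cm_norm = normalize_rows(cm)
acc = accuracy(cm)
```

With these toy values the coughing row normalizes to 0.8 correct and 0.2 confused with throat clearing, and the overall accuracy is 19 / 20 = 0.95.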

**Fig. 7. Throat vibration monitoring in daily life with the soft skin-attachable throat vibration sensor (STVS).**
a Experimental setup for measuring vibration signals while walking on a running track at 4 km/h. b Vibration signal measured by the STVS. The outstanding stretchability of the STVS mitigated motion artifacts caused by walking. The throat-related events, such as coughing, speaking, swallowing, and throat clearing, were precisely detected through the proposed classification model. c Experimental setup for measuring throat vibration signals while a subject was conversing with another person. d Vocal signal measured by a conventional acoustic microphone. The sounds of the subject's throat-related events were mixed with the other person's speech and ambient noise. e Throat vibration signal measured by the STVS. The measured signal is clear and accurately classified.
