Knowl Based Syst. 2022 Oct 11;253:109539. doi: 10.1016/j.knosys.2022.109539. Epub 2022 Jul 28.

Deep learning and machine learning-based voice analysis for the detection of COVID-19: A proposal and comparison of architectures

Giovanni Costantini et al. Knowl Based Syst.

Abstract

Alongside currently used nasal swab testing, management of the COVID-19 pandemic would gain noticeable advantages from low-cost tests available anytime, anywhere, at large scale, and with real-time answers. A novel approach to COVID-19 assessment is adopted here, discriminating negative subjects from positive or recovered subjects. The aim is to identify potential discriminating features, highlight short- and mid-term effects of COVID-19 on the voice, and compare two custom algorithms. A pool of 310 subjects took part in the study; recordings were collected in a low-noise, controlled setting employing three different vocal tasks. Binary classifications followed, using two different custom algorithms. The first was based on the coupling of boosting and bagging, with an AdaBoost classifier using Random Forest learners. A feature selection process was employed during training, identifying a subset of features acting as clinically relevant biomarkers. The other approach was centered on two custom CNN architectures applied to mel-spectrograms, with a custom knowledge-based data augmentation. Performance, evaluated on an independent test set, was comparable: AdaBoost and the CNNs differentiated COVID-19-positive from negative subjects with accuracies of 100% and 95%, respectively, and recovered from negative individuals with accuracies of 86.1% and 75%, respectively. This study highlights the possibility of identifying COVID-19-positive subjects, foreshadowing a tool for on-site screening, while also considering recovered subjects and the effects of COVID-19 on the voice. The two proposed novel architectures allow for the identification of biomarkers and demonstrate the ongoing relevance of traditional ML versus deep learning in speech analysis.
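For readers who want a concrete picture of the boosting-and-bagging coupling described above, the following is a minimal, hypothetical scikit-learn sketch of an AdaBoost ensemble whose weak learners are Random Forests. The data, labels, and hyperparameters are placeholders, not the authors' actual pipeline.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(310, 40))    # placeholder: one acoustic feature vector per subject
    y = rng.integers(0, 2, size=310)  # placeholder: 0 = Healthy (H), 1 = Positive (P)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

    # Boosting over bagged learners: each AdaBoost round fits a small Random Forest.
    # (On scikit-learn < 1.2 the keyword is base_estimator instead of estimator.)
    clf = AdaBoostClassifier(
        estimator=RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0),
        n_estimators=50,
        random_state=0,
    )
    clf.fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))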

Keywords: 1E, Vowel /e/ vocal task; 2S, Sentence vocal task; 3C, Cough vocal task; AdaBoost; CFS, Correlation-based Feature Selection; CNN, Convolutional Neural Network; COVID-19; Classification; DL, Deep Learning; Deep learning; H, Healthy control subjects; MFCC, Mel-frequency Cepstral Coefficients; ML, Machine Learning; NS, Nasal Swab; P, Positive subjects; PCR, Polymerase Chain Reaction-based molecular swabs; PvsH, Positive versus Healthy subjects comparison; R, Recovered subjects; RF, Random Forest; ROC, Receiver Operating Characteristic curve; ReLU, Rectified Linear Unit; RvsH, Recovered versus Healthy subjects comparison; SVM, Support Vector Machine; Speech processing.

Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: authors Giovanni Costantini, Giovanni Saggio, and Antonio Pisani are advisory members of VoiceWise S.r.l., a spin-off company of the University of Rome Tor Vergata (Rome, Italy) that develops voice analysis solutions for diagnostic purposes; Valerio Cesarini cooperates with VoiceWise and is employed by CloudWise S.r.l., a company developing cloud data storage and software solutions.

Figures

Fig. 1
Flowchart describing the complete pipeline of the Machine Learning approach based on the AdaBoost classifier (exemplified for the PvsH comparison).
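As a rough, hypothetical illustration of the pipeline in Fig. 1, feature selection and the boosted classifier can be chained in a single scikit-learn Pipeline; SelectKBest stands in here for the paper's CFS step, which scikit-learn does not provide.

    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ("select", SelectKBest(f_classif, k=20)),  # stand-in for CFS feature selection
        ("boost", AdaBoostClassifier(
            estimator=RandomForestClassifier(n_estimators=10, max_depth=3),
            n_estimators=50,
        )),
    ])
    # Usage: pipeline.fit(X_train, y_train); pipeline.predict(X_test)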
Fig. 2
Visualization of the data augmentation techniques applied to mel-frequency spectrograms. Top left: original sample spectrogram; top right: pink noise addition; bottom left: time masking; bottom right: frequency masking.
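The three augmentations in Fig. 2 can be sketched directly on a mel-spectrogram array. The snippet below is an illustrative NumPy rendition; mask widths, the SNR, and the choice to fill masks with the mean level are guesses, not the paper's knowledge-based settings.

    import numpy as np

    def add_pink_noise(spec, snr_db=30.0, rng=None):
        """Add 1/f ("pink") noise at a target SNR, directly in the spectrogram domain."""
        rng = rng or np.random.default_rng()
        noise = rng.normal(size=spec.shape)
        f = np.fft.rfftfreq(spec.shape[1])
        f[0] = f[1]  # avoid division by zero at DC
        # Shape white noise into pink by weighting FFT bins by 1/sqrt(f) along time.
        shaped = np.fft.irfft(np.fft.rfft(noise, axis=1) / np.sqrt(f), n=spec.shape[1], axis=1)
        scale = np.sqrt((spec ** 2).mean() / (shaped ** 2).mean()) * 10 ** (-snr_db / 20)
        return spec + scale * shaped

    def time_mask(spec, max_width=20, rng=None):
        """Replace a random contiguous block of time frames with the mean level."""
        rng = rng or np.random.default_rng()
        w = int(rng.integers(1, max_width + 1))
        t0 = int(rng.integers(0, spec.shape[1] - w))
        out = spec.copy()
        out[:, t0:t0 + w] = spec.mean()
        return out

    def freq_mask(spec, max_width=8, rng=None):
        """Replace a random contiguous block of mel bands with the mean level."""
        rng = rng or np.random.default_rng()
        w = int(rng.integers(1, max_width + 1))
        f0 = int(rng.integers(0, spec.shape[0] - w))
        out = spec.copy()
        out[f0:f0 + w, :] = spec.mean()
        return out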
Fig. 3
CNN1 architecture. A “Conv Block” is described in the upper box and comprises a convolutional layer followed by a batch normalization layer and a ReLU (Rectified Linear Unit) activation function. The number after “Conv Block” indicates the number of parallel convolutional filters/neurons in the layer. Max Pool: max pooling layer; FC: fully connected layer, where the number in round brackets indicates the number of neurons.
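A hypothetical PyTorch rendition of the “Conv Block” just described (convolution, then batch normalization, then ReLU) is given below; filter counts, kernel sizes, and stack depth are illustrative, not the paper's CNN1 hyperparameters.

    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch, kernel_size=3):
        """Conv Block per the caption: convolution -> batch normalization -> ReLU."""
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    class SmallCNN(nn.Module):
        """Illustrative stack over single-channel mel-spectrogram inputs."""
        def __init__(self, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                conv_block(1, 16), nn.MaxPool2d(2),   # Conv Block 16 + Max Pool
                conv_block(16, 32), nn.MaxPool2d(2),  # Conv Block 32 + Max Pool
                conv_block(32, 64), nn.MaxPool2d(2),  # Conv Block 64 + Max Pool
            )
            self.head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(64, n_classes),  # FC (n_classes)
            )

        def forward(self, x):  # x: (batch, 1, n_mels, n_frames)
            return self.head(self.features(x))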
Fig. 4
CNN2 architecture (for the 3C — cough vocal task only). A “Conv Block” is described in the upper box and comprises a convolutional layer followed by a batch normalization layer and a ReLU (Rectified Linear Unit) activation function. The number after “Conv Block” indicates the number of parallel convolutional filters/neurons in the layer. Max Pool: max pooling layer; FC: fully connected layer, where the number in round brackets indicates the number of neurons.
Fig. 5
ROC curves. Above: ROC curve for the PvsH (Positive versus Healthy) comparison. Below: ROC curve for the RvsH (Recovered versus Healthy) comparison. The red line refers to the 1E — vowel /e/ vocal task sub-classifier; the blue line to the 2S — sentence vocal task sub-classifier; the green line to the 3C — cough vocal task sub-classifier. Axes span from 0 to 1. AUC (Area Under the Curve) values are reported in the manuscript.
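ROC curves like those in Fig. 5 are conventionally computed from a trained classifier's scores for the positive class; a minimal scikit-learn/matplotlib sketch, with placeholder labels and scores rather than the study's outputs, follows.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_auc_score, roc_curve

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=100)  # placeholder ground truth (1 = P or R, 0 = H)
    y_score = np.clip(0.5 * y_true + rng.normal(0.25, 0.3, size=100), 0, 1)  # placeholder scores

    fpr, tpr, _ = roc_curve(y_true, y_score)
    plt.plot(fpr, tpr, label=f"sub-classifier (AUC = {roc_auc_score(y_true, y_score):.2f})")
    plt.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance line
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()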
Fig. 6
Radar plot for the PvsH-3C sub-classifier. PvsH: Positive versus Healthy; 3C: cough vocal task. The radar plot was built from the top 20 features (as ranked by the linear wrapped SVM ranker), averaged over all subjects and normalized by the H class. The blue unit circle (colored area) represents the H class; the red curve represents the P class.
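A radar plot of this kind can be reproduced with matplotlib's polar axes by plotting each class's per-feature means after normalization by the H class, so that H becomes the unit circle. The sketch below uses invented feature names and ratios purely for illustration.

    import numpy as np
    import matplotlib.pyplot as plt

    features = [f"feat{i}" for i in range(1, 21)]                # placeholder top-20 features
    p_over_h = 1 + 0.3 * np.sin(np.linspace(0, 3 * np.pi, 20))  # placeholder P/H ratios

    angles = np.linspace(0, 2 * np.pi, len(features), endpoint=False)
    angles = np.concatenate([angles, angles[:1]])  # close the polygon
    p_vals = np.concatenate([p_over_h, p_over_h[:1]])

    ax = plt.subplot(polar=True)
    ax.plot(angles, np.ones_like(angles), color="blue", label="H (unit circle)")
    ax.fill(angles, np.ones_like(angles), color="blue", alpha=0.2)
    ax.plot(angles, p_vals, color="red", label="P")
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(features, fontsize=7)
    ax.legend(loc="upper right")
    plt.show()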
