Review
Digit Biomark. 2021 Apr 16;5(1):78-88.
doi: 10.1159/000515346. eCollection 2021 Jan-Apr.

Voice for Health: The Use of Vocal Biomarkers from Research to Clinical Practice

Guy Fagherazzi et al. Digit Biomark.

Abstract

Diseases can affect organs such as the heart, lungs, brain, muscles, or vocal folds, and these effects can in turn alter an individual's voice. Voice analysis using artificial intelligence therefore opens new opportunities for healthcare. In this review, we give an overview of the applications of voice for health-related purposes, from the use of vocal biomarkers for diagnosis and risk prediction to the remote monitoring of clinical outcomes and symptoms. We discuss the potential of this rapidly evolving environment from a research, patient, and clinical perspective. We also discuss the key challenges that must be overcome in the near future before voice can be used substantially and efficiently in healthcare.

Keywords: Artificial intelligence; COVID-19; Signal decomposition; Smart home; Vocal biomarker; Voice.


Conflict of interest statement

The authors have no conflicts of interest to declare.

Figures

Fig. 1
Pipeline for vocal biomarker identification, from research to practice.
Fig. 2
Representation of typical voice signal pre-processing and of linguistic and acoustic feature extraction. The voice signal represents the sound of an example sentence (“Luxembourg is a resolutely multilingual environment”). ASR refers to automatic speech recognition. Linguistic annotation includes part-of-speech tagging, dependency and constituency parsing, and sense tagging. In this diagram, linguistic annotation is applied using tools like CoreNLP. The number of pauses, speech rate, and noun rate are linguistic features and are extracted using the BlaBla package, a clinical linguistic feature extraction tool. Acoustic features are extracted as mel-frequency cepstral coefficients (MFCCs). The framing step segments the signal into frames of N samples. Windowing multiplies each frame by a window function such as the Hamming window to reduce the discontinuities at the frame edges that can cause noise in the subsequent fast Fourier transform (FFT) step. In this diagram, dimension reduction is represented by principal component analysis (PCA), which reduces the feature space to a one-dimensional vector.
Fig. 3
Overview of present and future use of vocal biomarkers for health.
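
The framing, windowing, and Fourier transform steps named in the Fig. 2 caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the frame length of 64 samples, the hop of 32 samples, and the synthetic 1 kHz test tone sampled at 8 kHz are all assumptions chosen for illustration, and a naive discrete Fourier transform stands in for a real FFT so that the example needs only the Python standard library. The later MFCC stages (mel filterbank, logarithm, discrete cosine transform) and the PCA step are left out.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Framing: split the signal into overlapping frames of frame_len samples."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n_samples):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def power_spectrum(frame):
    """Power of DFT bins 0..N/2 (naive O(N^2) DFT, used here instead of an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def windowed_spectra(signal, frame_len=64, hop=32):
    """Frame the signal, apply the window to each frame, return power spectra."""
    window = hamming(frame_len)
    return [power_spectrum([x * w for x, w in zip(frame, window)])
            for frame in frame_signal(signal, frame_len, hop)]


# Hypothetical input: a 1 kHz sine tone sampled at 8 kHz (not real voice data).
SAMPLE_RATE = 8000
FRAME_LEN = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(256)]

spectra = windowed_spectra(tone, frame_len=FRAME_LEN, hop=32)
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(f"{len(spectra)} frames, strongest bin at {peak_hz:.0f} Hz")
```

With real recordings, a numerical library's FFT would replace the naive transform, and frame lengths are usually chosen in milliseconds rather than as a fixed sample count.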
