Sci Rep. 2025 Apr 5;15(1):11687.
doi: 10.1038/s41598-025-96575-6.

Explainable artificial intelligence to diagnose early Parkinson's disease via voice analysis

Matthew Shen et al. Sci Rep.

Abstract

Parkinson's disease (PD) is a neurodegenerative disorder affecting motor control, leading to symptoms such as tremors and stiffness. Early diagnosis is essential for effective treatment, but traditional methods are often time-consuming and expensive. This study leverages Artificial Intelligence (AI) and Machine Learning (ML) techniques, using voice analysis to detect early signs of PD. We applied a hybrid model combining Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Multiple Kernel Learning (MKL), and Multilayer Perceptron (MLP) to a dataset of 81 voice recordings. Acoustic features such as Mel-Frequency Cepstral Coefficients (MFCCs), jitter, and shimmer were analyzed. The model achieved 91.11% accuracy, 92.50% recall, 89.84% precision, 91.13% F1 score, and an area-under-the-curve (AUC) of 0.9125. SHapley Additive exPlanations (SHAP) provided data explainability, identifying key features driving the PD diagnosis, thus enhancing AI interpretability and trustability. Furthermore, a probability-based scoring system was developed to enable PD patients and clinicians to track disease progression. This AI-driven approach offers a non-invasive, cost-effective, and rapid tool for early PD detection, facilitating personalized treatment through vocal biomarkers.
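
The abstract names jitter and shimmer among the analyzed acoustic features without defining them. As a minimal sketch, assuming the common "local" definitions (mean absolute difference between consecutive glottal cycles, divided by the mean value), they could be computed as shown below. The function name `local_perturbation` and the sample cycle values are hypothetical illustrations, not the authors' pipeline or data.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the mean value.

    Applied to cycle periods this gives local jitter; applied to cycle
    peak amplitudes it gives local shimmer (assumed definitions).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical per-cycle measurements from a sustained vowel (not study data).
periods_s = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
peak_amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]

jitter_local = local_perturbation(periods_s)
shimmer_local = local_perturbation(peak_amplitudes)
print(f"local jitter:  {jitter_local:.4%}")
print(f"local shimmer: {shimmer_local:.4%}")
```

Both measures are ratios, so they are unitless and usually reported as percentages; larger values indicate less stable cycle-to-cycle pitch (jitter) or loudness (shimmer).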

Keywords: Deep learning; Explainable AI; Parkinson’s disease; Vocal biomarkers.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
(a) Color-coded accuracy line graph of tested models across 5 cross-validation folds. (b) Color-coded cross-entropy loss line graph of tested models across 5 cross-validation folds.
Fig. 2
Color-coded bar graph of average performance metrics of tested models across 5 cross-validation folds.
Fig. 3
Color-coded line graph of ROC curves of tested models. The dashed line represents the ROC of a random-chance diagnosis.
Fig. 4
The SHAP feature importance plot displays the impact of various acoustic features on the model’s output. The x-axis represents SHAP values, where negative values decrease the likelihood of PD, and positive values increase it. The y-axis lists the most influential features on the model’s prediction. Each dot represents an individual data point. The position of the dot indicates the contribution of that feature to the prediction. The color gradient reflects feature values, where red signifies high values, blue indicates low values, and purple represents intermediate values.
Fig. 5
The box and whisker plots show the minimum, first quartile, median, third quartile, maximum, and outlier points for mean pitch, local jitter, local shimmer, and mean HNR.
Fig. 6
(a) The spectrogram of an HC subject exhibits clear, stable harmonic structures with consistent frequency bands. This visualization demonstrates stronger and more evenly spaced harmonics than (b), indicating better vocal stability. The bright regions on the decibel scale signify higher signal intensity, distinguishing HC voices from PD-affected speech. (b) The spectrogram of a PD patient’s voice shows irregular frequency patterns and decreased signal stability. The disrupted harmonic structures and weaker intensity bands reflect vocal instability, which is characteristic of PD-related dysphonia. The color scale represents amplitude in decibels, with lower-intensity regions indicating reduced vocal control.
Fig. 7
The champion model’s sequential pipeline is shown with arrows indicating a series of steps that utilize the unique advantages of each neural network, eventually forming a diagnosis.
