J Neuroeng Rehabil. 2025 May 25;22(1):116.

doi: 10.1186/s12984-025-01645-5.

A data-centric and interpretable EEG framework for depression severity grading using SHAP-based insights

Anruo Shen¹,², Jingnan Sun¹, Xiaogang Chen³, Xiaorong Gao⁴

Affiliations

¹ Department of Biomedical Engineering, Tsinghua University, Beijing, 100084, China.
² Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD, 21218, USA.
³ Institute of Biomedical Engineering, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300192, China.
⁴ Department of Biomedical Engineering, Tsinghua University, Beijing, 100084, China. gxr-dea@tsinghua.edu.cn.

PMID: 40414904
PMCID: PMC12103758
DOI: 10.1186/s12984-025-01645-5

Anruo Shen et al. J Neuroeng Rehabil. 2025.

Abstract

Background: Major depressive disorder (MDD) is a leading cause of disability worldwide. Accurate assessment of depression severity is critical for diagnosis, treatment planning, and monitoring, yet current clinical tools are largely subjective, relying on self-report and clinician judgment via traditional assessment scales. Electroencephalography (EEG) has emerged as a promising, non-invasive modality for capturing neural correlates of depression. However, most EEG-based machine learning diagnostic studies focus on boosting classification accuracy with complex algorithms applied to small, homogeneous datasets. These black-box approaches often yield results that are difficult to interpret and that generalize poorly, which makes clinical translation impractical. There therefore remains a critical need for models that are not only accurate but also transparent, robust, and grounded in the physiological properties of the data.

Methods: We proposed a data-centric, interpretable framework for EEG-based depression severity grading. A hybrid feature selection method combined p-values with SHapley Additive exPlanations (SHAP) values to select features that are both independently significant and jointly informative. The system was trained and evaluated on a large-scale, multi-site resting-state EEG dataset, using random forests for both classification and regression. SHAP, an explainable artificial intelligence technique, was also applied post hoc to infer the key electrophysiological features and brain regions associated with MDD mechanisms, further increasing interpretability.
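
The hybrid selection step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the combining rule (seed the set with the lowest p-values, then fill the remaining slots by mean |SHAP| value) is an assumption loosely based on the description of Fig. 5, and `hybrid_select`, its parameters, and the `alpha` threshold are hypothetical names and values.

```python
from typing import Dict, List


def hybrid_select(p_values: Dict[str, float],
                  shap_importance: Dict[str, float],
                  n_by_p: int = 5,
                  n_total: int = 10,
                  alpha: float = 0.05) -> List[str]:
    """Pick features that are individually significant and jointly informative.

    Assumed rule (not taken from the paper): keep the `n_by_p` features with
    the lowest p-values below `alpha`, then add features in order of
    decreasing mean |SHAP| value until `n_total` features are selected.
    """
    # Features ordered from most to least statistically significant.
    significant = [f for f in sorted(p_values, key=p_values.get)
                   if p_values[f] < alpha]
    selected = significant[:n_by_p]

    # Fill the remaining slots with the most influential unused features.
    for feat in sorted(shap_importance, key=shap_importance.get, reverse=True):
        if len(selected) >= n_total:
            break
        if feat not in selected:
            selected.append(feat)
    return selected
```

The defaults of five p-value features and ten features in total mirror the numbers mentioned in the figure captions. Both inputs would be computed beforehand, for example from a group-difference test and from SHAP values of a fitted random forest.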

Results: The proposed system achieved a ten-fold cross-validation classification accuracy of 74.5% (95% CI [70.97%, 78.80%], p < 0.001) and a correlation coefficient of 0.56 (95% CI [0.407, 0.683], p < 0.001) for severity estimation. SHAP analysis identified consistent, clinically meaningful EEG features and located the critical disease-related brain areas in the left occipital and parietal lobes. Key features included relative beta power in the left parietal lobe, time-domain features at the parietal midline, the 1/f intercept, left occipital relative beta power, and global alpha energy.
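
Two of the spectral features named above, relative band power and the 1/f intercept, can be computed from a power spectral density (PSD) roughly as shown below. This is a sketch under stated assumptions: the beta band edges (13 to 30 Hz), the 1 to 45 Hz total range, the log-log linear fit, and the function names are conventional choices assumed for illustration, not details taken from the abstract.

```python
import math
from typing import Sequence, Tuple


def relative_band_power(freqs: Sequence[float], psd: Sequence[float],
                        band: Tuple[float, float] = (13.0, 30.0),
                        total: Tuple[float, float] = (1.0, 45.0)) -> float:
    """Band power divided by total power (half-open ranges [lo, hi)).

    Assumes uniformly spaced frequency bins, so the bin width cancels
    in the ratio and plain sums can stand in for integrals.
    """
    band_power = sum(p for f, p in zip(freqs, psd) if band[0] <= f < band[1])
    total_power = sum(p for f, p in zip(freqs, psd) if total[0] <= f < total[1])
    return band_power / total_power


def one_over_f_fit(freqs: Sequence[float], psd: Sequence[float],
                   fit_range: Tuple[float, float] = (1.0, 45.0)
                   ) -> Tuple[float, float]:
    """Least-squares line through (log10 f, log10 PSD).

    Returns (intercept, slope); the intercept is the fitted log10 power
    at 1 Hz and the slope is the negative of the 1/f exponent.
    """
    pts = [(math.log10(f), math.log10(p)) for f, p in zip(freqs, psd)
           if fit_range[0] <= f <= fit_range[1] and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

In practice the PSD would first be estimated per channel (for example with Welch's method) and the resulting features averaged over the electrodes of a region.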

Conclusion: This study proposes a data-centric, interpretable depression grading system built on large-scale, multi-center EEG data, using simple models and hybrid feature selection to emphasize explainability, generalizability, and data fidelity. By shifting the focus from algorithmic complexity to data transparency and feature-level insight, the model offers a practical and trustworthy path toward real-world mental health assessment.

Keywords: Depression; EEG; Feature selection; Interpretable; Neurophysiological biomarkers; Resting state; SHAP.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. The human data used in this study was acquired from a public dataset, where ethical consent is believed to be a part of the original study. Consent for publication: Not applicable. Same reason as above. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Overview of the study pipeline. Resting-state EEG signals collected across multiple centers and devices were analyzed to investigate depression. To facilitate clinical translation, interpretability was prioritized by integrating SHAP analysis into three key components: feature selection, severity grading model development, and inference of pathological brain regions

**Fig. 3**
Comparison of classification accuracies across different classifiers

**Fig. 4**
Model performance and interpretability analysis. a Macro-averaged ROC curve of the classification model. b Regression of HAMD17 scores using the selected 10 features. c SHAP analysis of classification features: left—mean absolute SHAP values indicating average impact on model output; right—individual SHAP values showing directional influence on predictions
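
To make the two SHAP views in panel c concrete, the sketch below computes exact Shapley values for a small model by enumerating feature coalitions, then averages their absolute values across samples. It is an illustration of the underlying definition, assuming a single fixed baseline vector for "missing" features; it is not the tree-based algorithm a SHAP library would use for a random forest, and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley attribution of model(x) - model(baseline).

    Features outside a coalition take their baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def mean_abs_importance(model: Callable[[Sequence[float]], float],
                        samples: Sequence[Sequence[float]],
                        baseline: Sequence[float]) -> List[float]:
    """Average |Shapley value| per feature: the global importance ranking."""
    per_sample = [shapley_values(model, x, baseline) for x in samples]
    return [sum(abs(row[i]) for row in per_sample) / len(per_sample)
            for i in range(len(baseline))]
```

The signed per-sample values correspond to the directional plot on the right of panel c, and the mean absolute values correspond to the bar ranking on the left.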

**Fig. 5**
Feature selection visualization and its impact on model performance. a Features mapped in the p-value vs. SHAP-value coordinate space. The five most statistically significant features (lowest p-values) improved severity grading accuracy by 15% above chance, while the five most influential SHAP-selected features improved it by 10%. b MDD severity classification accuracy as a function of the number of added features, starting from the top five p-value features. Performance is compared across feature selection strategies: SHAP-based, p-value-based, and random selection. The combined p-value + SHAP approach yielded the highest accuracy

**Fig. 6**
Cross-device generalizability of the classification model. Classification accuracies are shown for each dataset under two conditions: models trained and tested within the same dataset (beige bars), and models trained on other datasets and tested on the target dataset (blue bars). The red dashed line indicates overall accuracy, while the orange dashed line marks chance level. Results demonstrate consistent above-chance performance across datasets, with reduced—but still reliable—generalization when applied to unseen sites
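
The "trained on other datasets, tested on the target dataset" condition corresponds to a leave-one-site-out split, which might be set up as in this minimal sketch. The site labels and the `leave_one_site_out` helper are hypothetical, and the model fitting step is left out.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_site_out(sites: Sequence[str]
                       ) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_site, train_idx, test_idx) for every distinct site.

    `sites[i]` is the recording site (or device) of sample i. Each split
    trains on all other sites and tests on the held-out one, so no data
    from the target site leaks into training.
    """
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train_idx, test_idx
```

Feature selection and any scaling would need to be fitted on the training indices only for the held-out accuracy to reflect unseen-site performance.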

**Fig. 7**
Experimental workflow and performance across electrode configurations. a Overview of the data experiment pipeline. MDD severity classification accuracy was evaluated under four conditions: (1) full-density electrodes across the whole brain, (2) full-density electrodes on half the brain (upper, lower, left, or right), (3) full-density electrodes in the overlapping quarter region from the half-brain conditions, and (4) sparse-density electrodes across the whole brain. b Ten-fold cross-validation accuracy for each configuration. The full-density whole-brain setup achieved significantly higher accuracy than other conditions. Among hemispheric subsets, the lower and left brain regions yielded relatively better performance. No significant difference was observed between local (quarter-brain) and global (sparse whole-brain) setups
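
The left and right half-brain conditions can be approximated by filtering channel labels, as sketched below. The sketch relies on the standard 10-20 naming convention (odd suffix for the left hemisphere, even for the right, "z" for the midline); how the study actually assigned midline electrodes, and the helper names used here, are assumptions.

```python
from typing import List, Sequence


def hemisphere(channel: str) -> str:
    """Classify a 10-20 style label as 'left', 'right', or 'midline'."""
    name = channel.strip()
    if name.lower().endswith("z"):
        return "midline"
    digits = "".join(ch for ch in name if ch.isdigit())
    if not digits:
        raise ValueError(f"cannot infer hemisphere from label {channel!r}")
    return "left" if int(digits) % 2 == 1 else "right"


def select_channels(channels: Sequence[str], side: str,
                    keep_midline: bool = False) -> List[str]:
    """Keep the channels on one side, optionally including the midline."""
    wanted = {side, "midline"} if keep_midline else {side}
    return [ch for ch in channels if hemisphere(ch) in wanted]
```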

**Fig. 8**
Workflow of spatially condensed power feature calculation
