J Neuroeng Rehabil. 2025 May 25;22(1):116.
doi: 10.1186/s12984-025-01645-5.

A data-centric and interpretable EEG framework for depression severity grading using SHAP-based insights

Anruo Shen et al. J Neuroeng Rehabil.

Abstract

Background: Major Depressive Disorder (MDD) is a leading cause of disability worldwide. Accurate assessment of depression severity is critical for diagnosis, treatment planning, and monitoring, yet current clinical tools are largely subjective, relying on self-report and clinician judgment via traditional assessment scales. EEG has emerged as a promising, non-invasive modality for capturing neural correlates of depression. However, most EEG-based machine learning diagnostic studies focus on boosting classification accuracy by applying complex algorithms to small, homogeneous datasets. These black-box approaches often yield results that are difficult to interpret and generalize poorly, making clinical translation impractical. Therefore, there remains a critical need for models that are not only accurate but also transparent, robust, and grounded in the physiological properties of the data itself.

Methods: We propose a data-centric, interpretable framework for EEG-based depression severity grading. A hybrid feature selection method was used, combining p-values with SHapley Additive exPlanations (SHAP) values to select features that are both independently significant and jointly informative. The system was trained and evaluated on a large-scale, multi-site resting-state EEG dataset, using random forests for both the classification and the regression task. SHAP, an explainable artificial intelligence technique, was also applied post hoc to infer the key electrophysiological features and brain regions associated with MDD mechanisms, further increasing interpretability.
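The hybrid selection idea can be illustrated with a minimal sketch. It assumes one plausible reading of the description above: seed the feature set with the lowest p-values, then fill the remaining slots by descending mean absolute SHAP value. The function `hybrid_select`, its parameters, the feature names, and all scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def hybrid_select(p_values: Dict[str, float],
                  shap_importance: Dict[str, float],
                  n_significant: int,
                  n_total: int) -> List[str]:
    """Seed with the most significant features, then add by SHAP importance.

    Hypothetical helper: the paper's exact combination rule is not given here.
    """
    if n_significant > n_total:
        raise ValueError("n_significant must not exceed n_total")
    # Step 1: lowest p-values first (independently significant features).
    by_p = sorted(p_values, key=lambda name: p_values[name])
    selected = by_p[:n_significant]
    # Step 2: fill remaining slots by descending mean |SHAP| (jointly informative).
    by_shap = sorted(shap_importance,
                     key=lambda name: shap_importance[name],
                     reverse=True)
    for name in by_shap:
        if len(selected) >= n_total:
            break
        if name not in selected:
            selected.append(name)
    return selected


# Made-up scores for four hypothetical features.
p_values = {"beta_P3": 0.001, "alpha_global": 0.020,
            "intercept_1f": 0.300, "theta_Fz": 0.700}
shap_importance = {"beta_P3": 0.05, "alpha_global": 0.01,
                   "intercept_1f": 0.09, "theta_Fz": 0.02}

chosen = hybrid_select(p_values, shap_importance, n_significant=1, n_total=3)
print(chosen)  # ['beta_P3', 'intercept_1f', 'theta_Fz']
```

In this toy example `intercept_1f` has a weak p-value but the largest SHAP importance, so it is picked up in the second step even though a purely p-value-based ranking would have placed it third.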

Results: The proposed system achieved a ten-fold cross-validated classification accuracy of 74.5% (95% CI [70.97%, 78.80%], p < 0.001) and a correlation coefficient of 0.56 (95% CI [0.407, 0.683], p < 0.001) for severity estimation. SHAP analysis identified consistent, clinically meaningful EEG features, particularly in the left parietal-occipital region. In-depth analysis of the SHAP values pointed to the left occipital and parietal lobes as critical disease-related brain areas, and to the following key features: relative beta power in the left parietal lobe, time-domain features at the parietal midline, the 1/f intercept, left occipital relative beta power, and global alpha energy.
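To make the notion of a per-feature SHAP attribution concrete, the sketch below computes exact Shapley values for a toy model by enumerating every feature coalition, with absent features replaced by a baseline value. This is only an illustration of the underlying quantity: practical SHAP implementations use faster, model-specific algorithms, and the toy model, feature meanings, and numbers here are assumptions rather than anything from the study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(model: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their real value, the rest the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(features: Sequence[float]) -> float:
    # Hypothetical score: one additive term plus one interaction term.
    beta, alpha, intercept = features
    return 2.0 * beta + alpha * intercept


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)  # approximately [2.0, 3.0, 3.0], up to floating-point rounding
```

The attributions sum to the difference between the model output at `x` and at the baseline (8.0 here), and the interaction term is split evenly between the two features involved. Averaging the absolute attributions over many samples gives the kind of mean |SHAP| ranking used to compare features.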

Conclusion: This study proposes a data-centric, interpretable depression grading system built on large-scale, multi-center EEG data, using simple models and hybrid feature selection to emphasize explainability, generalizability and data fidelity. By shifting the focus from algorithmic complexity to data transparency and feature-level insight, the model offers a practical and trustworthy path toward real-world mental health assessment.

Keywords: Depression; EEG; Feature selection; Interpretable; Neurophysiological biomarkers; Resting state; SHAP.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. The human data used in this study was acquired from a public dataset, where ethical consent is believed to be a part of the original study. Consent for publication: Not applicable. Same reason as above. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Overview of the study pipeline. Resting-state EEG signals collected across multiple centers and devices were analyzed to investigate depression. To facilitate clinical translation, interpretability was prioritized by integrating SHAP analysis into three key components: feature selection, severity grading model development, and inference of pathological brain regions
Fig. 2
EEG microstate topographies
Fig. 3
Comparison of classification accuracies across different classifiers
Fig. 4
Model performance and interpretability analysis. a Macro-averaged ROC curve of the classification model. b Regression of HAMD17 scores using the selected 10 features. c SHAP analysis of classification features: left—mean absolute SHAP values indicating average impact on model output; right—individual SHAP values showing directional influence on predictions
Fig. 5
Feature selection visualization and its impact on model performance. a Features mapped in the p-value vs. SHAP-value coordinate space. The five most statistically significant features (lowest p-values) improved severity grading accuracy by 15% above chance, while the five most influential SHAP-selected features improved it by 10%. b MDD severity classification accuracy as a function of the number of added features, starting from the top five p-value features. Performance is compared across feature selection strategies: SHAP-based, p-value-based, and random selection. The combined p-value + SHAP approach yielded the highest accuracy
Fig. 6
Cross-device generalizability of the classification model. Classification accuracies are shown for each dataset under two conditions: models trained and tested within the same dataset (beige bars), and models trained on other datasets and tested on the target dataset (blue bars). The red dashed line indicates overall accuracy, while the orange dashed line marks chance level. Results demonstrate consistent above-chance performance across datasets, with reduced—but still reliable—generalization when applied to unseen sites
Fig. 7
Experimental workflow and performance across electrode configurations. a Overview of the data experiment pipeline. MDD severity classification accuracy was evaluated under four conditions: (1) full-density electrodes across the whole brain, (2) full-density electrodes on half the brain (upper, lower, left, or right), (3) full-density electrodes in the overlapping quarter region from the half-brain conditions, and (4) sparse-density electrodes across the whole brain. b Ten-fold cross-validation accuracy for each configuration. The full-density whole-brain setup achieved significantly higher accuracy than other conditions. Among hemispheric subsets, the lower and left brain regions yielded relatively better performance. No significant difference was observed between local (quarter-brain) and global (sparse whole-brain) setups
Fig. 8
Workflow of spatially condensed power feature calculation
