BMC Med Inform Decis Mak. 2025 Feb 17;25(1):83.

doi: 10.1186/s12911-025-02903-1.

Prediction of depressive disorder using machine learning approaches: findings from the NHANES

Thien Vu¹,², Research Dawadi³, Masaki Yamamoto³, Jie Ting Tay³, Naoki Watanabe³, Yuki Kuriya³, Ai Oya³, Phap Ngoc Hoang Tran³, Michihiro Araki⁴,⁵,⁶,⁷

Affiliations

¹ Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Osaka, Settsu, 566-0002, Japan. thienvuyd01@gmail.com.
² Department of Cardiac Surgery, Cardiovascular Center, Cho Ray Hospital, Ho Chi Minh, Vietnam. thienvuyd01@gmail.com.
³ Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Osaka, Settsu, 566-0002, Japan.
⁴ Artificial Intelligence Center for Health and Biomedical Research, National Institutes of Biomedical Innovation, Health and Nutrition, 3-17 Senrioka-shinmachi, Osaka, Settsu, 566-0002, Japan. araki@nibiohn.go.jp.
⁵ Graduate School of Medicine, Kyoto University, 54 Shogoin-Kawahara-cho, Sakyo-ku, Kyoto, 606-8507, Japan. araki@nibiohn.go.jp.
⁶ Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai, Nada-ku, Kobe, Hyogo, 657-8501, Japan. araki@nibiohn.go.jp.
⁷ Department of Preventive Cardiology, National Cerebral and Cardiovascular Center, Suita, Osaka, 564-8565, Japan. araki@nibiohn.go.jp.

PMID: 39962516
PMCID: PMC11834192
DOI: 10.1186/s12911-025-02903-1

Thien Vu et al. BMC Med Inform Decis Mak. 2025.

Abstract

Background: Depressive disorder, particularly major depressive disorder (MDD), significantly impacts individuals and society. Traditional analysis methods often suffer from subjectivity and may not capture complex, non-linear relationships between risk factors. Machine learning (ML) offers a data-driven approach that can predict and diagnose depression more accurately by analyzing large, complex datasets.

Methods: This study used data from the National Health and Nutrition Examination Survey (NHANES) 2013-2014 to predict depression using six supervised ML models: Logistic Regression, Random Forest, Naive Bayes, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM). Depression was assessed using the Patient Health Questionnaire (PHQ-9), with a score of 10 or higher indicating moderate to severe depression. The dataset was split into training (80%) and testing (20%) sets, and model performance was evaluated using accuracy, sensitivity, specificity, precision, AUC, and F1 score. SHAP (SHapley Additive exPlanations) values were used to identify the critical risk factors and to interpret each feature's contribution to the prediction.
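The Methods combine a binary outcome (PHQ-9 total of 10 or higher), an 80/20 train/test split, and six evaluation metrics. The following is a minimal, dependency-free sketch of how those pieces fit together, assuming binary labels with 1 = depression. It is an illustration, not the authors' code: the function names are hypothetical, the split is a simple seeded random split (the abstract does not say whether it was stratified), and AUC uses the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
import random
from typing import Dict, List, Sequence, Tuple


def phq9_label(items: Sequence[int], cutoff: int = 10) -> int:
    """Return 1 (moderate-to-severe depression) if the PHQ-9 total is >= cutoff, else 0."""
    if len(items) != 9 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("PHQ-9 needs nine item scores, each between 0 and 3")
    return int(sum(items) >= cutoff)


def split_80_20(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Seeded, non-stratified random split of row indices into 80% train / 20% test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.8 * n)
    return idx[:cut], idx[cut:]


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics from the 2x2 confusion matrix (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)  # recall, true-positive rate
    precision = _ratio(tp, tp + fp)    # positive predictive value
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores higher than random negative), ties counted 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, with true labels `[1, 1, 0, 0, 0]`, predicted scores `[0.9, 0.4, 0.6, 0.2, 0.1]`, and a hypothetical 0.5 decision threshold, `binary_metrics` gives accuracy 0.6, sensitivity 0.5, specificity 2/3, precision 0.5, and F1 0.5, while `roc_auc` gives 5/6. AUC is threshold-free, which is why it is computed from the scores rather than the thresholded predictions.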

Results: XGBoost was the best-performing model, achieving the highest accuracy, sensitivity, specificity, precision, AUC, and F1 score. SHAP analysis identified the most important predictors of depression: the ratio of family income to poverty (PIR), sex, hypertension, serum cotinine and hydroxycotinine, BMI, education level, glucose levels, age, marital status, and renal function (eGFR).
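A ranking of predictors like the one above is commonly obtained by summarizing per-participant SHAP values into one global score per feature, the mean absolute SHAP value, I_j = (1/N) Σ_i |φ_ij|, which is the quantity the Fig. 1 bar plot displays. The following is a minimal sketch, assuming the SHAP values are already available as a participants × features matrix (for example from an explainer library). The function name and the toy numbers in the usage note are hypothetical, not study data.

```python
from typing import List, Sequence, Tuple


def mean_abs_shap(shap_values: Sequence[Sequence[float]],
                  feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP| (rows = participants, columns = features)."""
    if not shap_values:
        raise ValueError("need at least one row of SHAP values")
    if any(len(row) != len(feature_names) for row in shap_values):
        raise ValueError("each row must have one SHAP value per feature")
    n = len(shap_values)
    # Absolute values first, so positive and negative contributions do not cancel.
    importance = [
        (name, sum(abs(row[j]) for row in shap_values) / n)
        for j, name in enumerate(feature_names)
    ]
    # Largest mean |SHAP| first, matching the bar order of a SHAP bar plot.
    return sorted(importance, key=lambda item: item[1], reverse=True)
```

With toy values `[[-0.4, 0.2, 0.1], [0.6, -0.2, 0.0], [-0.2, 0.2, -0.05]]` for features `["PIR", "sex", "BMI"]`, the ranking is PIR (≈0.4), sex (≈0.2), BMI (≈0.05). Note that this score measures how strongly a feature moves predictions, not the direction of its effect; the direction is what per-participant plots such as Fig. 2 show.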

Conclusion: We developed ML models to predict depression and utilized SHAP for interpretation. This approach identifies key factors associated with depression, encompassing socioeconomic, demographic, and health-related aspects.

Keywords: Depression; Depressive disorder; Light Gradient Boosting Machine (LightGBM); Logistic regression; Naïve Bayes; Random forest; SHapley Additive exPlanations (SHAP); Supervised machine learning; Support Vector Machine (SVM); eXtreme Gradient Boosting (XGBoost).

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Ethics approval for this study was granted by the National Center for Health Statistics (NCHS) Research Ethics Review Board (Protocol #2013-14). Because this study is a secondary data analysis and the informed consent obtained during primary data collection included permission for secondary use, no additional participant consent was required. Participants' privacy was protected by de-identifying the data. Further details on NHANES ethics approval are available on the CDC's official website: https://www.cdc.gov/nchs/nhanes/about/erb.html?CDC_AAref_Val=https://www.cdc.gov/nchs/nhanes/irba98.htm . Consent for publication: Not applicable. Relevant guidelines and regulations: Not applicable. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
The contribution levels of all variables to depression based on SHAP values. The SHAP bar plot (blue) illustrates the global importance of each feature in the model by displaying the mean absolute SHAP value for each feature, giving an overview of the features' impact on the model's output. Each bar represents one feature (variable), and the length of the bar indicates the extent of that feature's contribution to the depression prediction.

**Fig. 2**
The heat plot of SHAP values. The heat plot of SHAP values reveals the relationships between each feature (variable) and depression, showing how the value of a specific feature relates to its impact on the prediction. Each data point corresponds to one participant and that participant's Shapley value for a specific feature. A point's position is determined by its Shapley value on the x-axis and the feature's prominence on the y-axis.

**Fig. 3**
The impact of categorical variables on depression

**Fig. 4**
The impact of numerical variables on depression
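The Fig. 2 caption describes one Shapley value per participant per feature. For a handful of features, that value can be computed exactly by enumerating coalitions S of the other features: φ_i = Σ_S |S|!(n − |S| − 1)!/n! · [v(S ∪ {i}) − v(S)]. The sketch below assumes v(S) is the model output when features outside S are set to a fixed baseline. This is a simplification: SHAP implementations typically average over a background dataset, and tree-specific algorithms (such as TreeExplainer in the `shap` package) avoid the exponential enumeration for models like XGBoost. The toy model and function name are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(model: Callable[[List[float]], float],
                  x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one participant by enumerating all coalitions.

    Features outside a coalition are replaced by a fixed baseline value
    (an assumption; cost grows as 2**n, so this only suits small n).
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Model output with coalition features at x and the rest at baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi
```

With the toy model f(z) = 2·z₀ + z₁·z₂, input `[1, 2, 3]`, and an all-zero baseline, the values are approximately `[2, 3, 3]`: the linear term goes entirely to feature 0 and the interaction is split evenly between features 1 and 2. They sum to f(x) − f(baseline) = 8, the additivity property that lets each point in a plot like Fig. 2 be read as one participant's share of a prediction.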

Grants and funding

JPMJPF2018/Japan Science and Technology Agency (JST) COI-NEXT 315
