Front Nutr. 2023 May 9;10:1165854.
doi: 10.3389/fnut.2023.1165854. eCollection 2023.

Evaluation of nutritional status and clinical depression classification using an explainable machine learning method

Payam Hosseinzadeh Kasani et al. Front Nutr. 2023.

Abstract

Introduction: Depression is a prevalent disorder worldwide, with potentially severe implications. It contributes significantly to an increased risk of diseases associated with multiple risk factors. Early, accurate diagnosis of depressive symptoms is a critical first step toward management, intervention, and prevention. Various nutritional and dietary compounds have been suggested to be involved in the onset, maintenance, and severity of depressive disorders. Although the association between nutritional risk factors and the occurrence of depression is difficult to characterize, the interplay of these markers has yet to be fully explored with supervised machine learning.

Methods: This study aimed to determine the ability of machine learning-based decision support methods to identify the presence of depression using publicly available health data from the Korean National Health and Nutrition Examination Survey. Two exploration techniques, namely, uniform manifold approximation and projection (UMAP) and Pearson correlation, were used for exploratory analysis of the datasets. A grid search optimization with cross-validation was performed to fine-tune the models for classifying depression with the highest accuracy. Several performance measures, including accuracy, precision, recall, F1 score, confusion matrix, areas under the precision-recall and receiver operating characteristic curves, and calibration plot, were used to compare classifier performance. We further investigated feature importance using visualized interpretation with ELI5, partial dependence plots, local interpretable model-agnostic explanations (LIME), and Shapley additive explanations (SHAP) to explain predictions at both the population and individual levels.
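The grid search with cross-validation described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`k_fold_indices`, `grid_search_cv`), the toy threshold "model", and the data are all hypothetical, and the folds here are contiguous and unshuffled, which a real pipeline would likely replace with shuffled or stratified folds.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, validation) index pairs."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, val))
        start += size
    return folds


def grid_search_cv(fit, score, X, y, param_grid, k=5):
    """Try every parameter combination; return the one with the best mean CV score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, val in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            fold_scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
        avg = mean(fold_scores)
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score


# Hypothetical toy model: the "fitted model" is just the threshold itself.
def fit_threshold(X, y, threshold):
    return threshold


def accuracy(model, X, y):
    preds = [1 if x > model else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Made-up one-feature data; label 1 stands in for "depression".
X = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.15, 0.85]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
best_params, best_score = grid_search_cv(
    fit_threshold, accuracy, X, y, {"threshold": [0.05, 0.5, 0.95]}, k=5
)
```

Selecting on the mean validation score rather than the training score is what guards the chosen hyperparameters against overfitting to a single split.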

Results: On the original dataset, XGBoost achieved the highest accuracy (86.18%) and the random forest model achieved the highest area under the curve (84.96%). On the quantile-based dataset, XGBoost performed best, with an accuracy of 86.02% and an area under the curve of 85.34%. The explainable results revealed a complementary observation of the relative changes in feature values, and, thus, the importance of emergent depression risks could be identified.
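Accuracy and area under the ROC curve measure different things, which is one plausible reason the best model can differ between the two metrics. The sketch below computes both, plus precision, recall, and F1, on made-up labels and scores. The function names and the 0.5 decision threshold are assumptions for illustration, not values taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Threshold-dependent metrics derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, y_score):
    """Threshold-free AUC: probability that a random positive outranks a
    random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 cutoff

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
```

Accuracy depends on the chosen cutoff, whereas the AUC summarizes ranking quality across all cutoffs, so a model can lead on one and trail on the other.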

Discussion: The strength of our approach is the large sample size used for training with a fine-tuned model. The machine learning-based analysis showed that the hyperparameter-tuned model has empirically higher accuracy in classifying patients with depressive disorder, as evidenced by the set of interpretable experiments, and can be an effective solution for disease control.

Keywords: classification; clinical depression; depression; interpretability; machine learning; nutrition.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Overview of the machine learning workflows for prediction of depressive disorder.
Figure 2
Visualization of non-depression and depression space with the UMAP method. Each dot represents a patient in a two-dimensional space, and its color represents the group. Blue dots (0) represent people without depression and green dots (1) represent patients with depression. (A) Original dataset, (B) quantile-based dataset, (C) test set from the original dataset, (D) train set from the original dataset, (E) test set from the quantile-based dataset, and (F) train set from the quantile-based dataset.
Figure 3
Features correlating with target: (A) All variables, (B) nutritional default variables, and (C) nutritional quantile-based variables.
Figure 4
ROC and precision-recall (PRC) curves of the top three machine learning models. (A) ROC curves for the original dataset, (B) ROC curves for the quantile-based dataset, (C) PRC curves for the original dataset, and (D) PRC curves for the quantile-based dataset.
Figure 5
Interpolated precision-recall-F1 curve. The vertical dotted lines indicate the recall at which the curves achieve optimal precision.
Figure 6
Confusion matrices of the top three models on the original and quantile-based datasets. Each confusion matrix is visualized as a color-coded heat map. All plotted confusion matrices have darker cells on the diagonal, indicating that most samples are predicted with their correct label. Conversely, the lighter off-diagonal cells indicate misclassifications by the models.
Figure 7
Calibration curves of the top three models on the original and quantile-based datasets.
Figure 8
Feature contribution analysis performed by ELI5 for the depression classification of the top three machine learning models. (A) LR model for the original dataset, (B) XGB model for the original dataset, (C) RF model for the original dataset, (D) LR model for the quantile-based dataset, (E) XGB model for the quantile-based dataset, and (F) RF model for the quantile-based dataset.
Figure 9
Partial dependence plots for the original dataset: (A) RF model and (B) XGBoost model. The plots show how the predicted depression outcome depends on each of the nutritional variables.
Figure 10
LIME and SHAP explanation plots of two representative individuals, patients 12 and 18. (A) Random forest model for an actual positive instance, (B) random forest model for an actual negative instance, (C) XGBoost model for an actual positive instance, and (D) XGBoost model for an actual negative instance.
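For readers unfamiliar with calibration curves such as those in Figure 7, the following minimal sketch shows one common way the underlying points are computed: predicted probabilities are grouped into equal-width bins, and the mean predicted probability in each bin is compared with the observed fraction of positives. The function name, bin count, and data are hypothetical and are not taken from the study.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return one (mean predicted probability, observed positive fraction)
    pair per non-empty equal-width bin over [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 is placed in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))
    points = []
    for members in bins:
        if members:  # skip empty bins rather than dividing by zero
            mean_pred = sum(prob for _, prob in members) / len(members)
            frac_pos = sum(label for label, _ in members) / len(members)
            points.append((mean_pred, frac_pos))
    return points


# Made-up predictions: label 1 stands in for "depression".
y_true = [0, 0, 0, 1, 1, 0, 1, 1]
y_prob = [0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.9, 0.9]
points = calibration_bins(y_true, y_prob, n_bins=5)
```

A well-calibrated classifier yields points close to the diagonal, where the mean predicted probability equals the observed positive fraction.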
