Multicenter Study
Front Cell Infect Microbiol. 2025 Jan 8:14:1500326.
doi: 10.3389/fcimb.2024.1500326. eCollection 2024.

Interpretable machine learning-based prediction of 28-day mortality in ICU patients with sepsis: a multicenter retrospective study

Li Shen et al. Front Cell Infect Microbiol.

Abstract

Background: Sepsis is a major cause of mortality in intensive care units (ICUs) and remains a significant global health challenge, with sepsis-related deaths adding substantially to the burden on healthcare systems worldwide. The primary objective of this study was to construct and evaluate a machine learning (ML) model for predicting 28-day all-cause mortality among ICU patients with sepsis.

Methods: Data for the study were sourced from the eICU Collaborative Research Database (eICU-CRD) (version 2.0). The main outcome was 28-day all-cause mortality. Predictors for the final model were selected using least absolute shrinkage and selection operator (LASSO) regression and the Boruta feature selection algorithm. Five machine learning algorithms, namely logistic regression (LR), decision tree (DT), extreme gradient boosting (XGBoost), support vector machine (SVM), and light gradient boosting machine (LightGBM), were used to construct models with 10-fold cross-validation. Model performance was evaluated using AUC, accuracy, sensitivity, specificity, recall, and F1 score. Additionally, we performed an interpretability analysis on the model that showed the most stable performance.
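The evaluation metrics named in the Methods can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the labels, scores, the 0.5 threshold, and the function names below are all hypothetical and chosen only to show how each metric is derived from predicted probabilities. Note that recall and sensitivity are the same quantity, so the sketch reports it once.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Threshold the scores and derive metrics from the confusion matrix (1 = died)."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        pred = 1 if score >= threshold else 0
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # identical to recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a small illustration but not for a large cohort.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 deaths, 5 survivors, with made-up predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

metrics = classification_metrics(y_true, y_score, threshold=0.5)
auc = roc_auc(y_true, y_score)
print(metrics)
print("AUC:", auc)
```

Unlike the threshold-based metrics, AUC depends only on how the scores rank the patients, which is why it is usually reported alongside sensitivity and specificity at a chosen cut-off.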

Results: The final study cohort comprised 4564 patients, among whom 568 (12.4%) died within 28 days of ICU admission. The XGBoost algorithm demonstrated the most reliable performance, achieving an AUC of 0.821 while balancing sensitivity (0.703) and specificity (0.798). The top three risk predictors of mortality were the APACHE score, serum lactate level, and aspartate aminotransferase (AST).
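To put the reported sensitivity and specificity in context, the following sketch applies Bayes' rule to estimate predictive values. It rests on an assumption not stated in the abstract: that the evaluation set had the same 28-day mortality rate as the whole cohort (568 of 4564). The function name is hypothetical, and the derived numbers are illustrative only, not results reported by the authors.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true-positive fraction
    fn = (1.0 - sensitivity) * prevalence      # false-negative fraction
    tn = specificity * (1.0 - prevalence)      # true-negative fraction
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false-positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Figures taken from the Results paragraph.
deaths, cohort = 568, 4564
prevalence = deaths / cohort

# ASSUMPTION: the evaluation set shares the cohort-wide mortality rate.
ppv, npv = predictive_values(sensitivity=0.703, specificity=0.798, prevalence=prevalence)

print(f"28-day mortality: {prevalence:.1%}")
print(f"Implied PPV: {ppv:.3f}, implied NPV: {npv:.3f}")
```

Under that assumption the implied positive predictive value is roughly 0.33 and the negative predictive value roughly 0.95, which is typical for an outcome with about 12% prevalence: most flagged patients survive, while a low-risk prediction is usually correct.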

Conclusion: ML models reliably predicted 28-day mortality in critically ill sepsis patients. Of the models evaluated, the XGBoost algorithm exhibited the most stable performance in identifying patients at elevated mortality risk. Model interpretability analysis identified crucial predictors, potentially informing clinical decisions for sepsis patients in the ICU.

Keywords: 28-day mortality; XGBoost; machine learning; multicenter retrospective study; sepsis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. The whole study workflow.
Figure 2. Feature selection by LASSO regression and Boruta. (A) LASSO coefficient paths: log(λ) is plotted on the X-axis and the regression coefficients on the Y-axis, with each colored line representing one variable. (B) The optimal LASSO parameter (λ) was selected by 10-fold cross-validation. The left dashed line represents λmin (the λ with minimum cross-validated error), while the right dashed line indicates λ1se (the largest λ whose cross-validated error is within one standard error of that minimum). (C) Feature identification via the Boruta algorithm. The X-axis shows all features and the Y-axis shows the Z-value of each feature. Green boxes represent the initial 26 significant variables, yellow boxes denote tentative variables, and red boxes indicate unimportant ones.
Figure 3. Receiver operating characteristic (ROC) curves of the five models. (A) ROC curves for the training set. (B) ROC curves for the validation set. DT, decision tree; LGBM, light gradient boosting machine; LR, logistic regression; SVM, support vector machine; XGBoost, extreme gradient boosting.
Figure 4. SHAP analysis of the XGBoost model. (A) Bar plot of the mean SHAP value for the top ten variables. (B) Beeswarm plot showing the distribution of SHAP values for the top ten variables. Each point represents one sample; the x-axis shows the SHAP value and the color encodes the feature value. (C) SHAP waterfall plot for case 1. (D) SHAP waterfall plot for case 2.
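The waterfall plots in Figure 4 rely on the additive property of Shapley values: the per-feature contributions sum to the difference between one patient's prediction and a baseline prediction. The sketch below computes exact Shapley values by brute force for a toy scoring function. Everything in it is hypothetical: the scoring function, its coefficients, the patient and baseline values, and the baseline-replacement value function are assumptions for illustration. It is neither the authors' XGBoost model nor the tree-specific algorithm that SHAP libraries use in practice.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every feature coalition.

    Features inside a coalition take the instance's value; all others are
    held at the baseline. Cost grows as 2^n, so this only suits tiny models.
    """
    names = list(instance)
    n = len(names)

    def value(coalition: FrozenSet[str]) -> float:
        mixed = {
            name: instance[name] if name in coalition else baseline[name]
            for name in names
        }
        return predict(mixed)

    phi: Dict[str, float] = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk_score(x: Dict[str, float]) -> float:
    """Hypothetical raw risk score with one interaction term (not a probability)."""
    return (
        0.002 * x["apache"]
        + 0.03 * x["lactate"]
        + 0.0002 * x["ast"]
        + 0.0005 * x["apache"] * x["lactate"]
    )


# Made-up patient and reference values, for illustration only.
patient = {"apache": 90.0, "lactate": 4.0, "ast": 120.0}
reference = {"apache": 60.0, "lactate": 1.5, "ast": 40.0}

phi = shapley_values(toy_risk_score, patient, reference)
gap = toy_risk_score(patient) - toy_risk_score(reference)

for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.4f}")
print(f"sum of contributions: {sum(phi.values()):.4f} (prediction gap: {gap:.4f})")
```

Because AST enters the toy score additively, its contribution equals its coefficient times the change from baseline, while the interaction between the other two features is split evenly between them.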
