A machine learning model for predicting acute respiratory distress syndrome risk in patients with sepsis using circulating immune cell parameters: a retrospective study

BMC Infect Dis. 2025 Apr 21;25(1):568.

doi: 10.1186/s12879-025-10974-8.

Kaihuan Zhou^#¹, Lian Qin^#¹, Yin Chen¹, Hanming Gao¹, Yicong Ling¹, Qianqian Qin¹, Chenglin Mou¹, Tao Qin², Junyu Lu³

Affiliations

¹ Intensive Care Unit, The Second Affiliated Hospital of Guangxi Medical University, No 166 Daxuedong Road, Nanning, 530007, Guangxi, China.
² Intensive Care Unit, The Second Affiliated Hospital of Guangxi Medical University, No 166 Daxuedong Road, Nanning, 530007, Guangxi, China. qintao@gxmu.edu.cn.
³ Intensive Care Unit, The Second Affiliated Hospital of Guangxi Medical University, No 166 Daxuedong Road, Nanning, 530007, Guangxi, China. junyulu@gxmu.edu.cn.

^# Contributed equally.

PMID: 40259224
PMCID: PMC12013033
DOI: 10.1186/s12879-025-10974-8

Kaihuan Zhou et al. BMC Infect Dis. 2025.

Abstract

Background: Acute respiratory distress syndrome (ARDS) is a severe complication associated with a high mortality rate in patients with sepsis. Early identification of patients with sepsis at high risk of developing ARDS is crucial for timely intervention, optimization of treatment strategies, and improvement of clinical outcomes. However, traditional risk prediction methods are often insufficient. This study aimed to develop a machine learning (ML) model to predict the risk of ARDS in patients with sepsis using circulating immune cell parameters and other physiological data.

Methods: Clinical data from 10,559 patients with sepsis were obtained from the MIMIC-IV database. Principal component analysis (PCA) was used for dimensionality reduction. To comprehensively evaluate predictive capability, we used several ML algorithms to predict ARDS risk: decision trees, k-nearest neighbors (KNN), logistic regression, naive Bayes, random forests, neural networks, XGBoost, and support vector machines (SVM). Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score. Shapley additive explanations (SHAP) were used to interpret the contribution of individual features to model predictions.
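
The evaluation metrics named in the Methods can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the labels and scores are made-up toy values, and the function names `roc_auc` and `threshold_metrics` are hypothetical. AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half; the other four metrics come from the confusion matrix at an assumed threshold of 0.5.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives and negatives (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 at a fixed score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy labels (1 = ARDS, 0 = no ARDS) and toy predicted scores.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

print(roc_auc(toy_labels, toy_scores))            # 0.9375
print(threshold_metrics(toy_labels, toy_scores))  # every metric is 0.75
```

Under this definition, a model that assigns every patient the same score has an AUC of exactly 0.5, which is the chance-level value.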

Results: Among all models, XGBoost showed the best performance with an AUC of 0.764. Feature importance analysis revealed that mean arterial pressure, monocyte count, neutrophil count, pH, and platelet count were key predictors of ARDS risk in patients with sepsis. The SHAP analysis provided further information on how these features contributed to the model's predictions, aiding in interpretability and potential clinical applications.

Conclusion: The XGBoost model using circulating immune cell parameters accurately predicted the risk of ARDS in patients with sepsis. This model could be a useful tool for the early identification of high-risk patients and timely intervention; however, further validation and integration into clinical practice are required.

Keywords: ARDS; MIMIC-IV database; Machine learning; Prediction model; Sepsis.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Clinical trial number: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Flowchart of the patient selection process. *ARDS, acute respiratory distress syndrome; ICU, intensive care unit

**Fig. 2**
PCA biplot. The biplot shows the distribution of variables and individual observations along the first two principal components (PC1 and PC2). The red points represent individual data observations, and the blue arrows indicate the direction and magnitude of each variable’s contribution to the variance of the dataset. PC1 accounted for 8.3% of the total variance and PC2 for 6%. This visualization highlights how different variables influence the distribution of observations in reduced-dimensional space, facilitating the identification of patterns and relationships among variables. *PCA, principal component analysis
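
The percentages quoted for PC1 and PC2 are explained-variance ratios: each component's eigenvalue divided by the sum of all eigenvalues of the covariance matrix. A minimal sketch for the two-variable case, where the eigenvalues have a closed form, might look like this. The data are toy values rather than MIMIC-IV measurements, the helper name is hypothetical, and whether the study standardized variables before PCA is not stated in this abstract.

```python
import math
from typing import Sequence, Tuple


def explained_variance_ratio_2d(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Share of total variance carried by PC1 and PC2 for two variables."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    centre = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = centre + radius, centre - radius
    total = lam1 + lam2
    if total == 0:
        raise ValueError("both variables are constant")
    return lam1 / total, lam2 / total


# Perfectly correlated toy variables: PC1 carries essentially all the variance.
pc1, pc2 = explained_variance_ratio_2d([1, 2, 3, 4], [2, 4, 6, 8])
print(round(pc1, 6), round(abs(pc2), 6))

# Uncorrelated toy variables with equal variance: the variance splits evenly.
print(explained_variance_ratio_2d([1, -1, 1, -1], [1, 1, -1, -1]))  # (0.5, 0.5)
```

With many weakly correlated variables, no single component can carry much of the variance, which is one plausible reading of first-component shares as small as those in the caption.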

**Fig. 3**
Comparison of ROC curves for multiple models. ROC curves are used to compare the performance of various machine learning models in predicting ARDS in patients with sepsis. The AUC for each model is indicated in the legend. XGBoost (AUC = 0.764) showed the highest discriminative ability, closely followed by LR (AUC = 0.753). The decision-tree model performed the worst, with an AUC of 0.5, indicating no predictive power. *ROC, receiver operating characteristic; AUC, area under the curve; KNN, k-nearest neighbors; LR, logistic regression; SVM, support vector machine; XGBoost, extreme gradient boosting

**Fig. 4**
Calibration curve comparison for multiple models. Calibration curves were used to compare the performance of various machine learning models in predicting ARDS in patients with sepsis. The curves illustrate the proximity of the predicted probabilities to the actual outcomes. XGBoost, LR, and random forest show better calibration, as their curves are closer to the diagonal line, indicating more accurate probability estimates. In contrast, KNN and naive Bayes show larger deviations from the diagonal, reflecting poorer calibration performance. *KNN, k-nearest neighbors; LR, logistic regression; SVM, support vector machine; XGBoost, extreme gradient boosting
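
A calibration curve of the kind described in this caption is typically built by grouping predictions into probability bins and comparing the mean predicted probability in each bin with the observed event rate. The sketch below shows that idea under stated assumptions: equal-width bins, toy labels and probabilities, and a hypothetical function name. The binning scheme used in the study is not given in this abstract.

```python
from typing import List, Sequence, Tuple


def calibration_curve(
    y_true: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed rate, count) per non-empty bin.

    Assumes every probability lies in [0, 1]; bins have equal width 1 / n_bins.
    """
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))
    curve = []
    for members in bins:
        if not members:
            continue  # skip empty bins instead of plotting a point for them
        mean_predicted = sum(p for _, p in members) / len(members)
        observed_rate = sum(y for y, _ in members) / len(members)
        curve.append((mean_predicted, observed_rate, len(members)))
    return curve


# Toy outcomes (1 = ARDS) and toy predicted probabilities.
toy_labels = [0, 0, 1, 0, 1, 1]
toy_probs = [0.10, 0.15, 0.50, 0.55, 0.90, 0.95]

for mean_predicted, observed_rate, count in calibration_curve(toy_labels, toy_probs):
    print(f"predicted={mean_predicted:.3f} observed={observed_rate:.3f} n={count}")
```

Points where the predicted and observed values are close lie near the diagonal of the plot; large gaps correspond to the deviations the caption describes for KNN and naive Bayes.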

**Fig. 5**
Feature importance comparison between the random forest and XGBoost models. Figure 5a shows the top features of the random forest model, while Figure 5b shows the top features of the XGBoost model. Both models identify variables such as MAP, pH, PLT, and monocytes as significant predictors of ARDS in patients with sepsis. Although the models highlight similar key features, the ranking of feature importance differs. In the random forest model, feature importance is based on the Gini index, while in the XGBoost model, it is calculated using gain. In particular, the XGBoost model assigns greater importance to monocytes. *MAP, mean arterial pressure; pH, potential of hydrogen; PLT, platelet count; monocytes, a type of white blood cell; BUN, blood urea nitrogen; SBP, systolic blood pressure; MCHC, mean corpuscular hemoglobin concentration; SII, systemic immune-inflammation index; PTT, partial thromboplastin time; RR, respiratory rate; PLR, platelet-to-lymphocyte ratio; Hb, hemoglobin; RDW, red cell distribution width; Lac, lactate
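
The two importance measures named in this caption score a split differently, which is one reason the rankings can diverge. The sketch below computes both for a single binary split on toy labels. It assumes the standard definitions: Gini impurity decrease for the random forest, and the regularized second-order gain for XGBoost with logistic loss evaluated at an initial probability of 0.5. The function names and the parameter defaults (`lam`, `gamma`) are illustrative and are not taken from the study.

```python
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of binary labels."""
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def gini_decrease(left: Sequence[int], right: Sequence[int]) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    parent = list(left) + list(right)
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children


def xgb_split_gain(
    left: Sequence[int],
    right: Sequence[int],
    base_prob: float = 0.5,
    lam: float = 1.0,
    gamma: float = 0.0,
) -> float:
    """Gain = 0.5 * [GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam)] - gamma."""

    def grad_hess(labels: Sequence[int]) -> Tuple[float, float]:
        # Logistic loss: gradient is p - y, hessian is p * (1 - p).
        g = sum(base_prob - y for y in labels)
        h = len(labels) * base_prob * (1.0 - base_prob)
        return g, h

    def score(g: float, h: float) -> float:
        return g * g / (h + lam)

    g_left, h_left = grad_hess(left)
    g_right, h_right = grad_hess(right)
    parent_score = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent_score) - gamma


# Toy split: the left child is mostly positive, the right child mostly negative.
left_labels, right_labels = [1, 1, 1, 0], [0, 0, 0, 1]
print(gini_decrease(left_labels, right_labels))   # 0.125
print(xgb_split_gain(left_labels, right_labels))  # 0.5
```

A feature's importance is then an aggregate of such per-split values over all trees, so differences in the split criterion, regularization, and tree construction can all shift the ranking.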

**Fig. 6**
SHAP value impact on the XGBoost model output. The SHAP plot illustrates the impact of various features on the prediction of ARDS by the XGBoost model in patients with sepsis. The horizontal axis represents the SHAP values: the sign indicates the direction of each feature’s influence on the model output, and the absolute value indicates its strength. Features such as MAP, BUN, and pH had the strongest impact on the predictions. *SHAP, SHapley Additive exPlanations; MAP, mean arterial pressure; BUN, blood urea nitrogen; pH, potential of hydrogen; SBP, systolic blood pressure; PLT, platelet count; MCHC, mean corpuscular hemoglobin concentration; SII, systemic immune-inflammation index; PTT, partial thromboplastin time; RR, respiratory rate; PLR, platelet-to-lymphocyte ratio; Hb, hemoglobin; RDW, red cell distribution width; Lac, lactate
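
SHAP values are Shapley values: each feature's contribution is its marginal effect on the prediction, averaged over every order in which features could be switched from a baseline to the patient's actual values. The brute-force sketch below makes that definition concrete for a three-feature toy function. It is only illustrative: the model, feature names, and baseline are hypothetical, and practical SHAP analyses of tree models rely on much faster library algorithms than enumerating permutations.

```python
from itertools import permutations
from typing import Callable, Dict


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start with every feature at its baseline value
        previous_output = model(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its actual value
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(features: Dict[str, float]) -> float:
    """Hypothetical score with an interaction between a and b; c is unused."""
    return 2 * features["a"] + 3 * features["b"] + features["a"] * features["b"]


instance = {"a": 1.0, "b": 2.0, "c": 5.0}
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}

phi = shapley_values(toy_model, instance, baseline)
print(phi)  # {'a': 3.0, 'b': 7.0, 'c': 0.0}

# The contributions sum to the gap between the prediction and the baseline output.
print(sum(phi.values()), toy_model(instance) - toy_model(baseline))  # 10.0 10.0
```

The interaction term is split evenly between `a` and `b`, and the unused feature `c` receives zero, which matches the intuition behind reading a SHAP plot: sign gives direction, absolute value gives strength.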

**Fig. 7**
SHAP value distribution for each feature. This figure shows the SHAP value distributions for each feature in the XGBoost model, to elucidate the contributions of various clinical, laboratory, and immune cell parameters in predicting the risk of ARDS in patients with sepsis. Each subplot reflects the influence of a specific feature on the model’s output; a larger absolute SHAP value indicates a larger contribution of that feature to the model’s prediction. Immune cells, such as lymphocytes and monocytes, show markedly high SHAP values, highlighting their crucial role in the prediction of ARDS risk. Additionally, metabolic parameters (e.g., pH, bicarbonate, and chloride) and clinical features (e.g., MAP and BUN) showed a substantial influence. Overall, SHAP analysis provides valuable insights into the role of each feature within the XGBoost model and corroborates the critical contributions of immune cell parameters, metabolic markers, and clinical indicators in predicting ARDS risk.

Cited by

Predicting 30-day in-hospital mortality in ICU asthma patients: a retrospective machine learning study with external validation.
Ge Y, Wang G, Liu T, Ji W, Sun J, Zhang Y. BMC Pulm Med. 2025 Aug 12;25(1):387. doi: 10.1186/s12890-025-03881-w. PMID: 40797171.
