BMC Infect Dis. 2025 Apr 21;25(1):568.
doi: 10.1186/s12879-025-10974-8.

A machine learning model for predicting acute respiratory distress syndrome risk in patients with sepsis using circulating immune cell parameters: a retrospective study

Kaihuan Zhou et al. BMC Infect Dis. 2025.

Abstract

Background: Acute respiratory distress syndrome (ARDS) is a severe complication associated with a high mortality rate in patients with sepsis. Early identification of patients with sepsis at high risk of developing ARDS is crucial for timely intervention, optimization of treatment strategies, and improvement of clinical outcomes. However, traditional risk prediction methods are often insufficient. This study aimed to develop a machine learning (ML) model to predict the risk of ARDS in patients with sepsis using circulating immune cell parameters and other physiological data.

Methods: Clinical data from 10,559 patients with sepsis were obtained from the MIMIC-IV database. Principal component analysis (PCA) was used for dimensionality reduction. To comprehensively evaluate predictive capability, we trained several ML algorithms to predict ARDS risk: decision trees, k-nearest neighbors (KNN), logistic regression, naive Bayes, random forests, neural networks, XGBoost, and support vector machines (SVM). Model performance was assessed using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, and F1 score. Shapley additive explanations (SHAP) were used to interpret the contribution of individual features to model predictions.

Results: Among all models, XGBoost showed the best performance with an AUC of 0.764. Feature importance analysis revealed that mean arterial pressure, monocyte count, neutrophil count, pH, and platelet count were key predictors of ARDS risk in patients with sepsis. The SHAP analysis provided further information on how these features contributed to the model's predictions, aiding in interpretability and potential clinical applications.

Conclusion: The XGBoost model using circulating immune cell parameters accurately predicted the risk of ARDS in patients with sepsis. This model could be a useful tool for the early identification of high-risk patients and timely intervention; however, further validation and integration into clinical practice are required.

Keywords: ARDS; MIMIC-IV database; Machine learning; Prediction model; Sepsis.
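
The evaluation metrics named in the Methods (AUC, accuracy, sensitivity, specificity, and F1 score) can be illustrated with a minimal plain-Python sketch. This is not the authors' pipeline: the function names are hypothetical, the 0.5 decision threshold is an assumption, and the labels and scores are made-up toy values rather than MIMIC-IV data.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, and F1 at one decision threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case outscores a
    random negative case, counting ties as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.8]

metrics = threshold_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
# A model that gives every patient the same score cannot rank cases.
auc_constant = roc_auc(y_true, [0.5] * len(y_true))
print(metrics, auc, auc_constant)
```

The pairwise definition makes the threshold-free nature of AUC explicit: a model that assigns every patient the same score ties on every positive-negative pair and therefore scores 0.5, which is the "no predictive power" level.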

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Clinical trial number: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Flowchart of the patient selection process. *ARDS, acute respiratory distress syndrome; ICU, intensive care unit
Fig. 2
PCA biplot. The biplot shows the distribution of variables and individual observations along the first two principal components (PC1 and PC2). The red points represent individual data observations, and the blue arrows indicate the direction and magnitude of each variable’s contribution to the variance of the dataset. PC1 accounted for 8.3% of the total variance and PC2 accounted for 6%. This visualization highlights how different variables influence the distribution of observations in reduced-dimensional space, facilitating the identification of patterns and relationships among variables. *PCA, principal component analysis
Fig. 3
Comparison of the ROC curves for multiple models. ROC curves are used to compare the performance of various machine learning models in predicting ARDS in patients with sepsis. The AUC for each model is indicated in the legend. XGBoost (AUC = 0.764) showed the highest discriminative ability, closely followed by LR (AUC = 0.753). The decision-tree model performed the worst, with an AUC of 0.5, indicating no predictive power. *ROC, receiver operating characteristic; AUC, area under the curve; LR, logistic regression; KNN, k-nearest neighbors; SVM, support vector machine; XGBoost, extreme gradient boosting
Fig. 4
Calibration curve comparison for multiple models. Calibration curves were used to compare the performance of various machine learning models in predicting ARDS in patients with sepsis. The curves illustrate the proximity of the predicted probabilities to the actual outcomes. XGBoost, LR, and random forest show better calibration, as their curves are closer to the diagonal line, indicating more accurate probability estimates. In contrast, KNN and naive Bayes show larger deviations from the diagonal, reflecting poorer calibration performance. *LR, logistic regression; KNN, k-nearest neighbors; SVM, support vector machine; XGBoost, extreme gradient boosting
Fig. 5
Feature importance comparison between the random forest and XGBoost models. Figure 5a shows the top features of the random forest model, while Figure 5b shows the top features of the XGBoost model. Both models identify variables such as MAP, pH, PLT, and monocytes as significant predictors of ARDS in patients with sepsis. Although the models highlight similar key features, the ranking of feature importance differs. In the random forest model, feature importance is based on the Gini index, while in the XGBoost model, it is calculated using gain. In particular, the XGBoost model assigns greater importance to monocytes. *MAP, mean arterial pressure; pH, potential of hydrogen; PLT, platelet count; monocytes, a type of white blood cell; BUN, blood urea nitrogen; SBP, systolic blood pressure; MCHC, mean corpuscular hemoglobin concentration; SII, systemic immune-inflammation index; PTT, partial thromboplastin time; RR, respiratory rate; PLR, platelet-to-lymphocyte ratio; Hb, hemoglobin; RDW, red cell distribution width; Lac, lactate
Fig. 6
SHAP value impact on the XGBoost model output. The SHAP plot illustrates the impact of various features on the prediction of ARDS by the XGBoost model in patients with sepsis. The horizontal axis represents the SHAP values: the magnitude of a value indicates the strength of a feature's influence on the model output, and its sign indicates the direction. Features such as MAP, BUN, and pH had the strongest impact on the predictions. *SHAP, SHapley Additive exPlanations; MAP, mean arterial pressure; BUN, blood urea nitrogen; pH, potential of hydrogen; SBP, systolic blood pressure; PLT, platelet count; MCHC, mean corpuscular hemoglobin concentration; SII, systemic immune-inflammation index; PTT, partial thromboplastin time; RR, respiratory rate; PLR, platelet-to-lymphocyte ratio; Hb, hemoglobin; RDW, red cell distribution width; Lac, lactate
Fig. 7
SHAP value distribution for each feature. This figure shows the SHAP value distributions for each feature in the XGBoost model and aims to elucidate the contributions of various clinical, laboratory, and immune cell parameters in predicting the risk of ARDS in patients with sepsis. Each subplot reflects the influence of a specific feature on the model’s output, with the magnitude of the SHAP value positively correlated with the contribution of the feature to the model’s decision-making. Immune cells, such as lymphocytes and monocytes, show markedly high SHAP values, highlighting their crucial role in the prediction of ARDS risk. Additionally, metabolic parameters (e.g., pH, bicarbonate, and chloride) and clinical features (e.g., MAP and BUN) showed a substantial influence. Overall, SHAP analysis provides valuable insights into the role of each feature within the XGBoost model and corroborates the critical contributions of immune cell parameters, metabolic markers, and clinical indicators in predicting ARDS risk.
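
The SHAP values shown in Figs. 6 and 7 are Shapley values: each feature's average marginal contribution to one prediction, taken over all subsets of the other features. For a handful of features they can be computed exactly by enumeration, as in the sketch below. Everything here is an illustrative assumption: `toy_risk` is a hypothetical linear score with invented coefficients (not the paper's XGBoost model), the patient and baseline values are made up, and "absent" features are simply set to a baseline value, which is a simplification of what SHAP implementations do in practice.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(f):
    """Hypothetical linear risk score; coefficients are invented."""
    return (0.2
            - 0.01 * (f["MAP"] - 75.0)
            - 1.0 * (f["pH"] - 7.40)
            + 0.1 * f["monocytes"])


# Made-up patient and reference values, for illustration only.
patient = {"MAP": 60.0, "pH": 7.25, "monocytes": 1.2}
baseline = {"MAP": 75.0, "pH": 7.40, "monocytes": 0.5}

phi = shapley_values(toy_risk, patient, baseline)
print(phi)
```

Two properties are worth checking on any such output. The attributions sum to the difference between the patient's score and the baseline score, and for a purely linear model each attribution reduces to the coefficient times the feature's deviation from baseline. Enumeration costs grow exponentially with the number of features, so this approach is only practical for very small models.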
