Sci Rep. 2025 May 25;15(1):18268.
doi: 10.1038/s41598-025-02072-1.

A predictive model for hospital death in cancer patients with acute pulmonary embolism using XGBoost machine learning and SHAP interpretation

Zhen-Nan Yuan et al. Sci Rep. 2025.

Abstract

The prediction of in-hospital mortality in cancer patients with acute pulmonary embolism (APE) remains a significant clinical challenge. This study aimed to develop and validate a machine learning model using XGBoost to predict in-hospital mortality in this vulnerable population. A retrospective cohort study was conducted using the MIMIC-IV 2.2 database and external data from the intensive care unit of Cancer Hospital, Chinese Academy of Medical Sciences, collected between May 1, 2021, and April 30, 2023. A total of 448 cancer patients with APE were included from the MIMIC-IV 2.2 database and divided into a training set (70%, n = 314) and an internal validation set (30%, n = 134). The external validation cohort consisted of 56 patients. An XGBoost model was trained, and the SHAP (SHapley Additive Explanations) method was used to identify the top 10 predictors of in-hospital mortality: Glasgow Coma Scale (GCS) score, albumin, platelet count, age, serum creatinine, hemoglobin, presence of metastasis, lactate, creatine kinase (CK), and cancer type. The XGBoost model achieved an area under the ROC curve (AUC) of 0.806 (95% CI: 0.717-0.896) in the internal validation set and 0.724 (95% CI: 0.686-0.901) in the external validation set. Calibration curves indicated good model fit, and decision curve analysis (DCA) demonstrated a high clinical benefit in both the internal and external validation cohorts. The XGBoost model, with SHAP used for interpretation, effectively predicts in-hospital mortality in cancer patients with APE. This model provides valuable insights for clinical decision-making and has the potential to improve patient outcomes through early intervention and personalized treatment strategies. Further validation in diverse clinical settings is warranted to confirm its generalizability.
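As an illustration of two quantities reported in the abstract, the sketch below reproduces the 70/30 split sizes for 448 patients and computes an AUC from its rank-based definition (the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor). The function names, the random seed, and the four example scores are hypothetical; the abstract does not describe how the authors performed the split or computed the AUC, so this is only one plausible way to do it.

```python
import random


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle 0..n-1 and split into training and validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seed is an arbitrary illustrative choice
    n_train = round(n * train_fraction)
    return idx[:n_train], idx[n_train:]


def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# 448 patients split 70/30, matching the cohort sizes in the abstract.
train_idx, valid_idx = split_indices(448)
print(len(train_idx), len(valid_idx))  # 314 134

# Made-up labels (1 = died in hospital) and predicted risks.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the toy AUC example, three of the four death/survivor pairs are ranked correctly, which gives 0.75.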

Keywords: Acute pulmonary embolism; Cancer; In-hospital mortality; Machine learning.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethics approval and consent to participate: The data in this study were from two public de-identified databases. After completing the Collaborative Institutional Training Initiative (CITI) program, we obtained permission to access the database (Record ID: 36,067,767). Consent for publication: Not applicable.

Figures

Fig. 1
Flow diagram of patient selection in MIMIC-IV and the ICU of Cancer Hospital, Chinese Academy of Medical Sciences. (MIMIC-IV, Medical Information Mart for Intensive Care.)
Fig. 2
Performance of the prediction model in the internal validation set (A) and external validation set (B). Calibration curves of the prediction model for in-hospital mortality in the internal validation set (C) and external validation set (D). Decision curve analysis of the prediction model in the internal validation set (E) and external validation set (F).
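For readers unfamiliar with decision curve analysis, the sketch below computes the standard net benefit at a single threshold probability, net benefit = TP/n - (FP/n) × pt/(1 - pt), for a model and for the "treat all" reference strategy. A decision curve plots this quantity over a range of thresholds. The labels and predicted risks are made up for illustration and are not data from the study, and the function names are hypothetical.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are penalised by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes (1 = died in hospital) and predicted risks.
labels = [1, 0, 1, 0, 0]
probs = [0.9, 0.2, 0.6, 0.7, 0.1]

nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
print(round(nb_model, 3), round(nb_all, 3))
```

A model shows clinical benefit at a given threshold when its net benefit exceeds both the "treat all" value and zero (the "treat none" strategy).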
Fig. 3
(A) SHAP variable importance chart, with the included features sorted by mean absolute SHAP value from highest to lowest. (B, C) SHAP force plots for two cases. Color indicates the direction of each feature's contribution: purple indicates a negative effect on the prediction (arrow to the left, SHAP value decreases), and yellow indicates a positive effect on the prediction (arrow to the right, SHAP value increases). The length of the color bar indicates the strength of the contribution. E[f(x)] is the SHAP reference value, which is the mean model prediction; f(x) is the model output for the individual, that is, the reference value plus the sum of that individual's SHAP values.
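The additive relationship behind a force plot (reference value plus per-feature contributions equals the individual's model output) can be checked on a small example. The sketch below computes exact Shapley values by enumerating feature subsets for a hypothetical three-feature risk function. The function `toy_risk`, its coefficients, and the patient and baseline values are invented for illustration and are not the study's model. As a simplification, a single baseline patient stands in for the averaged background data that SHAP normally uses to define E[f(x)].

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; 'absent' features are set to their baseline value."""
    n = len(x)

    def value(present):
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_risk(z):
    """Hypothetical risk score with one GCS-lactate interaction term."""
    gcs, albumin, lactate = z
    return (0.30 - 0.02 * gcs - 0.03 * albumin + 0.04 * lactate
            + 0.01 * lactate * (15 - gcs))


baseline = [15.0, 3.5, 1.5]  # assumed reference patient
patient = [9.0, 2.5, 4.0]    # assumed individual case

phi = shapley_values(toy_risk, patient, baseline)
base_value = toy_risk(baseline)
reconstructed = base_value + sum(phi)

for name, v in zip(["GCS", "albumin", "lactate"], phi):
    print(f"{name}: {v:+.4f}")
print(abs(reconstructed - toy_risk(patient)) < 1e-9)  # True
```

Positive values push the prediction above the reference value and negative values push it below, which corresponds to the arrow directions described in the caption.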
Fig. 4
Interaction summary plot generated using SHAP values. This plot displays the top 10 most interacting features of the model. On the x-axis and y-axis, the features are listed according to their interaction importance, with the feature names ordered as follows: GCS, platelets, age, albumin, creatinine, lactate, hemoglobin, metastasis, CK, and type. Each point on the plot represents the SHAP interaction value for a specific feature interaction, highlighting how pairs of features together impact the model’s predictions. (GCS, Glasgow Coma Scale; CK, creatine kinase.)
