A predictive model for hospital death in cancer patients with acute pulmonary embolism using XGBoost machine learning and SHAP interpretation
- PMID: 40414906
- PMCID: PMC12104392
- DOI: 10.1038/s41598-025-02072-1
Abstract
Predicting in-hospital mortality in cancer patients with acute pulmonary embolism (APE) remains a significant clinical challenge. This study aimed to develop and validate an XGBoost machine learning model to predict in-hospital mortality in this vulnerable population. A retrospective cohort study was conducted using the MIMIC-IV 2.2 database and external data from the intensive care unit of the Cancer Hospital, Chinese Academy of Medical Sciences, collected between May 1, 2021, and April 30, 2023. A total of 448 cancer patients with APE were included from the MIMIC-IV 2.2 database and divided into a training set (70%, n = 314) and an internal validation set (30%, n = 134). The external validation cohort consisted of 56 patients. An XGBoost model was trained, and the SHAP (SHapley Additive exPlanations) method was used to identify the top 10 predictors of in-hospital mortality: Glasgow Coma Scale (GCS) score, albumin, platelet count, age, serum creatinine, hemoglobin, presence of metastasis, lactate, creatine kinase (CK), and cancer type. The XGBoost model achieved an area under the ROC curve (AUC) of 0.806 (95% CI: 0.717-0.896) in the internal validation set and 0.724 (95% CI: 0.686-0.901) in the external validation set. Calibration curves indicated good model fit, and decision curve analysis (DCA) demonstrated a high clinical benefit in both the internal and external validation cohorts. The XGBoost model, with SHAP used for interpretation, effectively predicts in-hospital mortality in cancer patients with APE. The model provides valuable insights for clinical decision-making and has the potential to improve patient outcomes through early intervention and personalized treatment strategies. Further validation in diverse clinical settings is warranted to confirm its generalizability.
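As an illustration of how an AUC with a 95% confidence interval, such as those reported above, can be obtained from predicted risks, the following is a minimal sketch in plain Python. It is not the authors' code: the abstract does not state how the intervals were computed, so the percentile bootstrap used here is an assumption, and the labels and scores are synthetic placeholders rather than study data. The function names `roc_auc` and `bootstrap_ci` are hypothetical.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method,
    not confirmed by the source)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class; AUC is undefined, so redraw
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in for a validation set of 134 patients.
# The event rate (about 25%) and score distributions are invented.
rng = random.Random(42)
labels = [1 if rng.random() < 0.25 else 0 for _ in range(134)]
scores = [rng.gauss(0.6 if y == 1 else 0.4, 0.15) for y in labels]

auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
print(f"AUC = {auc:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

In practice the scores would be the predicted mortality probabilities from the fitted model on the held-out patients. With a cohort as small as the external set (56 patients), a resampling-based interval of this kind is expected to be wide.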
Keywords: Acute pulmonary embolism; Cancer; In-hospital mortality; Machine learning.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Competing interests: The authors declare no competing interests. Ethics approval and consent to participate: The data in this study were from two public de-identified databases. After completing the Collaborative Institutional Training Initiative (CITI) program, we obtained permission to access the database (Record ID: 36,067,767). Consent for publication: Not applicable.
