A predictive model for hospital death in cancer patients with acute pulmonary embolism using XGBoost machine learning and SHAP interpretation
- PMID: 40414906
- PMCID: PMC12104392
- DOI: 10.1038/s41598-025-02072-1
Abstract
The prediction of in-hospital mortality in cancer patients with acute pulmonary embolism (APE) remains a significant clinical challenge. This study aimed to develop and validate a machine learning model using XGBoost to predict in-hospital mortality in this vulnerable population. A retrospective cohort study was conducted using the MIMIC-IV 2.2 database and external data from the intensive care unit of the Cancer Hospital, Chinese Academy of Medical Sciences, collected between May 1, 2021, and April 30, 2023. A total of 448 cancer patients with APE were included from the MIMIC-IV 2.2 database and divided into a training set (70%, n = 314) and an internal validation set (30%, n = 134). The external validation cohort consisted of 56 patients. An XGBoost model was trained, and the SHAP (SHapley Additive exPlanations) method was used to identify the top 10 predictors of in-hospital mortality: Glasgow Coma Scale (GCS) score, albumin, platelet count, age, serum creatinine, hemoglobin, presence of metastasis, lactate, creatine kinase (CK), and cancer type. The XGBoost model achieved an area under the ROC curve (AUC) of 0.806 (95% CI: 0.717-0.896) in the internal validation set and 0.724 (95% CI: 0.686-0.901) in the external validation set. Calibration curves indicated good model fit, and decision curve analysis (DCA) demonstrated a high clinical benefit in both the internal and external validation cohorts. The XGBoost model, with SHAP-based interpretation, effectively predicts in-hospital mortality in cancer patients with APE. This model provides valuable insights for clinical decision-making and may improve patient outcomes through early intervention and personalized treatment strategies. Further validation in diverse clinical settings is warranted to confirm its generalizability.
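The abstract reports that SHAP was used to identify the top 10 predictors. A common convention, assumed here rather than confirmed by the paper, is to rank features by the mean absolute SHAP value across patients and keep the top k. The minimal sketch below starts from a precomputed SHAP matrix (for a fitted XGBoost model such a matrix can be obtained with `shap.TreeExplainer(model).shap_values(X)`); the function name `rank_features_by_mean_abs_shap` and all numbers are hypothetical toy values, not study data.

```python
from typing import List, Sequence, Tuple


def rank_features_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP| across rows (one row per patient).

    Returns up to top_k (feature, mean_abs_shap) pairs, largest first.
    """
    n_rows = len(shap_values)
    if n_rows == 0:
        raise ValueError("shap_values must contain at least one row")
    n_features = len(feature_names)
    for row in shap_values:
        if len(row) != n_features:
            raise ValueError("each SHAP row must have one value per feature")
    # Global importance: average magnitude of each feature's contribution.
    importance = [
        sum(abs(row[j]) for row in shap_values) / n_rows
        for j in range(n_features)
    ]
    ranked = sorted(zip(feature_names, importance), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]


# Hypothetical SHAP values for three patients and four features (not study data).
toy_shap = [
    [0.40, -0.10, 0.05, -0.30],
    [-0.20, 0.15, 0.00, 0.10],
    [0.30, -0.05, -0.10, 0.20],
]
names = ["GCS", "albumin", "age", "lactate"]
print(rank_features_by_mean_abs_shap(toy_shap, names, top_k=2))
```

Taking the absolute value before averaging matters: a feature that pushes risk strongly up for some patients and strongly down for others would otherwise average out to look unimportant.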
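The AUC values above can be read as the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor. The abstract does not say how its 95% confidence intervals were computed; the sketch below assumes a percentile bootstrap (DeLong's method is another common choice). Function names and the toy labels and scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC via Mann-Whitney pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point AUC plus a percentile bootstrap CI (assumed method, not the paper's)."""
    point = auc(y_true, y_score)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(auc(yb, [y_score[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)
    return point, stats[lo_idx], stats[n_boot - 1 - lo_idx]


# Hypothetical outcomes (1 = in-hospital death) and predicted risks.
y = [0, 0, 1, 1, 0, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
print(auc(y, s))  # → 0.875
print(bootstrap_auc_ci(y, s, n_boot=500))
```

With small validation cohorts such as the 56-patient external set, bootstrap intervals can be wide and asymmetric around the point estimate.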
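A calibration curve compares, within groups of patients, the mean predicted risk with the observed mortality rate; a well-calibrated model lies close to the diagonal. The binning scheme used in the paper is not described in the abstract, so the sketch below assumes equal-width probability bins. The function name `calibration_table` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_table(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width bins on [0, 1].

    Returns (mean_predicted, observed_rate, count) for each non-empty bin.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    table = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            table.append((mean_pred, observed, len(members)))
    return table


# Hypothetical outcomes and predicted risks, split into two bins.
print(calibration_table([0, 0, 1, 0, 1, 1], [0.1, 0.15, 0.3, 0.35, 0.8, 0.95], n_bins=2))
```

Plotting `mean_predicted` against `observed_rate` for each bin gives the calibration curve; quantile (equal-count) bins are a common alternative when predicted risks cluster in a narrow range.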
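Decision curve analysis evaluates a model by its net benefit at each threshold probability pt, using the standard definition net benefit = TP/n − (FP/n) × pt / (1 − pt), and compares it with treating everyone or no one (whose net benefit is always 0). The sketch below assumes that a patient is flagged when predicted risk ≥ pt; the function names and toy data are hypothetical and do not reproduce the study's curves.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight of a false positive vs a true positive
    return tp / n - (fp / n) * odds


def treat_all_net_benefit(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag every patient (treat-none is always 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes and predicted risks; sweep a few thresholds.
y = [0, 0, 1, 0, 1, 1]
p = [0.1, 0.15, 0.3, 0.35, 0.8, 0.95]
for pt in (0.1, 0.25, 0.5):
    print(pt, net_benefit(y, p, pt), treat_all_net_benefit(y, pt))
```

A model shows clinical benefit at a given threshold when its net benefit exceeds both the treat-all curve and zero.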
Keywords: Acute pulmonary embolism; Cancer; In-hospital mortality; Machine learning.
© 2025. The Author(s).
Conflict of interest statement
Declarations. Competing interests: The authors declare no competing interests. Ethics approval and consent to participate: The data in this study were from two public de-identified databases. After completing the Collaborative Institutional Training Initiative (CITI Program) course, we obtained permission to access the database (Record ID: 36067767). Consent for publication: Not applicable.