Developing explainable machine learning models from biochemical and clinical data to predict all-cause and cause-specific mortality in CVD-cancer comorbidity: A longitudinal study based on NHANES
- PMID: 41386362
- DOI: 10.1016/j.ijcard.2025.134075
Abstract
Background: Cardiovascular disease (CVD) and cancer are leading causes of mortality and often coexist in aging populations. Patients with both conditions face synergistically increased risks, yet accurate and interpretable prediction tools remain limited. Conventional Cox proportional hazards (Cox PH) models cannot fully capture nonlinear effects and interactions among biochemical markers, which restricts their predictive utility.
Objective: To develop interpretable machine learning (ML) models that predict all-cause, CVD-specific, and cancer-specific mortality in U.S. adults with comorbid CVD and cancer using routine biochemical profiles.
Methods: We analyzed data from 10 National Health and Nutrition Examination Survey (NHANES) cycles (1999-2018; N = 1094). Twenty-one biochemical markers and clinical covariates were screened using random survival forests (RSF). Cox PH, a Cox model with elastic net regularization (Cox Net), gradient boosting, extreme survival trees (EST), and RSF were compared using time-dependent AUC, C-index, and Brier score, with 10-fold cross-validation and bootstrapping. SHapley Additive exPlanations (SHAP) were used to quantify feature contributions.
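The C-index named above scores how well a model ranks patients: among pairs whose order of death can be determined despite censoring, it is the fraction in which the patient who died first received the higher predicted risk. The abstract does not state which C-index variant was used; the sketch below assumes the common Harrell's form, assumes higher risk scores mean worse prognosis, and simply skips pairs with tied follow-up times. The function name `harrell_c_index` and the toy data are hypothetical illustrations, not the authors' code; scikit-survival's `concordance_index_censored` is a maintained implementation.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risks: Sequence[float]) -> float:
    """Simplified Harrell's C-index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up time
    had the event (death). It is concordant when that subject also has the
    higher predicted risk; tied risk scores count as half. Pairs with tied
    follow-up times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` is the subject with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            # The earlier subject was censored, so we cannot tell who died first.
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, death indicator, model risk score.
times = [2, 4, 6, 8]
events = [True, True, False, True]
risks = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(times, events, risks))  # 0.8
```

In this toy example five pairs are comparable and four are concordant; the single discordant pair is the patient who died at year 4 with a lower risk score than the patient censored at year 6. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking.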
Results: RSF consistently outperformed the other models. Test-set C-indices were 0.729 (95% CI: 0.716-0.741) for all-cause, 0.731 (0.704-0.753) for CVD, and 0.674 (0.557-0.684) for cancer mortality. RSF achieved the lowest Brier scores (all-cause: 0.175; CVD: 0.152; cancer: 0.237), indicating superior calibration. Pairwise testing showed that RSF significantly outperformed Cox PH and Cox Net for cancer mortality (P < 0.05). SHAP identified age, red cell distribution width, creatinine, and albumin as key predictors, reflecting pathways of inflammation, renal dysfunction, and metabolic dysregulation. RSF maintained moderate precision-recall performance for imbalanced outcomes.
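The Brier scores reported above are squared errors between a predicted survival probability and the observed status at a time horizon, so lower is better. With censored follow-up, a common approach weights each observed outcome by the inverse probability of remaining uncensored (IPCW), with that probability estimated by Kaplan-Meier on the censoring times. The abstract does not specify the horizon or whether the reported values are single-horizon or integrated scores, so the sketch below assumes a single horizon `t` and per-patient predicted survival probabilities (for example, from an RSF). The functions `censoring_survival` and `ipcw_brier_score` and the toy data are hypothetical illustrations, not the authors' implementation.

```python
from typing import Sequence


def censoring_survival(times: Sequence[float], events: Sequence[bool],
                       t: float, left_limit: bool = False) -> float:
    """Kaplan-Meier estimate G(t) = P(censoring time > t).

    Censored observations (event False) play the role of "events" here.
    With left_limit=True, only times strictly below t are used, giving G(t-).
    """
    g = 1.0
    for u in sorted(set(times)):
        if u > t or (left_limit and u == t):
            break
        at_risk = sum(1 for s in times if s >= u)
        censored = sum(1 for s, e in zip(times, events) if s == u and not e)
        g *= 1.0 - censored / at_risk
    return g


def ipcw_brier_score(times: Sequence[float], events: Sequence[bool],
                     surv_at_t: Sequence[float], t: float) -> float:
    """IPCW Brier score at horizon t.

    surv_at_t[i] is the predicted probability that subject i survives past t.
    """
    g_t = censoring_survival(times, events, t)
    total = 0.0
    for ti, ei, s in zip(times, events, surv_at_t):
        if ti <= t and ei:
            # Died by t: ideal prediction is 0, weighted by 1 / G(T_i-).
            total += s ** 2 / censoring_survival(times, events, ti, left_limit=True)
        elif ti > t:
            # Still under follow-up at t: ideal prediction is 1, weighted by 1 / G(t).
            total += (1.0 - s) ** 2 / g_t
        # Censored at or before t: contributes 0; the weights compensate.
    return total / len(times)


# Hypothetical toy data evaluated at a horizon of 2.5 years.
times = [1, 2, 3, 4]
events = [True, False, True, True]
surv_at_t = [0.2, 0.5, 0.6, 0.9]
print(round(ipcw_brier_score(times, events, surv_at_t, t=2.5), 5))  # 0.07375
```

Here the patient censored at year 2 contributes nothing directly; instead, the two patients still followed at 2.5 years are each up-weighted by 1/G(2.5) = 1.5. Averaging this score over several horizons gives an integrated Brier score; scikit-survival's `brier_score` and `integrated_brier_score` provide maintained versions.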
Conclusions: RSF outperformed conventional models by capturing nonlinear interactions while remaining interpretable through SHAP. This framework supports risk stratification in CVD-cancer comorbidity, highlighting the clinical value of explainable ML in precision medicine.
Keywords: Cancer; Cardiovascular disease; Machine learning; Random survival forest; SHAP analysis.
Copyright © 2025 The Authors. Published by Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of competing interest: None declared.
