Predicting Colorectal Cancer Survival Using Time-to-Event Machine Learning: Retrospective Cohort Study
- PMID: 37883174
- PMCID: PMC10636616
- DOI: 10.2196/44417
Abstract
Background: Machine learning (ML) methods have shown great potential in predicting colorectal cancer (CRC) survival. However, the ML models introduced thus far have mainly focused on binary outcomes and have not considered the time-to-event nature of this type of modeling.
Objective: This study aims to evaluate the performance of ML approaches for modeling time-to-event survival data and develop transparent models for predicting CRC-specific survival.
Methods: The data set used in this retrospective cohort study contains information on patients who were newly diagnosed with CRC between December 28, 2012, and December 27, 2019, at West China Hospital, Sichuan University. We assessed the performance of 6 representative ML models in predicting CRC-specific survival: random survival forest (RSF), gradient boosting machine (GBM), DeepSurv, DeepHit, neural net-extended time-dependent Cox (Cox-Time), and neural multitask logistic regression (N-MTLR). The multiple imputation by chained equations method was applied to handle missing values. Multivariable analysis and clinical experience were used to select significant features associated with CRC survival. Model performance was evaluated with stratified 5-fold cross-validation repeated 5 times, using the time-dependent concordance index, integrated Brier score, calibration curves, and decision curves. The SHapley Additive exPlanations method was applied to calculate feature importance.
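As an illustration of the discrimination metric named above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is a simplification, not the study's implementation: the paper reports a time-dependent concordance index, whereas this version uses a single fixed risk score per patient and skips pairs with tied follow-up times. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk
    Assumption: pairs with tied follow-up times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time is an observed event;
        # a patient censored first gives no information about who died earlier.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): 4 patients, the second one censored at month 10.
toy_times = [5, 10, 12, 20]
toy_events = [1, 0, 1, 1]
toy_risk = [0.9, 0.4, 0.2, 0.6]
print(concordance_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of comparable pairs.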
Results: A total of 2157 patients with CRC were included in this study. Among the 6 time-to-event ML models, the DeepHit model exhibited the best discriminative ability (time-dependent concordance index 0.789, 95% CI 0.779-0.799) and the RSF model produced the best-calibrated survival estimates (integrated Brier score 0.096, 95% CI 0.094-0.099), although these differences were not statistically significant. Additionally, the RSF, GBM, DeepSurv, Cox-Time, and N-MTLR models had predictive accuracy comparable to that of the Cox Proportional Hazards model in terms of discrimination and calibration. The calibration curves showed that all the ML models exhibited good 5-year survival calibration. The decision curves for CRC-specific survival at 5 years showed that all the ML models, especially RSF, had higher net benefits than the default strategies of treating all or no patients across a range of clinically reasonable risk thresholds. The SHapley Additive exPlanations method revealed that R0 resection, tumor-node-metastasis staging, and the number of positive lymph nodes were important factors for 5-year CRC-specific survival.
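The net benefit compared in a decision curve can be sketched with the standard formula NB = TP/n - (FP/n) * pt/(1 - pt), where pt is the risk threshold. The code below is a minimal sketch under a strong assumption: every patient has a known binary outcome at 5 years (no censoring), which a survival analysis such as this one would have to handle differently. The function names and toy data are hypothetical.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    y_true    -- 1 if the event occurred by the horizon, else 0 (no censoring assumed)
    risk      -- predicted event probability per patient
    threshold -- risk threshold pt, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default strategy that treats every patient."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 8 patients, 3 events.
toy_outcome = [1, 1, 0, 0, 1, 0, 0, 0]
toy_risk = [0.8, 0.6, 0.3, 0.1, 0.7, 0.55, 0.2, 0.05]
print(net_benefit(toy_outcome, toy_risk, 0.5))
print(net_benefit_treat_all(toy_outcome, 0.5))
```

The treat-none strategy has a net benefit of 0 at every threshold, so a model is useful at a given threshold only if its net benefit exceeds both 0 and the treat-all value.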
Conclusions: This study showed the potential of applying time-to-event ML predictive algorithms to help predict CRC-specific survival. The RSF, GBM, Cox-Time, and N-MTLR algorithms could provide nonparametric alternatives to the Cox Proportional Hazards model in estimating the survival probability of patients with CRC. Transparent time-to-event ML models may help clinicians predict survival for these patients more accurately and may improve patient outcomes by enabling personalized treatment plans informed by explainable ML models.
Keywords: SHAP; SHapley Additive exPlanations; colorectal cancer; machine learning; survival prediction; time-to-event.
©Xulin Yang, Hang Qiu, Liya Wang, Xiaodong Wang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 26.10.2023.
Conflict of interest statement
Conflicts of Interest: None declared.