OncoE25: an AI model for predicting postoperative prognosis in early-onset stage I-III colon and rectal cancer-a population-based study using SEER with dual-center cohort validation
- PMID: 40545536
- PMCID: PMC12183820
- DOI: 10.1186/s12967-025-06663-4
Abstract
Background: Although the overall incidence of colorectal cancer (CRC) is declining, the incidence of early-onset colorectal cancer is increasing. No prognostic models currently exist for predicting postoperative survival in stage I-III early-onset colon or rectal cancer. Such tools are urgently needed to enable individualized risk assessment.
Methods: We identified patients with early-onset (EO) and late-onset (LO) colon or rectal cancer from the SEER database and randomly split them into training and test cohorts (7:3). External cohorts of early-onset colon and rectal cancer were collected from two Chinese hospitals. After LASSO-Cox feature selection, six models (RSF, LASSO-Cox, S-SVM, XGBSE, GBSA, and DeepSurv) were developed to predict cancer-specific survival (CSS). Performance was assessed using the C-index, Brier score, time-dependent AUC, calibration curves, and decision curves. SHAP was used for model interpretation. A risk stratification system and an online calculator were constructed based on the best-performing model.
Results: A total of 3,997 EO colon cancer, 2,016 EO rectal cancer, 30,621 LO colon cancer, and 8,667 LO rectal cancer patients from SEER, along with 205 EO colon cancer and 153 EO rectal cancer patients from Chinese institutions, were included in the study. Based on comprehensive evaluation across multiple datasets and metrics, the RSF model demonstrated the best and most stable performance, outperforming not only the other machine learning models but also the traditional TNM staging system. In EO colon cancer, the RSF model achieved C-indices of 0.738 (test cohort) and 0.829 (external validation), mean AUCs of 0.765 and 0.889, and integrated Brier scores of 0.084 and 0.077, respectively. For EO rectal cancer, C-indices were 0.728 and 0.722, mean AUCs were 0.753 and 0.900, and integrated Brier scores were 0.106 and 0.095, respectively. The calibration and decision curves further confirmed the RSF model's good calibration and clinical net benefit. The RSF model also showed robust performance in the late-onset colorectal cancer (LOCRC) cohorts. SHAP analysis was used to quantify the marginal contribution of each predictor within each cancer subtype. Based on the RSF model, we developed a CSS-based risk stratification framework and deployed an online prediction tool.
Conclusions: We selected the RSF model for its outstanding predictive performance and named it OncoE25, to support personalized health management for patients with EO colon and rectal cancer.
Keywords: Artificial intelligence; Early-onset colon cancer; Early-onset rectal cancer; Machine learning; SEER; Systemic therapy.
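The abstract reports the C-index as a primary performance metric. As a rough illustration of what that number measures, the following is a minimal sketch of Harrell's concordance index in plain Python. The function name `harrell_c_index`, the toy data, and the simplified handling of tied follow-up times (such pairs are skipped) are assumptions made for illustration; this is not the authors' evaluation code.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time for each patient
    events:      1 if the event (e.g. cancer-specific death) was observed,
                 0 if the patient was censored
    risk_scores: model output, where a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): skip tied times. A pair is also not
        # comparable when the patient with the shorter follow-up is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event was assigned the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
print(round(c_index, 3))  # → 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-indices (for example 0.738 and 0.829 in EO colon cancer) should be read.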
© 2025. The Author(s).
Conflict of interest statement
Declarations.
Ethics approval and consent to participate: The study involving human participants was conducted in accordance with the principles outlined in the Declaration of Helsinki. Because the SEER database contains de-identified public health data with all personally identifiable elements removed, ethical review was not required for those data. The use of the original human data in this study was approved by the Ethics Committee of Putuo Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China on 18 October 2024 (Ethical Committee N° PTEC-A-2024–61 (S) 1). Written, signed consent was obtained from each patient or their guardian/next of kin where contact was possible. For patients who could not be reached, the ethics committee granted a waiver of informed consent. This written consent was approved by the ethics committee. Prior to analysis, patient information was de-identified.
Consent for publication: All authors confirm that the work described has not been published before and is not under consideration for publication elsewhere. All authors have seen this study and consented to its publication. The publication of this work has been approved by the responsible authorities at the institutions where the work was carried out.
Competing interests: The authors declare that they have no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.