Multicenter Study
J Transl Med. 2025 Jun 22;23(1):695.
doi: 10.1186/s12967-025-06663-4.

OncoE25: an AI model for predicting postoperative prognosis in early-onset stage I-III colon and rectal cancer - a population-based study using SEER with dual-center cohort validation

Luyun Yuan et al. J Transl Med.

Abstract

Background: Although the overall incidence of colorectal cancer (CRC) is declining, the incidence of early-onset colorectal cancer is increasing. No prognostic models currently exist for predicting postoperative survival in stage I-III early-onset colon or rectal cancer. Such tools are urgently needed to enable individualized risk assessment.

Methods: We identified patients with early-onset (EO) and late-onset (LO) colon or rectal cancer from the SEER database and randomly split them into training and test cohorts (7:3). External cohorts of EO colon and rectal cancer were collected from two Chinese hospitals. After LASSO-Cox feature selection, six models (RSF, LASSO-Cox, S-SVM, XGBSE, GBSA, and DeepSurv) were developed to predict cancer-specific survival (CSS). Performance was assessed using the C-index, Brier score, time-dependent AUC, calibration curves, and decision curves. SHAP was used for model interpretation. A risk stratification system and an online calculator were constructed based on the best-performing model.
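As an illustration of the C-index used for evaluation above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' code; the function name `harrell_c_index` and the toy inputs are hypothetical, and pairs with tied follow-up times are simply skipped, which a production implementation would handle more carefully.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if the event (e.g. cancer-specific death) was observed, 0 if censored
    risk_scores: model output, where a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # simplification: pairs with tied times are skipped
        if not events[a]:
            continue  # not comparable when the shorter follow-up is censored
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event received the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.4, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_scores))  # → 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-indices reported in the Results are read.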

Results: A total of 3,997 EO colon cancer, 2,016 EO rectal cancer, 30,621 LO colon cancer, and 8,667 LO rectal cancer patients from SEER, along with 205 EO colon cancer and 153 EO rectal cancer patients from the Chinese institutions, were included in the study. Based on a comprehensive evaluation across multiple datasets and metrics, the RSF model demonstrated the best and most stable performance, outperforming both the other machine learning models and the traditional TNM staging system. In EO colon cancer, the RSF model achieved C-indices of 0.738 (test cohort) and 0.829 (external validation), mean AUCs of 0.765 and 0.889, and integrated Brier scores of 0.084 and 0.077, respectively. For EO rectal cancer, C-indices were 0.728 and 0.722, mean AUCs were 0.753 and 0.900, and integrated Brier scores were 0.106 and 0.095, respectively. The calibration and decision curves further confirmed the RSF model's good calibration and clinical net benefit. The RSF model also showed robust performance in the LO colon and rectal cancer cohorts. SHAP analysis was used to quantify the marginal contribution of each predictor within each cancer subtype. Based on the RSF model, we developed a CSS-based risk stratification framework and deployed an online prediction tool.
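The abstract does not state how the cut-points of the risk stratification framework were chosen. Purely as an illustration, the sketch below assumes tertile cut-points estimated on training-set risk scores and then applied unchanged to new patients; the function names `fit_cutoffs` and `stratify`, the tertile rule, and the toy scores are all hypothetical and are not taken from the study.

```python
from statistics import quantiles


def fit_cutoffs(train_scores):
    """Return (low_cut, high_cut) as tertile boundaries of the training risk scores.

    Assumption: tertiles are used only for illustration; the study's actual
    cut-point method is not described in the abstract.
    """
    low_cut, high_cut = quantiles(train_scores, n=3)
    return low_cut, high_cut


def stratify(score, cutoffs):
    """Map a single predicted risk score to a low-, medium-, or high-risk group."""
    low_cut, high_cut = cutoffs
    if score <= low_cut:
        return "low"
    if score <= high_cut:
        return "medium"
    return "high"


# Toy training scores (hypothetical); the cut-points are fitted once on the
# training cohort and reused for test and external cohorts.
cutoffs = fit_cutoffs([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
print([stratify(s, cutoffs) for s in (2.0, 5.0, 8.0)])
```

Fitting the cut-points on the training cohort only, and reusing them elsewhere, keeps the test and external validation cohorts from influencing the grouping they are later evaluated on.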

Conclusions: We selected the RSF model for its predictive performance and named it OncoE25; it is intended to support personalized health management for patients with EO colon and rectal cancer.

Keywords: Artificial intelligence; Early-onset colon cancer; Early-onset rectal cancer; Machine learning; SEER; Systemic therapy.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: The study involving human participants was conducted in accordance with the principles outlined in the Declaration of Helsinki. Because the SEER database contains de-identified public health data with all personally identifiable elements removed, ethical review was not required for these data. The use of the original human data in this study was approved by the Ethics Committee of Putuo Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China on 18 October 2024 (Ethical Committee N° PTEC-A-2024–61 (S) 1). Written and signed consent was obtained from the patient or their guardian/next of kin, if contact was possible. For patients who could not be reached, the ethics committee granted a waiver of informed consent. This written consent was approved by the ethics committee. Prior to analysis, patient information was de-identified. Consent for publication: All authors confirm that the work described has not been published before and is not under consideration for publication elsewhere. All authors have seen and consented to the publication of this study. The publication of this work has been approved by the responsible authorities at the institutions where the work was carried out. Competing interests: The authors declare that they have no competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1
The patient screening process of the study
Fig. 2
The workflow of the main steps of the study
Fig. 3
Kaplan–Meier curves for CSS in early- vs. late-onset colon cancer (A) and rectal cancer (B), analyzed using the log-rank test. P < 0.05 indicates statistical significance; P < 0.001 indicates strong significance
Fig. 4
Kaplan–Meier curves for CSS in early-onset colon cancer (A) and early-onset rectal cancer (B) across the training set, test set, and external validation set, respectively. Log-rank test was used for comparison; P < 0.05 indicates statistical significance, and P < 0.001 indicates strong significance
Fig. 5
Time-dependent AUC curves for early-onset colon cancer in the training (A), test (B), and external validation (C) cohorts, and for early-onset rectal cancer in the training (D), test (E), and external validation (F) cohorts
Fig. 6
Calibration curves at 12, 36, and 60 months for early-onset colon cancer in the training (A–C), test (D–F), and external validation (G–I) cohorts, and for early-onset rectal cancer in the training (J–L), test (M–O), and external validation (P–R) cohorts
Fig. 7
DCA at 12, 36, and 60 months for early-onset colon cancer in the training (A–C), test (D–F), and external validation (G–I) cohorts, and for early-onset rectal cancer in the training (J–L), test (M–O), and external validation (P–R) cohorts
Fig. 8
SHAP summary plots displaying the distribution of Shapley values for variables ranked by their mean absolute SHAP values in the OncoE25 model: (A) early-onset rectal cancer, (B) early-onset colon cancer, (C) late-onset colon cancer, and (D) late-onset rectal cancer
Fig. 9
Kaplan–Meier survival curves for CSS in early-onset colon cancer (training (A), test (B), and external validation (C)) and early-onset rectal cancer (training (D), test (E), and external validation (F)), stratified into low-, medium-, and high-risk groups based on the OncoE25 model. Log-rank test was used for comparison; P < 0.05 indicates statistical significance, and P < 0.001 indicates strong significance
Fig. 10
Cumulative cancer-specific mortality risk curves for four representative patients as predicted by the OncoE25 model. (A) Patient A; (B) Patient B; (C) Patient C; (D) Patient D
