Sci Rep. 2025 Apr 15;15(1):12864.
doi: 10.1038/s41598-025-95385-0.

Development and validation of survival prediction tools in early and late onset colorectal cancer patients

Wanling Li et al. Sci Rep. 2025.

Abstract

This study aims to develop online calculators, built on machine learning models, that predict 1- to 8-year survival probabilities for patients with early-onset and late-onset colorectal cancer (EOCRC and LOCRC). We extracted data on 117,965 CRC patients from the Surveillance, Epidemiology, and End Results (SEER) database, covering 2010 to 2021, and divided them into training and internal testing datasets. Data from 200 CRC patients at Chongqing Hospital of Jiangsu Province Hospital served as the external testing dataset. We conducted univariate and multivariate Cox regression analyses on the training dataset to identify key survival factors and then developed predictive machine learning models. The models were evaluated on the internal and external testing datasets using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1 score. Web-based calculators were then developed to predict survival curves for EOCRC and LOCRC patients under different treatment strategies. In the multivariate Cox regression analysis, 16 and 18 variables were independent significant survival factors for EOCRC and LOCRC, respectively. In the EOCRC group, the machine learning models achieved AUC values of 0.880 and 0.804 in the internal and external testing cohorts, respectively. In the LOCRC group, the corresponding AUC values were 0.857 and 0.823. The online calculators, powered by the trained machine learning models, are available at https://eocrc-surv.streamlit.app/ and https://locrc-surv.streamlit.app/ . These tools estimate survival probabilities for EOCRC and LOCRC patients under various treatment strategies and display the corresponding post-treatment survival curves over the 1- to 8-year period. In summary, this study developed online calculators that use machine learning algorithms to predict 1- to 8-year survival probabilities for EOCRC and LOCRC patients under various treatment strategies.
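The calculators report a survival probability at each horizon from 1 to 8 years for every treatment option. One way such output could be assembled is sketched below, assuming (the source does not say so) one trained classifier per horizon that returns the probability of being alive at that year. The names `survival_curve`, `curves_by_treatment`, `toy_model`, and the `surgery` field are hypothetical, and the toy model is a stand-in, not the published model.

```python
from itertools import accumulate
from typing import Callable, Dict, List

# Hypothetical interface: a model maps a patient's features to P(alive at its horizon).
Model = Callable[[Dict[str, str]], float]


def survival_curve(models: Dict[int, Model], patient: Dict[str, str]) -> List[float]:
    """Query each horizon-specific model (keyed by year) in order of year."""
    raw = [models[year](patient) for year in sorted(models)]
    # Separately trained horizon models can disagree slightly; a running
    # minimum keeps the curve non-increasing, as a survival curve must be.
    return list(accumulate(raw, min))


def curves_by_treatment(models: Dict[int, Model], patient: Dict[str, str],
                        options: Dict[str, Dict[str, str]]) -> Dict[str, List[float]]:
    """Recompute the curve with each treatment strategy's settings substituted in."""
    return {name: survival_curve(models, {**patient, **settings})
            for name, settings in options.items()}


def toy_model(year: int) -> Model:
    """Illustrative stand-in for a trained horizon-specific model (NOT the published one)."""
    def predict(patient: Dict[str, str]) -> float:
        yearly = 0.95 if patient.get("surgery") == "Yes" else 0.85
        return yearly ** year
    return predict


models = {year: toy_model(year) for year in range(1, 9)}
options = {"Surgery": {"surgery": "Yes"}, "No surgery": {"surgery": "No"}}
curves = curves_by_treatment(models, {"age_group": "40-49"}, options)
for name, curve in curves.items():
    print(name, [round(p, 3) for p in curve])
```

The running minimum is one simple design choice for keeping independently predicted horizons consistent; whether the published calculators apply any such adjustment is not stated.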

Keywords: Colorectal cancer; Machine learning; Online calculators; Survival.

PubMed Disclaimer

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethics statement: For data from the SEER database, ethical review and approval were not required because the SEER database is publicly available and de-identified. For data from Chongqing Hospital of Jiangsu Province Hospital (The People’s Hospital of Qijiang District), ethical approval (approval number 20240005) was obtained from the Ethical Review Committee of Chongqing Hospital of Jiangsu Province Hospital (The People’s Hospital of Qijiang District) before this study began. The committee waived the requirement for informed consent because of the retrospective, observational design and the anonymity of patients’ identities.

Figures

Fig. 1
The workflow and sample selection of this study.
Fig. 2
Cox analyses of overall survival in the EOCRC training set, shown as forest plots. (A) Univariate Cox analyses of EOCRC; (B) multivariate Cox analyses of EOCRC. Each forest plot shows the hazard ratio and its confidence interval; a variable's effect is considered significant if its confidence interval does not cross 1. CMS combined metastasis status, EOCRC early-onset colorectal cancer.
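In a Cox model, each coefficient β corresponds to a hazard ratio HR = exp(β), and a Wald 95% confidence interval is exp(β ± 1.96·SE). The sketch below computes one forest-plot row and applies the caption's "does not cross 1" rule; the coefficients are invented for illustration, and the source does not say which software or interval method the authors used.

```python
import math
from typing import Tuple


def hazard_ratio_ci(beta: float, se: float, z: float = 1.959964) -> Tuple[float, float, float]:
    """Hazard ratio with a Wald confidence interval from a Cox coefficient.

    z = 1.959964 gives a two-sided 95% interval.
    """
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return hr, lower, upper


def is_significant(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or below 1 (does not cross 1)."""
    return upper < 1.0 or lower > 1.0


# Illustrative coefficients only; these are not values from the study.
for label, beta, se in [("variable A", 0.50, 0.10), ("variable B", 0.10, 0.20)]:
    hr, low, high = hazard_ratio_ci(beta, se)
    print(f"{label}: HR {hr:.2f} (95% CI {low:.2f}-{high:.2f}), "
          f"significant={is_significant(low, high)}")
```

Because the interval is built on the log scale, "does not cross 1" for the HR is the same as "does not cross 0" for β, which is why the rule matches a two-sided Wald test at the 5% level.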
Fig. 3
Cox analyses of overall survival in the LOCRC training set, shown as forest plots. (A) Univariate Cox analyses of LOCRC; (B) multivariate Cox analyses of LOCRC. CMS combined metastasis status, LOCRC late-onset colorectal cancer.
Fig. 4
Predicting the prognosis of EOCRC patients at various time points (1 to 8 years) in the internal testing cohort. (A) The number of alive/dead patients at different time points for EOCRC. Performance metrics of machine learning models evaluated in the internal testing cohort, including (B) AUC, (C) Accuracy, (D) F1 Score, (E) Precision, and (F) Recall. RF random forest, XGB extreme gradient boosting, GB gradient boosting.
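Panel A counts alive and dead patients at each horizon. The source does not describe how censored follow-up was handled; one common convention, assumed here, scores a patient as dead at year t if death occurred by t, as alive if follow-up reached t, and excludes the patient at t otherwise. Follow-up is assumed to be recorded in months, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def label_at_horizon(time_months: float, died: bool, years: int) -> Optional[int]:
    """1 = dead by the horizon, 0 = known alive at the horizon, None = unknown."""
    horizon = 12 * years
    if died and time_months <= horizon:
        return 1
    if time_months >= horizon:
        return 0
    return None  # censored before the horizon: status unknown, so excluded


def counts_by_horizon(records: Iterable[Tuple[float, bool]],
                      years: Iterable[int] = range(1, 9)) -> Dict[int, Tuple[int, int]]:
    """(alive, dead) counts of patients with a known status at each horizon."""
    records = list(records)
    counts = {}
    for y in years:
        labels = [label_at_horizon(t, d, y) for t, d in records]
        counts[y] = (labels.count(0), labels.count(1))
    return counts


# Toy records: (follow-up in months, died during follow-up)
toy = [(10, True), (30, False), (100, False), (50, True)]
print(counts_by_horizon(toy))
```

Under this convention the number of usable patients shrinks at later horizons as more follow-up is censored, which is one reason alive/dead counts differ from year to year.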
Fig. 5
Predicting the prognosis of LOCRC patients at various time points (1 to 8 years) in the internal testing cohort. (A) The number of alive/dead patients at different time points for LOCRC. Performance metrics of machine learning models evaluated in the internal testing cohort, including (B) AUC, (C) Accuracy, (D) F1 Score, (E) Precision, and (F) Recall.
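The five metrics in panels B to F can be computed from one horizon's binary labels and predicted probabilities. A minimal pure-Python sketch follows, assuming death by the horizon is the positive class and a 0.5 decision threshold; neither choice is stated in the source. AUC is computed as the probability that a randomly chosen positive scores above a randomly chosen negative (ties count one half), which equals the area under the ROC curve.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """AUC, accuracy, precision, recall, and F1 for one prediction horizon."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Pairwise comparison: 1 if the positive outranks the negative, 0.5 on ties.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"AUC": auc, "Accuracy": (tp + tn) / len(y_true),
            "Precision": precision, "Recall": recall, "F1": f1}


print(binary_metrics([0, 0, 1, 1], [0.1, 0.6, 0.4, 0.8]))
```

The pairwise AUC loop is quadratic and only suits small examples; for cohorts of SEER size a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.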
Fig. 6
Validation of the EOCRC models in the external testing cohort. (A) Distribution of the alive/dead patient ratio at different time points. Performance metrics of the machine learning models, including (B) AUC, (C) Accuracy, (D) F1 Score, (E) Precision, and (F) Recall.
Fig. 7
Validation of the LOCRC models in the external testing cohort. Performance metrics of the machine learning models, including (A) AUC, (B) Accuracy, (C) F1 Score, (D) Precision, and (E) Recall.
Fig. 8
Interactive interface for predicting survival probabilities in EOCRC and LOCRC. (A) Kaplan–Meier curves of survival outcomes for high- and low-risk groups in the EOCRC internal testing cohort. (B) Kaplan–Meier curves for high- and low-risk groups in the LOCRC internal testing cohort. (C) Demonstration of the online survival probability calculator for EOCRC. (D) Demonstration of the online survival probability calculator for LOCRC. Users enter the relevant clinical variables and click the “Predict” button; the calculator then generates survival probabilities for each treatment option from 1 to 8 years, shown as a survival curve plot and a data table of the predicted probabilities. If the app is inactive, users may need to click the “Wake Up” button before use.
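Panels A and B show Kaplan–Meier curves. The product-limit estimator behind such curves multiplies, at each distinct event time t_i, the fraction of at-risk patients who survive that time: S(t) = Π (1 − d_i / n_i), where d_i is the number of deaths and n_i the number still at risk. A short pure-Python sketch is below; how patients were assigned to the high- and low-risk groups is not stated in the caption, so the sketch takes one group's times and event flags as given.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate as (event time, S(t)) steps; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all records sharing this time; patients censored at t still
        # count as at risk for deaths at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Toy group: follow-up times (months) and event flags, not study data.
print(kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]))
```

Plotting one such curve per risk group, computed separately, reproduces the kind of comparison shown in panels A and B.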
