Front Oncol. 2025 Jul 14;15:1604386.
doi: 10.3389/fonc.2025.1604386. eCollection 2025.

Prediction of 5-year postoperative survival and analysis of key prognostic factors in stage III colorectal cancer patients using novel machine learning algorithms

Wei Zhang et al. Front Oncol. 2025.

Abstract

Objective: This study explores the predictive value of clinical and socio-demographic characteristics for postoperative survival in stage III colorectal cancer (CRC) patients and develops a 5-year postoperative survival prediction model using machine learning algorithms.

Methods: Data from 13,855 stage III CRC patients who underwent surgery were extracted from the SEER database. Collected variables included marital status, gender, tumor location, histological type, T stage, chemotherapy status, age, tumor size, and lymph node ratio, among others. The data were split into training and validation sets at a 7:3 ratio. Optimal cutoff points for age, tumor diameter, and lymph node ratio were determined using X-tile software. Independent prognostic factors for postoperative survival were identified through univariate and multivariate logistic regression and Lasso regression analyses. These factors were incorporated into machine learning models, including logistic regression, decision tree, and LightGBM, among others. ROC curves, calibration curves, and decision curve analysis were used to assess model performance. External validation was performed using data from Shanxi Bethune Hospital.
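To make the evaluation step concrete, the following is a minimal sketch of two of the metrics named above: the area under the ROC curve and the net benefit that decision curve analysis plots at each threshold probability. It is an illustration only, not the authors' pipeline. The function names, the toy labels and scores, and the coding of the outcome (1 = event, 0 = no event) are all assumptions made for the example.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the share of (event, non-event) pairs ranked correctly.

    Ties in score count as half a correct pair. This pairwise form is
    O(n^2) and is meant for illustration, not for large cohorts.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_score: Sequence[float],
                threshold: float) -> float:
    """Net benefit at one threshold probability, as used in decision curves.

    NB = TP/n - (FP/n) * pt / (1 - pt), where patients with a predicted
    risk of at least pt are treated as positive.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = event, 0 = no event (not from the study).
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_nb = net_benefit(toy_labels, toy_scores, threshold=0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the "treat all" and "treat none" strategies.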

Results: Optimal cutoff points were identified for age (65 and 80 years), tumor size (29 mm and 74 mm), and lymph node ratio (0.11 and 0.49). Multivariate logistic regression and Lasso regression both identified marital status, tumor location, histological type, T stage, chemotherapy, radiotherapy, age, maximum tumor diameter, lymph node ratio, serum carcinoembryonic antigen (CEA) level, perineural invasion, and tumor differentiation as independent prognostic factors for 5-year postoperative survival in stage III CRC patients (P < 0.05). In the validation cohort, the models achieved AUC values ranging from 0.766 to 0.791. Age, lymph node ratio, chemotherapy, and T stage were the key prognostic factors. External validation confirmed model accuracy and clinical applicability.
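As an illustration of how the reported cutoff points could turn continuous variables into three-level groups, here is a small sketch. The cutoff values are those stated above; everything else is an assumption for the example: the dictionary keys, the function names, the definition of lymph node ratio as positive nodes divided by examined nodes, and the choice to place a value equal to a cutoff in the lower band (the abstract does not say which side the boundary belongs to).

```python
from bisect import bisect_left

# Cutoff pairs reported in the abstract; the key names are hypothetical.
CUTOFFS = {
    "age_years": (65, 80),
    "tumor_size_mm": (29, 74),
    "lymph_node_ratio": (0.11, 0.49),
}


def lymph_node_ratio(positive_nodes: int, examined_nodes: int) -> float:
    """Assumed definition: positive lymph nodes / examined lymph nodes."""
    if examined_nodes <= 0:
        raise ValueError("examined_nodes must be positive")
    if not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("positive_nodes must lie in [0, examined_nodes]")
    return positive_nodes / examined_nodes


def band(variable: str, value: float) -> int:
    """Return 0, 1, or 2 for the low, middle, or high band.

    Assumption: a value exactly equal to a cutoff falls in the lower band,
    which is what bisect_left gives on the sorted cutoff pair.
    """
    return bisect_left(CUTOFFS[variable], value)


# Hypothetical patient: 72 years old, 30 mm tumor, 3 of 20 nodes positive.
lnr = lymph_node_ratio(3, 20)
groups = (
    band("age_years", 72),
    band("tumor_size_mm", 30),
    band("lymph_node_ratio", lnr),
)
```

Here the hypothetical patient lands in the middle band for all three variables. Switching to `bisect_right` would move boundary values into the upper band instead.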

Conclusion: This study developed and validated an interpretable machine learning model that predicts the 5-year postoperative survival of stage III CRC patients, offering potential for personalized treatment plans.

Keywords: SEER database; colorectal cancer; machine learning; prognostic model; survival prognosis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Patient selection flowchart from the SEER database.
Figure 2. Optimal cutoff values for age (A), maximum tumor diameter (B), and lymph node ratio (C) determined by X-tile analysis.
Figure 3. Lasso regression path plot.
Figure 4. Nomogram.
Figure 5. Performance evaluation of different machine learning models. (A) ROC curves of each model in the training set; (B) ROC curves of each model in the validation set; (C) Calibration curves of each model in the training set; (D) Calibration curves of each model in the validation set; (E) DCA curves of each model in the training set; (F) DCA curves of each model in the validation set.
Figure 6. Feature importance ranking based on the LightGBM model.
Figure 7. SHAP summary plot for the LightGBM model.
Figure 8. Interactive SHAP plot for surviving patients.
Figure 9. Interactive SHAP plot for deceased patients.
Figure 10. Kaplan-Meier survival curves for different clinical subgroups. (A) Survival curves by age group; (B) Survival curves by LNR levels; (C) Survival curves by chemotherapy status; (D) Survival curves by T stage.
Figure 11. (A) ROC curve of Logistic model in external validation cohort. (B) ROC curve of LightGBM model in external validation cohort. (C) Calibration curves of the Logistic models in the external validation cohort. (D) Calibration curves of the LightGBM models in the external validation cohort. (E) DCA of the Logistic models in the external validation cohort. (F) DCA of the LightGBM models in the external validation cohort.
