BMC Gastroenterol. 2025 Apr 18;25(1):272.
doi: 10.1186/s12876-025-03841-y.

Optimizing prediction of metastasis among colorectal cancer patients using machine learning technology

Raoof Nopour. BMC Gastroenterol. 2025.

Abstract

Background and aim: Colorectal cancer (CRC) is among the most prevalent and deadliest cancers. Predicting metastasis early in patients with CRC is crucial for preventing progression to advanced stages and for improving prognosis. Previous studies have predicted metastasis in CRC patients using clinical data. The current research combines demographic, lifestyle, nutritional, and clinical factors, including diagnostic and therapeutic factors, to construct a machine learning (ML) model with greater predictive insight and generalizability than previous ones.

Materials and methods: In this retrospective study, we used data from 1156 CRC patients referred to the Masoud internal clinic in Tehran City from January 2017 to December 2023. Eight machine learning algorithms (LightGBM, XGBoost, random forest, artificial neural network, support vector machine, decision tree, K-nearest neighbor, and logistic regression) were used to build models for predicting metastasis among CRC patients. We also assessed feature importance based on the best-performing model to improve clinical usability. To assess the generalizability of the prediction model, we used data from 115 CRC patients from Imam Khomeini Hospital in Sari City as an external cohort and evaluated the predictive ability of LightGBM, the best-performing model, on these external data.

Results: The LightGBM model achieved a PPV of 97.32%, NPV of 84.67%, sensitivity of 83.14%, specificity of 93.14%, accuracy of 88.14%, F1-score of 87.51%, and an AU-ROC of 0.9 ± 0.01, a satisfactory performance for this prediction task. History of IBD, family history of CRC, number of lymph nodes involved, fruit intake, and tumor size were identified as stronger predictors of metastasis in colorectal cancer and as relevant to clinical usability. The external validation cohort showed a PPV of 0.8, NPV of 0.85, sensitivity of 0.78, specificity of 0.86, accuracy of 0.834, F1-score of 0.795, and AU-ROC of 0.77 ± 0.03, demonstrating satisfactory generalizability to external data from other clinical settings.
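
The metrics reported above (PPV, NPV, sensitivity, specificity, accuracy, F1-score) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the author's code. The function name `confusion_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: metastatic cases predicted as metastatic / non-metastatic.
    tn/fp: non-metastatic cases predicted as non-metastatic / metastatic.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # recall: share of true positives found
    specificity = tn / (tn + fp)   # share of true negatives found
    ppv = tp / (tp + fp)           # precision: share of positive calls that are right
    npv = tn / (tn + fn)           # share of negative calls that are right
    accuracy = (tp + tn) / total
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the paper's data).
example = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on the proportion of metastatic cases in the cohort, they can shift between the development and external cohorts even when sensitivity and specificity stay similar.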

Conclusion: The current empirical results indicate that LightGBM has predictive capability that physicians can use in clinical environments for early prediction of metastasis and improved prognosis in patients with colorectal cancer.

Clinical trial number: Not applicable.

Keywords: Colorectal cancer; Lifestyle factor; Machine learning; Metastasis; Prediction model; Prognosis.

Conflict of interest statement

Declarations. Ethics approval and consent to participate: This study was approved by the ethics committee of Tehran University of Medical Sciences (Reg No: 1398-F-280-3/98-10-03). All methods were carried out in accordance with relevant guidelines and regulations. Informed consent was obtained from all subjects and/or their legal guardian(s). Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. The flowchart of the model development steps
Fig. 2. The preprocessing flowchart of the CRC cases
Fig. 3. The ROC of ML models for predicting CRC metastasis: the diagonal line across the x and y axes indicates random classification
Fig. 4. The calibration curve of ML models for prediction purposes
Fig. 5. The DCA of all ML models for prediction purposes
Fig. 6. The PFI of predictors regarding CRC metastasis
Fig. 7. The SHAP values of the ten top-ranking factors for predicting CRC metastasis. The vertical and horizontal axes show the top-ranking features and their associated degree of importance, respectively. Each row shows one feature, and each dot indicates one case. Hot (red) and cold (blue) colors indicate high and low feature values, respectively
Fig. 8. The confusion matrix of LightGBM for external cases: positive and negative indicate metastatic and non-metastatic cases, respectively
Fig. 9. The ROC for external validation assessment: the diagonal line across the x and y axes indicates random classification
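
The ROC captions refer to a line that marks random classification; that line corresponds to an AU-ROC of 0.5. As a hedged illustration of why, the sketch below computes AU-ROC with the standard pairwise (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half. The function name `auc_from_scores` and the toy inputs are hypothetical and are not drawn from the paper.

```python
def auc_from_scores(labels, scores):
    """AU-ROC as the probability that a positive case outranks a negative one.

    labels: iterable of 0/1 (1 = metastatic), scores: predicted risk per case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
labels = [1, 1, 0, 0]
print(auc_from_scores(labels, [0.9, 0.4, 0.6, 0.1]))  # three of four pairs ranked correctly
print(auc_from_scores(labels, [0.5, 0.5, 0.5, 0.5]))  # uninformative scores: every pair ties
```

A model whose scores carry no information about the outcome ties or splits every pair evenly, which gives 0.5 and places its ROC curve on the random-classification line.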
