World J Gastroenterol. 2025 Nov 14;31(42):112180.
doi: 10.3748/wjg.v31.i42.112180.

Predicting chemotherapy-induced myelosuppression in colorectal cancer: An interpretable, machine learning-based nomogram

Yu-Ming Liu et al. World J Gastroenterol.

Abstract

Background: Colorectal cancer is a common digestive malignancy, and chemotherapy remains a cornerstone of treatment. Myelosuppression, a frequent hematologic toxicity, poses significant clinical challenges. However, no interpretable machine learning-based nomogram exists to predict chemotherapy-induced myelosuppression in colorectal cancer patients. This study aimed to develop and validate an interpretable clinic-machine learning nomogram integrating clinical predictors with multiple algorithms via a feature mapping algorithm. The model provides accurate risk estimation and clinical interpretability, supporting individualized prevention strategies and optimizing decision-making in patients receiving first-line chemotherapy.

Aim: To develop and validate an interpretable clinic-machine learning nomogram predicting chemotherapy-induced myelosuppression in colorectal cancer.

Methods: This retrospective study enrolled 855 colorectal cancer patients receiving first-line chemotherapy. Data were split into training (n = 612), validation (n = 153), and testing (n = 90) cohorts. Ten predictors were identified through least absolute shrinkage and selection operator, decision tree, random forest, and expert consensus. Ten machine learning algorithms were applied, with performance assessed by area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), calibration, and decision curves. The optimal model was integrated into a clinic-machine learning nomogram via the feature mapping algorithm, which was internally validated for predictive accuracy and clinical utility.
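The two discrimination metrics named in the Methods can be computed directly from outcome labels and predicted probabilities. The following is a minimal sketch in plain Python, not the authors' code: the function names and the toy data are illustrative assumptions, and the average-precision helper assumes there are no tied scores.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count concordant pairs; a tie between a positive and a negative counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    # Rank cases from highest to lowest predicted probability.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = myelosuppression, 0 = no myelosuppression.
labels = [0, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, predicted), average_precision(labels, predicted))
```

AUPRC is reported alongside AUC because it depends on the event rate, which makes it the more informative of the two when the outcome classes are imbalanced.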

Results: A total of 855 colorectal cancer patients were enrolled, with 765 cases (April 2020 to December 2023) used for model training and validation, and 90 cases (January 2024 to July 2024) for internal testing. Baseline clinical features did not differ significantly between training and validation cohorts (P > 0.05). Ten predictors were identified through integrated feature selection and expert consensus, including age, body surface area, body mass index, tumor position, albumin, carcinoembryonic antigen, carbohydrate antigen (CA) 19-9, CA125, chemotherapy regimen, and chemotherapy cycles. Among ten machine learning algorithms, extreme gradient boosting achieved the best validation performance (AUC = 0.97, AUPRC = 0.92, sensitivity = 0.79, specificity = 0.92, accuracy = 0.88). Logistic regression confirmed extra trees and random forest as independent predictors, which were incorporated into a clinic-machine learning nomogram. The clinic-machine learning nomogram demonstrated superior discrimination (AUC = 0.96, AUPRC = 0.93, accuracy = 0.90, specificity = 0.95), good calibration, and greater net clinical benefit across a wide probability range (10%-90%). Internal testing further confirmed its robustness and generalizability (AUC = 0.95).
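The threshold-based figures reported above (sensitivity, specificity, accuracy) all derive from a single confusion matrix. As a rough illustration of how they relate, the sketch below computes them from raw counts; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return threshold-based metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly ruled out
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = classification_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because accuracy is a prevalence-weighted average of sensitivity and specificity, a model can reach high accuracy while still missing a sizeable share of true cases, which is why the three are reported together.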

Conclusion: The clinic-machine learning nomogram accurately predicts chemotherapy-induced myelosuppression in colorectal cancer, providing interpretability and clinical utility to support individualized risk assessment and treatment decision-making.

Keywords: Chemotherapy-induced myelosuppression; Colorectal cancer; Machine learning; Nomogram; Risk factors.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1
Flowchart of the study protocol. CRC: Colorectal cancer; T: Tumor; N: Node; M: Metastasis; HM: Hepatic metastasis; LM: Lung metastasis; PM: Peritoneal metastasis; BSA: Body surface area; BMI: Body mass index; ALB: Albumin; CEA: Carcinoembryonic antigen; CA: Carbohydrate antigen; LASSO: Least absolute shrinkage and selection operator; ML: Machine learning; LR: Logistic regression; DT: Decision trees; RF: Random forest; XGBoost: Extreme gradient boosting; SVM: Support vector machines; GBM: Gradient boosting machines; KNN: K-Nearest neighbors; ANN: Artificial neural network; ET: Extra trees; ROC: Receiver operating characteristic; AUC: Area under the curve; PR: Precision-recall; AUPRC: Area under the precision-recall curve; PPV: Positive predictive value; NPV: Negative predictive value.
Figure 2
Sample size calculation flowchart. rMPSE: Root mean squared prediction error; MPSE: Mean squared prediction error; EPV: Events per variable. In the formulas: Ø: Event fraction; δ: Margin of error, generally recommended to be < 0.05; P: Number of candidate predictors; S: Shrinkage factor; R2cs: A (conservative) anticipated value of model performance, as defined by the Cox-Snell R-squared statistic; MAPE: Mean absolute prediction error; n: Sample size.
Figure 3
Candidate predictor screening using least absolute shrinkage and selection operator. A: Path diagram of least absolute shrinkage and selection operator (LASSO) regression coefficients for candidate predictors; B: Cross-validation curves for LASSO. MSE: Mean squared error.
Figure 4
Mean importance of candidate predictors. A: Random forest algorithm; B: Decision trees algorithm. BSA: Body surface area; BMI: Body mass index; T: Tumor; N: Node; M: Metastasis; HM: Hepatic metastasis; LM: Lung metastasis; PM: Peritoneal metastasis; ALB: Albumin; CEA: Carcinoembryonic antigen; CA: Carbohydrate antigen.
Figure 5
10-fold cross-validation plot.
Figure 6
Curves for the 10 machine learning algorithms. A and B: Receiver operating characteristic curves of the training set (A) and validation set (B); C and D: Precision-recall curves of the training set (C) and validation set (D). LR: Logistic regression; DT: Decision trees; RF: Random forest; XGBoost: Extreme gradient boosting; SVM: Support vector machines; GBM: Gradient boosting machines; KNN: K-Nearest neighbors; ANN: Artificial neural network; ET: Extra trees; AUC: Area under the curve; AP: Average precision.
Figure 7
The nomogram for predicting myelosuppression induced by first-line chemotherapy in colorectal cancer. A: Clinic-machine learning; B: Clinic. BSA: Body surface area; BMI: Body mass index; ALB: Albumin; CEA: Carcinoembryonic antigen; CA: Carbohydrate antigen.
Figure 8
Receiver operating characteristic curves for extreme gradient boosting, clinic nomogram and clinic-machine learning nomogram. A: Training set; B: Validation set. XGBoost: Extreme gradient boosting; AUC: Area under the curve; ML: Machine learning.
Figure 9
Precision-recall curve for extreme gradient boosting, clinic nomogram and clinic-machine learning nomogram. A: Training set; B: Validation set. XGBoost: Extreme gradient boosting; AP: Average precision; ML: Machine learning.
Figure 10
Calibration curves for extreme gradient boosting, clinic nomogram and clinic-machine learning nomogram. A: Training set; B: Validation set. XGBoost: Extreme gradient boosting; ML: Machine learning.
Figure 11
Decision curve analysis for extreme gradient boosting, clinic nomogram and clinic-machine learning nomogram. A: Training set; B: Validation set. XGBoost: Extreme gradient boosting; ML: Machine learning.
Figure 12
Receiver operating characteristic, precision-recall curve, calibration curves and decision curve analysis for the optimal prediction model clinic-machine learning nomogram (testing set). A: Receiver operating characteristic curve; B: Precision-recall curve; C: Calibration curve; D: Decision curve analysis. AUC: Area under the curve; ML: Machine learning; AUPRC: Area under the precision-recall curve; CI: Confidence interval.
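The decision curve analyses in Figures 11 and 12 plot net benefit against the threshold probability. A minimal sketch of the standard net-benefit calculation follows; it is an illustration under assumed names and toy data, not the authors' implementation.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone, regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical toy data: 1 = myelosuppression, 0 = no myelosuppression.
labels = [1, 1, 0, 0, 0]
risks = [0.9, 0.6, 0.7, 0.2, 0.1]
for pt in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(pt, net_benefit(labels, risks, pt), net_benefit_treat_all(labels, pt))
```

A model shows clinical utility at a given threshold when its net benefit exceeds both the treat-all value and zero, the net benefit of treating no one.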

