The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study
- PMID: 35179504
- PMCID: PMC8900909
- DOI: 10.2196/33440
Abstract
Background: In recent years, machine learning methods have been increasingly explored for cancer prognosis, driven by improved algorithms that can model censored data, such as support vector machines for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazards regression) or machine learning-based prognostic models have better predictive performance.
Objective: This study aimed to compare the performance of breast cancer prognostic prediction models based on machine learning and Cox regression.
Methods: This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized at Fudan University Shanghai Cancer Center between January 1, 2008, and December 31, 2016. After all exclusions, a total of 22,176 cases with 21 features were eligible for model development. The data set was randomly split into a training set (15,523 cases, 70%) and a test set (6653 cases, 30%) to develop 4 models and predict the overall survival of patients diagnosed with breast cancer. The discriminative ability of the models was evaluated by the concordance index (C-index), the time-dependent area under the curve, and the D-index; the calibration of the models was evaluated by the Brier score.
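To make the evaluation setup concrete, the following is a minimal sketch of a random 70/30 split and of Harrell's C-index for right-censored data. It is an illustration only, not the authors' code: the function names, the seed, and the toy data are hypothetical, and pairs with tied follow-up times are simply ignored here.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle indices 0..n-1 and split them into (train, test) lists.

    The seed is an arbitrary choice for reproducibility (an assumption,
    not taken from the study).
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    and a strictly shorter follow-up time than subject j. The pair is
    concordant when the model gave subject i the higher risk score;
    tied risk scores count as half. Pairs with tied times are skipped
    in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Split sizes for a cohort of 22,176 cases.
train_idx, test_idx = train_test_split_indices(22176)

# Toy data (hypothetical): follow-up times, event flags, predicted risks.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c_index = harrell_c_index(toy_times, toy_events, toy_risks)
```

With 22,176 cases, this split yields 15,523 training and 6653 test indices. A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of risk against survival time.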
Results: The RSF model showed the best discriminative performance among the 4 models, with 3-year, 5-year, and 10-year time-dependent areas under the curve of 0.857, 0.838, and 0.781, a D-index of 7.643 (95% CI 6.542, 8.930), and a C-index of 0.827 (95% CI 0.809, 0.845). The differences in C-index were tested statistically, and the RSF model significantly outperformed the Cox-EN (elastic net) model (C-index 0.816, 95% CI 0.796, 0.836; P=.01), the Cox model (C-index 0.814, 95% CI 0.794, 0.835; P=.003), and the support vector machine model (C-index 0.812, 95% CI 0.793, 0.832; P<.001). The 3-year, 5-year, and 10-year Brier scores of the 4 models were very close, ranging from 0.027 to 0.094 (all below 0.1), which indicated that all models were well calibrated. In terms of feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the 5 most important features for predicting the prognosis of breast cancer. Finally, an online tool was developed to predict the overall survival of patients with breast cancer.
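For readers unfamiliar with the Brier score, the sketch below computes it at a fixed time horizon as the mean squared difference between predicted survival probability and observed status. This is a deliberately simplified version that drops subjects censored before the horizon; survival analyses commonly use a censoring-weighted variant instead, and the abstract does not state which form was used. The function name and toy data are hypothetical.

```python
def brier_score_at(horizon, times, events, predicted_survival):
    """Simplified Brier score at a fixed time horizon.

    For each subject, the observed status is 1.0 if follow-up extends
    beyond the horizon (still alive) and 0.0 if an event occurred at or
    before it. Subjects censored at or before the horizon have unknown
    status and are excluded (an assumption of this sketch; no inverse
    probability of censoring weights are applied).
    """
    total = 0.0
    used = 0
    for t, event, surv_prob in zip(times, events, predicted_survival):
        if t > horizon:
            alive = 1.0
        elif event:
            alive = 0.0
        else:
            continue  # censored before the horizon: status unknown
        total += (alive - surv_prob) ** 2
        used += 1
    if used == 0:
        raise ValueError("no subjects with known status at the horizon")
    return total / used


# Toy data (hypothetical): times in years, event flags, and predicted
# probabilities of surviving beyond 5 years.
toy_times = [1, 3, 6, 7]
toy_events = [1, 0, 1, 0]
toy_pred = [0.2, 0.5, 0.9, 0.8]
toy_brier = brier_score_at(5, toy_times, toy_events, toy_pred)
```

A score of 0 means perfect predictions, while predicting 0.5 for every subject gives 0.25, so lower values indicate better agreement between predicted and observed survival.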
Conclusions: The RSF model slightly outperformed the other models on discriminative ability, revealing the potential of the RSF method as an effective approach to building prognostic prediction models in the context of survival analysis.
Keywords: breast cancer; machine learning; medical informatics; prediction models; random survival forest; support vector machine; survival analysis.
©Jialong Xiao, Miao Mo, Zezhou Wang, Changming Zhou, Jie Shen, Jing Yuan, Yulian He, Ying Zheng. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 18.02.2022.
Conflict of interest statement
Conflicts of Interest: None declared.