The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study

JMIR Med Inform. 2022 Feb 18;10(2):e33440. doi: 10.2196/33440.


Jialong Xiao^{1,2,3}, Miao Mo^{2,3}, Zezhou Wang^{2,3}, Changming Zhou^{2,3}, Jie Shen^{2,3}, Jing Yuan^{2,3}, Yulian He^{1,2}, Ying Zheng^{2,3,4}

Affiliations

¹ Department of Epidemiology, School of Public Health, Fudan University, Shanghai, China.
² Department of Cancer Prevention, Fudan University Shanghai Cancer Center, Shanghai, China.
³ Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China.
⁴ Shanghai Engineering Research Center of Artificial Intelligence Technology for Tumor Diseases, Shanghai, China.

PMID: 35179504
PMCID: PMC8900909
DOI: 10.2196/33440


Abstract

Background: In recent years, machine learning methods have been increasingly explored for cancer prognosis, driven by improved algorithms that can model censored data, such as support vector machines for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazards regression) or machine learning-based prognostic models have better predictive performance.

Objective: This study aimed to compare the performance of breast cancer prognostic prediction models based on machine learning and Cox regression.

Methods: This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized at Fudan University Shanghai Cancer Center between January 1, 2008, and December 31, 2016. After all exclusions, a total of 22,176 cases with 21 features were eligible for model development. The data set was randomly split into a training set (15,523 cases, 70%) and a test set (6653 cases, 30%) to develop 4 models (Cox, Cox with elastic net [Cox-EN], support vector machine, and RSF) for predicting the overall survival of patients diagnosed with breast cancer. The discriminative ability of the models was evaluated by the concordance index (C-index), the time-dependent area under the curve, and the D-index; calibration was evaluated by the Brier score.
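To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration only, not the study's code: the function name `harrell_c_index`, the toy inputs, and the choice to skip pairs with tied times are assumptions made here.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Illustrative Harrell's C-index for right-censored survival data.

    times:       observed follow-up time per patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped (a simplification assumed here).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): 4 patients, the third one censored at time 6.
c = harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
print(round(c, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the C-index values in the Results are reported.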

Results: The RSF model showed the best discriminative performance among the 4 models, with 3-year, 5-year, and 10-year time-dependent area under the curve of 0.857, 0.838, and 0.781, a D-index of 7.643 (95% CI 6.542, 8.930), and a C-index of 0.827 (95% CI 0.809, 0.845). In statistical tests of the difference in C-index, the RSF model significantly outperformed the Cox-EN (elastic net) model (C-index 0.816, 95% CI 0.796, 0.836; P=.01), the Cox model (C-index 0.814, 95% CI 0.794, 0.835; P=.003), and the support vector machine model (C-index 0.812, 95% CI 0.793, 0.832; P<.001). The 4 models' 3-year, 5-year, and 10-year Brier scores were very close, ranging from 0.027 to 0.094 (all below 0.1), which indicated that all models had good calibration. For feature importance, elastic net and RSF both indicated that TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter were the top 5 features for predicting the prognosis of breast cancer. Finally, an online tool was developed to predict the overall survival of patients with breast cancer.
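As a rough illustration of what a Brier score at a fixed horizon measures, the sketch below computes the mean squared difference between predicted survival probability and observed status at time t. It is a deliberately simplified version: patients censored before t are dropped rather than handled by inverse probability of censoring weighting, so it is not a reproduction of the study's calculation. The function name `brier_score_at` and the toy values are hypothetical.

```python
def brier_score_at(t, times, events, surv_probs):
    """Simplified Brier score at horizon t (no censoring weights; an assumption).

    times:      observed follow-up time per patient
    events:     1 if death was observed, 0 if censored
    surv_probs: predicted probability of surviving beyond t
    """
    squared_errors = []
    for time, event, prob in zip(times, events, surv_probs):
        if time > t:
            alive = 1.0          # still under observation after t
        elif event:
            alive = 0.0          # death observed at or before t
        else:
            continue             # censored before t: status unknown, dropped
        squared_errors.append((prob - alive) ** 2)
    if not squared_errors:
        raise ValueError("no patients with known status at the horizon")
    return sum(squared_errors) / len(squared_errors)


# Toy data (hypothetical): horizon of 5 years, second patient censored at 3.
score = brier_score_at(5, [2, 3, 6, 8], [1, 0, 1, 0], [0.2, 0.9, 0.7, 0.9])
print(round(score, 4))
```

Lower values are better: 0 means the predicted probabilities match the observed outcomes exactly, while a constant prediction of 0.5 gives 0.25.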

Conclusions: The RSF model slightly outperformed the other models in discriminative ability, revealing the potential of the RSF method as an effective approach to building prognostic prediction models for survival analysis.

Keywords: breast cancer; machine learning; medical informatics; prediction models; random survival forest; support vector machine; survival analysis.

©Jialong Xiao, Miao Mo, Zezhou Wang, Changming Zhou, Jie Shen, Jing Yuan, Yulian He, Ying Zheng. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 18.02.2022.


Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
The coefficients of features change for varying α.

**Figure 2**
The important coefficient of each feature corresponding to the optimal α by elastic net. Ln: lymph node; PR: progesterone receptors.

**Figure 3**
The important coefficient of each feature by random survival forest. Ln: lymph node; PR: progesterone receptors.

**Figure 4**
Time-dependent receiver operating characteristic curves of models at 3 years, 5 years, and 10 years. EN: elastic net; RSF: random survival forest; SVM: support vector machine.

**Figure 5**
Time-dependent AUC of models over time. AUC: area under the curve; EN: elastic net; RSF: random survival forest; SVM: support vector machine.

**Figure 6**
Survival curves of high-risk and low-risk groups divided according to the risk score from (A) Cox, (B) Cox-EN (elastic net), (C) SVM (support vector machine), and (D) RSF (random survival forest).



