JMIR Med Inform. 2022 Feb 18;10(2):e33440.
doi: 10.2196/33440.

The Application and Comparison of Machine Learning Models for the Prediction of Breast Cancer Prognosis: Retrospective Cohort Study

Jialong Xiao et al. JMIR Med Inform.

Abstract

Background: In recent years, machine learning methods have been increasingly explored in cancer prognosis, driven by improved algorithms that can model censored data, such as support vector machines for survival analysis and random survival forest (RSF). However, it is still debated whether traditional (Cox proportional hazards regression) or machine learning-based prognostic models have better predictive performance.

Objective: This study aimed to compare the performance of breast cancer prognostic prediction models based on machine learning and Cox regression.

Methods: This retrospective cohort study included all patients diagnosed with breast cancer and subsequently hospitalized in Fudan University Shanghai Cancer Center between January 1, 2008, and December 31, 2016. After all exclusions, a total of 22,176 cases with 21 features were eligible for model development. The data set was randomly split into a training set (15,523 cases, 70%) and a test set (6653 cases, 30%) for developing 4 models and predicting the overall survival of patients diagnosed with breast cancer. The discriminative ability of models was evaluated by the concordance index (C-index), the time-dependent area under the curve, and D-index; the calibration ability of models was evaluated by the Brier score.
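The concordance index named above can be illustrated with a minimal sketch of Harrell's C-index in plain Python. This is not the authors' code: the function name `harrell_c_index`, the toy data, and the choice to skip pairs with tied survival times are assumptions made only for illustration. A pair of patients is comparable when the one with the shorter observed time had an event; the pair is concordant when that patient also has the higher predicted risk.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher risk score
    belongs to the patient with the shorter observed survival time.

    times       -- observed follow-up times
    events      -- 1/True if the event (death) was observed, 0/False if censored
    risk_scores -- model output; a higher score means a worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # assumption: tied times are skipped for simplicity
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from the study): 4 patients, one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c_index = harrell_c_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The double loop is quadratic in the number of patients, so a cohort of the size described here would normally be scored with a survival analysis library rather than this sketch.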

Results: The RSF model showed the best discriminative performance among the 4 models, with 3-year, 5-year, and 10-year time-dependent areas under the curve of 0.857, 0.838, and 0.781, respectively; a D-index of 7.643 (95% CI 6.542, 8.930); and a C-index of 0.827 (95% CI 0.809, 0.845). The difference in C-index was tested statistically, and the RSF model significantly outperformed the Cox-EN (elastic net) model (C-index 0.816, 95% CI 0.796, 0.836; P=.01), the Cox model (C-index 0.814, 95% CI 0.794, 0.835; P=.003), and the support vector machine model (C-index 0.812, 95% CI 0.793, 0.832; P<.001). The 3-year, 5-year, and 10-year Brier scores of the 4 models were very close, ranging from 0.027 to 0.094 (all below 0.1), which indicated that all models were well calibrated. In terms of feature importance, elastic net and RSF both ranked TNM staging, neoadjuvant therapy, number of lymph node metastases, age, and tumor diameter as the top 5 features for predicting breast cancer prognosis. Finally, an online tool was developed to predict the overall survival of patients with breast cancer.
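To make the Brier score concrete, the sketch below computes it at a single time horizon as the mean squared difference between the predicted survival probability and the observed status. It is a deliberately simplified illustration, not the authors' procedure: the function name `naive_brier_score` and the toy data are hypothetical, and patients censored before the horizon are simply dropped. The abstract does not say how censoring was handled, and the usual approach reweights by inverse probability of censoring instead.

```python
def naive_brier_score(times, events, surv_probs, horizon):
    """Mean squared error between predicted survival probability at
    `horizon` and observed survival status at `horizon`.

    times      -- observed follow-up times
    events     -- 1/True if the event was observed, 0/False if censored
    surv_probs -- predicted probability of surviving beyond `horizon`
    horizon    -- evaluation time, in the same unit as `times`

    Simplifying assumption: patients censored at or before the horizon
    have unknown status and are excluded (no censoring weights).
    """
    squared_errors = []
    for time, event, prob in zip(times, events, surv_probs):
        if time > horizon:
            observed = 1.0  # known to be alive at the horizon
        elif event:
            observed = 0.0  # event occurred at or before the horizon
        else:
            continue  # censored before the horizon: status unknown
        squared_errors.append((prob - observed) ** 2)
    if not squared_errors:
        raise ValueError("no patients with known status at the horizon")
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy data (not from the study), evaluated at a 5-year horizon.
toy_score = naive_brier_score(
    times=[1, 3, 6, 7],
    events=[1, 0, 1, 0],
    surv_probs=[0.2, 0.5, 0.9, 0.8],
    horizon=5,
)
```

Lower values are better; a model that predicts a survival probability of 0.5 for everyone scores 0.25.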

Conclusions: The RSF model slightly outperformed the other models on discriminative ability, revealing the potential of the RSF method as an effective approach to building prognostic prediction models in the context of survival analysis.

Keywords: breast cancer; machine learning; medical informatics; prediction models; random survival forest; support vector machine; survival analysis.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1. The coefficients of features change for varying α.
Figure 2. The important coefficient of each feature corresponding to the optimal α by elastic net. Ln: lymph node; PR: progesterone receptors.
Figure 3. The important coefficient of each feature by random survival forest. Ln: lymph node; PR: progesterone receptors.
Figure 4. Time-dependent receiver operating characteristic curves of models at 3 years, 5 years, and 10 years. EN: elastic net; RSF: random survival forest; SVM: support vector machine.
Figure 5. Time-dependent AUC of models over time. AUC: area under the curve; EN: elastic net; RSF: random survival forest; SVM: support vector machine.
Figure 6. Survival curves of high-risk and low-risk groups divided according to the risk score from (A) Cox, (B) Cox-EN (elastic net), (C) SVM (support vector machine), and (D) RSF (random survival forest).
