Randomized Controlled Trial

Eur J Med Res. 2023 Sep 30;28(1):394.

doi: 10.1186/s40001-023-01361-7.

Development and testing of a random forest-based machine learning model for predicting events among breast cancer patients with a poor response to neoadjuvant chemotherapy

Yudi Jin¹,², Ailin Lan¹, Yuran Dai¹, Linshan Jiang¹, Shengchun Liu³

Affiliations

¹ Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China.
² Department of Pathology, Chongqing Key Laboratory for Intelligent Oncology in Breast Cancer (iCQBC), Chongqing University Cancer Hospital, Chongqing, 400030, China.
³ Department of Breast and Thyroid Surgery, The First Affiliated Hospital of Chongqing Medical University, Chongqing, 400016, China. liushengchun1968@163.com.

PMID: 37777809
PMCID: PMC10543332
DOI: 10.1186/s40001-023-01361-7

Yudi Jin et al. Eur J Med Res. 2023.

Abstract

Background: Breast cancer (BC) is the most common malignant tumor worldwide. Timely detection of tumor progression after treatment could improve patient survival. This study aimed to develop machine learning models to predict events in BC patients after treatment, where an event was defined as (1) the first local, regional, or distant tumor relapse; (2) a diagnosis of a secondary malignant tumor; or (3) death from any cause.

Methods: Patients whose response to neoadjuvant chemotherapy (NAC) was stable disease (SD) or progressive disease (PD) were selected. Clinicopathological features and survival data were recorded at 1 year and 5 years. Patients were randomly divided into a training set and a test set in an 8:2 ratio. A random forest (RF) model and a logistic regression model were built for both the 1-year cohort and the 5-year cohort, and the performance of the two models was compared. The models were validated using data from the Surveillance, Epidemiology, and End Results (SEER) database.
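As an illustration of the random 8:2 split described in the Methods, the following is a minimal sketch using only the Python standard library. The function name `split_train_test`, the fixed seed, and the integer patient identifiers are assumptions made for this example; the abstract does not state how the authors implemented the split or whether it was stratified.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of `records` and split it into (train, test)."""
    shuffled = list(records)
    # A seeded generator makes the shuffle reproducible (seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for patient records; no study data are used.
patient_ids = list(range(165))
train_ids, test_ids = split_train_test(patient_ids)
print(len(train_ids), len(test_ids))
```

With 165 records, an 8:2 split yields 132 training and 33 test records, the same sizes the abstract reports for the 5-year cohort.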

Results: A total of 315 patients were included. In the 1-year cohort, 197 patients were assigned to the training set and 87 to the test set. The specificity, sensitivity, and area under the receiver operating characteristic curve (AUC) were 0.800, 0.833, and 0.810 for the RF model and 0.520, 0.833, and 0.653 for the logistic regression model. In the 5-year cohort, 132 patients were assigned to the training set and 33 to the test set. The specificity, sensitivity, and AUC were 0.882, 0.750, and 0.829 for the RF model and 0.882, 0.688, and 0.752 for the logistic regression model. In the external validation set, the specificity, sensitivity, and AUC were 0.765, 0.812, and 0.779 for the RF model and 0.833, 0.376, and 0.619 for the logistic regression model.
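For readers unfamiliar with the three metrics reported in the Results, the sketch below shows one common way to compute them from true labels and predicted event probabilities. The labels, scores, and the 0.5 classification threshold are toy assumptions for illustration, not values from the study, and the function names are hypothetical. The AUC is computed here as the fraction of (event, non-event) pairs in which the event case receives the higher score, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Pairwise AUC: chance that an event case outscores a non-event case."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed): 1 = event, 0 = no event; scores are predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(sens, spec, area)  # 0.75 0.75 0.9375
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why a model can keep the same sensitivity as another yet have a different AUC.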

Conclusion: The RF model performed well in predicting events among BC patients with SD or PD after NAC. It may benefit BC patients by helping to detect tumor recurrence.

Keywords: Breast cancer; Event; Logistic regression; Machine learning; Random forest.

Conflict of interest statement

The authors declare that no competing interests exist.

Figures

**Fig. 2**
Selection of the mtry and ntree values in the 1-year cohort (A, B) and the 5-year cohort (C, D), respectively. “0” denotes the error rate for predicting non-events. “1” denotes the error rate for predicting events. “OOB” denotes the out-of-bag error rate.

**Fig. 3**
The mean decrease in accuracy and mean decrease in Gini index in the 1-year cohort (A, B) and the 5-year cohort (C, D), respectively. Both “In_sig” and “ns” mean not significant. “Sig” means significant. “*” means P-value < 0.05. “**” means P-value < 0.01.

**Fig. 4**
The ROC curve of the random forest model: training set and test set of the 1-year cohort (A, B); training set and test set of the 5-year cohort (C, D)

**Fig. 5**
The ROC curve of the logistic regression: training set and test set of the 1-year cohort (A, B); training set and test set of the 5-year cohort (C, D)

**Fig. 6**
The DFS curves for the low-risk and high-risk groups predicted by the model

**Fig. 7**
The ROC curve of the random forest model: external validation set of the 5-year cohort (A). The ROC curve of the logistic regression: external validation set of the 5-year cohort (B)
