2022 Sep 29;11(19):5772.
doi: 10.3390/jcm11195772.

Deep Learning and Machine Learning with Grid Search to Predict Later Occurrence of Breast Cancer Metastasis Using Clinical Data

Xia Jiang et al. J Clin Med.

Abstract

Background: It is important to be able to predict, for each individual patient, the likelihood of later metastatic occurrence, because the prediction can guide treatment plans tailored to that patient to prevent metastasis and to help avoid under-treatment or over-treatment. Deep neural network (DNN) learning, commonly referred to as deep learning, has become popular due to its success in image detection and prediction, but questions such as whether deep learning outperforms other machine learning methods when using non-image clinical data remain unanswered. Grid search has been introduced to deep learning hyperparameter tuning to improve prediction performance, but the effect of grid search on other machine learning methods is under-studied. In this research, we take an empirical approach to study the performance of deep learning and other machine learning methods when using non-image clinical data to predict the occurrence of breast cancer metastasis (BCM) 5, 10, or 15 years after the initial treatment. We developed prediction models using the deep feedforward neural network (DFNN) method, as well as models using nine other machine learning methods: naïve Bayes (NB), logistic regression (LR), support vector machine (SVM), LASSO, decision tree (DT), k-nearest neighbor (KNN), random forest (RF), AdaBoost (ADB), and XGBoost (XGB). We used grid search to tune hyperparameters for all methods. We then compared our feedforward deep learning models to the models trained using the nine other machine learning methods.
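Grid search, as described above, amounts to evaluating every combination of candidate hyperparameter values and keeping the best-scoring one. The following is a minimal sketch of that idea only, not the authors' pipeline. The function `toy_score` is a hypothetical stand-in for "train a model with these hyperparameters and return its mean validation AUC", and the hyperparameter names and candidate values are illustrative assumptions, not taken from the study.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score, n_evaluated)."""
    names = list(param_grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    # itertools.product enumerates the full Cartesian product of candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


def toy_score(params):
    # Hypothetical scoring function (assumption): in practice this would train
    # a model and return a validation metric such as mean AUC.
    lr_penalty = abs(params["learning_rate"] - 0.01) * 5
    depth_penalty = abs(params["hidden_layers"] - 2) * 0.05
    return 0.9 - lr_penalty - depth_penalty


# Illustrative grid (assumption): 3 x 3 = 9 combinations.
grid = {"learning_rate": [0.001, 0.01, 0.1], "hidden_layers": [1, 2, 3]}
best_params, best_score, n_evaluated = grid_search(grid, toy_score)
print(best_params, n_evaluated)
```

The number of evaluations is the product of the candidate-list lengths, so the cost of a grid search grows quickly with each added hyperparameter, and more so when a single evaluation is itself a full model training run.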

Results: Based on the mean test AUC (Area under the ROC Curve) results, DFNN ranks 6th, 4th, and 3rd when predicting 5-year, 10-year, and 15-year BCM, respectively, out of 10 methods. The top performing methods in predicting 5-year BCM are XGB (1st), RF (2nd), and KNN (3rd). For predicting 10-year BCM, the top performers are XGB (1st), RF (2nd), and NB (3rd). Finally, for 15-year BCM, the top performers are SVM (1st), LR and LASSO (tied for 2nd), and DFNN (3rd). The ensemble methods RF and XGB outperform other methods when data are less balanced, while SVM, LR, LASSO, and DFNN outperform other methods when data are more balanced. Our statistical testing results show that at a significance level of 0.05, DFNN overall performs comparably to other machine learning methods when predicting 5-year, 10-year, and 15-year BCM.
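The comparison above rests on two simple computations: the AUC of each model and a ranking of methods by mean test AUC in which tied methods share a rank. The sketch below illustrates both under stated assumptions. The labels, scores, method names, and AUC values are made up for illustration and are not results from the study, and the rounding used to detect ties (`ndigits=3`) is an assumption.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dense_rank(mean_aucs, ndigits=3):
    """Rank methods by descending mean AUC; methods equal after rounding share a rank."""
    rounded = {method: round(auc, ndigits) for method, auc in mean_aucs.items()}
    levels = sorted(set(rounded.values()), reverse=True)
    return {method: levels.index(value) + 1 for method, value in rounded.items()}


# Toy labels and predicted scores (illustrative only).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Made-up mean test AUCs for four hypothetical methods (not the paper's results).
illustrative_aucs = {"method_a": 0.81, "method_b": 0.79, "method_c": 0.79, "method_d": 0.75}
ranks = dense_rank(illustrative_aucs)
print(auc, ranks)
```

With dense ranking, the method after a tie takes the next rank rather than skipping one, which matches the way the tie for 2nd place is followed by a 3rd place in the 15-year results.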

Conclusions: Our results show that deep learning with grid search overall performs at least as well as other machine learning methods when using non-image clinical data. Notably, some of the other machine learning methods, such as XGB, RF, and SVM, are very strong competitors of DFNN when incorporating grid search. It is also worth noting that grid search with DFNN requires much more computation time than grid search with the other nine machine learning methods.

Keywords: DNN; EHR; breast cancer; clinical; deep learning; machine learning; metastasis; metastatic breast cancer; non-image; prediction.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. A DFNN (deep feedforward neural network) model that contains n hidden layers.
Figure 2. ROC curves of the best-performing model for each method when predicting 5-year metastasis (ROC: receiver operating characteristic; DFNN: deep feedforward neural network; NB: naïve Bayes; LR: logistic regression; DT: decision tree; SVM: support vector machine; LASSO: least absolute shrinkage and selection operator; KNN: k-nearest neighbor; RF: random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 3. ROC curves of the best-performing model for each method when predicting 10-year metastasis (ROC: receiver operating characteristic; DFNN: deep feedforward neural network; NB: naïve Bayes; LR: logistic regression; DT: decision tree; SVM: support vector machine; LASSO: least absolute shrinkage and selection operator; KNN: k-nearest neighbor; RF: random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 4. ROC curves of the best-performing model for each method when predicting 15-year metastasis (ROC: receiver operating characteristic; DFNN: deep feedforward neural network; NB: naïve Bayes; LR: logistic regression; DT: decision tree; SVM: support vector machine; LASSO: least absolute shrinkage and selection operator; KNN: k-nearest neighbor; RF: random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 5. Boxplots comparing the mean test AUCs of all methods (AUC: area under the ROC curve; DFNN: deep feedforward neural network; NB: naïve Bayes; LR: logistic regression; DT: decision tree; SVM: support vector machine; LASSO: least absolute shrinkage and selection operator; KNN: k-nearest neighbor; RF: random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 6. Side-by-side comparisons of the mean test AUCs of all methods when predicting 5-, 10-, and 15-year breast cancer metastasis (AUC: area under the ROC curve; DFNN: deep feedforward neural network; NB: naïve Bayes; LR: logistic regression; DT: decision tree; SVM: support vector machine; LASSO: least absolute shrinkage and selection operator; KNN: k-nearest neighbor; RF: random forest; ADB: AdaBoost; XGB: XGBoost).
Figure 7. A comparison of the data imbalance status of the LSM 5-year, 10-year, and 15-year datasets (LSM: Lynn Sage Dataset for Metastasis).
