Sci Rep. 2023 Oct 25;13(1):18301.
doi: 10.1038/s41598-023-45438-z.

Predicting diagnosis and survival of bone metastasis in breast cancer using machine learning

Xugang Zhong et al. Sci Rep. 2023.

Abstract

This study aimed to establish more accurate predictive models based on novel machine learning algorithms, with the overarching goal of providing clinicians with effective decision-making support. We retrospectively analyzed breast cancer patients recorded in the Surveillance, Epidemiology, and End Results (SEER) database from 2010 to 2016. Multivariable logistic regression analyses were used to identify risk factors for bone metastasis in breast cancer, whereas Cox proportional hazards regression analyses were used to identify prognostic factors for breast cancer with bone metastasis (BCBM). Based on the identified risk and prognostic factors, we developed diagnostic and prognostic models using six machine learning classifiers. We then evaluated the performance of the machine learning models using the area under the receiver operating characteristic (ROC) curve (AUC), learning curves, precision-recall curves, calibration plots, and decision curve analysis. Univariable and multivariable logistic regression analyses showed that bone metastasis was significantly associated with age, race, sex, grade, T stage, N stage, surgery, radiotherapy, chemotherapy, tumor size, brain metastasis, liver metastasis, lung metastasis, breast cancer subtype, and progesterone receptor (PR) status. Univariable and multivariable Cox regression analyses revealed that age, race, marital status, grade, surgery, radiotherapy, chemotherapy, brain metastasis, liver metastasis, lung metastasis, breast cancer subtype, estrogen receptor (ER) status, and PR status were closely associated with the prognosis of BCBM. Among the six machine learning models, the XGBoost algorithm produced the most accurate predictions (diagnostic model AUC = 0.98; prognostic model AUC = 0.88). According to Shapley additive explanations (SHAP), the most important feature of the diagnostic model was surgery, followed by N stage. Interestingly, surgery was also the most important feature of the prognostic model, followed by liver metastasis.
Using the XGBoost algorithm, we could effectively predict the diagnosis and survival of bone metastasis in breast cancer and provide targeted reference information for the treatment of BCBM patients.
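In a multivariable logistic regression, each fitted coefficient beta_j is usually reported as an odds ratio exp(beta_j). The following is a minimal pure-Python sketch of that idea, fitting an unpenalized model by batch gradient descent; it is not the authors' code, and the function names and toy data are hypothetical.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit an unpenalized logistic regression by batch gradient descent.

    X: list of feature rows (no intercept column); y: list of 0/1 labels.
    Returns [intercept, beta_1, ..., beta_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(b * x for b, x in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        # Step against the mean gradient of the negative log-likelihood
        w = [b - lr * g / n for b, g in zip(w, grad)]
    return w

def odds_ratios(w):
    """exp(beta_j): multiplicative change in the odds per one-unit increase in x_j."""
    return [math.exp(b) for b in w[1:]]

# Hypothetical toy data: a single binary covariate (1 = factor present)
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w = fit_logistic(X, y)
print(odds_ratios(w))  # close to 9: odds are 3:1 when x = 1 versus 1:3 when x = 0
```

A statistics package would additionally report standard errors and p-values, which this sketch omits.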


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Flow chart of patient screening.
Figure 2
Learning curves of the models on the training data. (A) XGBoost; (B) random forest; (C) decision tree; (D) extra trees; (E) Gaussian naive Bayes (Gaussian NB); (F) logistic regression.
Figure 3
ROC curves of the diagnostic models in the training cohort (A) and validation cohort (B); precision-recall (PR) curves in the training cohort (C) and validation cohort (D); calibration curves in the training cohort (E) and validation cohort (F).
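The AUC summarized in these ROC curves equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and a calibration curve compares mean predicted probability with the observed event rate within probability bins. The sketch below illustrates both under simple assumptions (pairwise AUC with ties counted as 1/2, equal-width bins); it is not the authors' evaluation code, and the toy labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))

def calibration_bins(y_true, probs, n_bins=5):
    """Equal-width binning of predicted probabilities.
    Returns (mean predicted probability, observed event rate) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for b in bins if b
    ]

y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, p))  # 0.75
print(calibration_bins(y, p, n_bins=2))
```

The pairwise loop costs O(n_pos * n_neg) and is meant only for illustration; library implementations sort the scores instead. A well-calibrated model has bins lying close to the diagonal (mean predicted probability roughly equal to observed rate).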
Figure 4
Ten-fold cross-validation results of the six machine learning models in the training cohort.
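Ten-fold cross-validation splits the training cohort into ten folds; each fold serves once as the held-out test set while the other nine are used for fitting. A minimal sketch of the index splitting follows, assuming a plain shuffled (non-stratified) split with a fixed seed; whether the original analysis stratified by outcome is not stated here, and the function name is hypothetical.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size.
    Yields (train_indices, test_indices) once per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra sample
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])
```

With imbalanced outcomes such as bone metastasis, a stratified variant (keeping the event rate similar across folds) is usually preferred; that would require grouping indices by label before splitting.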
Figure 5
ROC curves of the machine learning-based prognostic models in the training cohort (A) and validation cohort (B), and decision curves of the prognostic models in the training cohort (C) and validation cohort (D).
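Decision curve analysis plots net benefit against a threshold probability p_t. Under the standard definition, treating patients whose predicted risk is at least p_t yields net benefit TP/n - FP/n * p_t / (1 - p_t), which is then compared with the "treat all" and "treat none" (net benefit 0) strategies. The sketch below computes these quantities on hypothetical toy predictions; it is not the authors' code.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone ("treat none" has net benefit 0)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
for t in (0.2, 0.3, 0.5):
    print(t, round(net_benefit(y, p, t), 4), round(net_benefit_treat_all(y, t), 4))
```

A model is clinically useful over the range of thresholds where its curve lies above both reference strategies.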
Figure 6
Prediction performance of seven models.
Figure 7
Feature importance ranking by SHAP values in the diagnostic model based on the XGBoost algorithm. (A) Features are sorted by the sum of the absolute SHAP values across all patients, and the SHAP values show the distribution of each feature's influence on the output of the XGBoost model. Red indicates a higher feature value, whereas blue indicates a lower feature value. The X-axis shows the SHAP value, that is, the feature's contribution to the model output; the higher the value on the X-axis, the higher the model's predicted likelihood of bone metastasis. (B) Standard bar chart of the mean absolute SHAP value of each feature in the XGBoost model, sorted in descending order.
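The bar chart in panel (B) ranks features by their mean absolute SHAP value across patients. A minimal sketch of that ranking step follows, assuming the SHAP values are already available as a patients-by-features matrix; the matrix and the feature names x1 to x3 are hypothetical and do not come from the study.

```python
def rank_by_mean_abs_shap(shap_values, feature_names):
    """shap_values: one row per patient, one column per feature.
    Returns (feature, mean |SHAP|) pairs sorted from most to least important."""
    n = len(shap_values)
    scores = [
        sum(abs(row[j]) for row in shap_values) / n
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, scores), key=lambda t: t[1], reverse=True)

# Hypothetical SHAP matrix: three patients, three features
features = ["x1", "x2", "x3"]
shap_matrix = [
    [0.9, -0.4, 0.1],
    [-1.1, 0.5, -0.2],
    [1.0, 0.3, 0.0],
]
for name, score in rank_by_mean_abs_shap(shap_matrix, features):
    print(f"{name}: {score:.3f}")
```

Taking absolute values matters: for x1 the signed SHAP values partly cancel (their mean is about 0.27), while the mean absolute value of 1.0 reflects how strongly the feature moves individual predictions in either direction.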
Figure 8
Feature importance ranking by SHAP values in the prognostic model based on the XGBoost algorithm. (A) Features are sorted by the sum of the absolute SHAP values across all patients, and the SHAP values show the distribution of each feature's influence on the output of the XGBoost model. Red indicates a higher feature value, whereas blue indicates a lower feature value. The X-axis shows the SHAP value, that is, the feature's contribution to the model output; the higher the value on the X-axis, the more strongly the feature pushes the prediction toward the prognostic model's positive outcome class. (B) Standard bar chart of the mean absolute SHAP value of each feature in the XGBoost model, sorted in descending order.
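Each SHAP value in these plots is a Shapley value: a feature's weighted average marginal contribution to the prediction over all subsets of the other features. The brute-force sketch below makes that definition concrete under a simplifying assumption: a "missing" feature is replaced by a single baseline value. This differs from the shap library's TreeSHAP, which computes expectations over background data efficiently for tree ensembles such as XGBoost. The toy model and function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, where a missing feature takes its baseline value.
    Cost grows as 2**n, so this is practical only for a handful of features."""
    n = len(x)
    phi = [0.0] * n

    def value(subset):
        # Features in `subset` take their values from x; the rest come from baseline
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                s = set(s)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical model with an interaction between features 0 and 2
model = lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[2]
phi = shapley_values(model, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(v, 6) for v in phi])  # [2.5, 3.0, 0.5]
```

The values add up to model(x) - model(baseline) = 6, the "local accuracy" property that lets a SHAP plot decompose each individual prediction; the interaction term is split equally between features 0 and 2.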
