2023 Apr 3;32(4):561-571. doi: 10.1158/1055-9965.EPI-22-0677.

Performance of Statistical and Machine Learning Risk Prediction Models for Surveillance Benefits and Failures in Breast Cancer Survivors


Yu-Ru Su et al. Cancer Epidemiol Biomarkers Prev.

Abstract

Background: Machine learning (ML) approaches facilitate risk prediction model development using high-dimensional predictors and higher-order interactions at the cost of model interpretability and transparency. We compared the relative predictive performance of statistical and ML models to guide modeling strategy selection for surveillance mammography outcomes in women with a personal history of breast cancer (PHBC).

Methods: We cross-validated seven risk prediction models for two surveillance outcomes, failure (breast cancer within 12 months of a negative surveillance mammogram) and benefit (surveillance-detected breast cancer). We included 9,447 mammograms (495 failures, 1,414 benefits, and 7,538 nonevents) from 1996 to 2017, drawn as a 1:4 matched case-control sample of women with PHBC in the Breast Cancer Surveillance Consortium. We assessed model performance of conventional regression, regularized regressions (LASSO and elastic-net), and ML methods (random forests and gradient boosting machines) by evaluating their calibration and, among well-calibrated models, comparing the area under the receiver operating characteristic curve (AUC) and 95% confidence intervals (CI).
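The modeling comparison described above can be sketched with scikit-learn. The sketch below uses synthetic data as a stand-in (the BCSC surveillance data are not publicly available), and the hyperparameters are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the 9,447-mammogram sample: a rare outcome (~5%
# positives) with low-dimensional features, as described in the Methods.
X, y = make_classification(n_samples=9447, n_features=20, weights=[0.95],
                           random_state=0)

models = {
    # LASSO = L1-penalized logistic regression; elastic-net mixes L1 and L2.
    "lasso": LogisticRegressionCV(penalty="l1", solver="saga", Cs=5, cv=3,
                                  max_iter=5000),
    "elastic_net": LogisticRegressionCV(penalty="elasticnet", solver="saga",
                                        l1_ratios=[0.5], Cs=5, cv=3,
                                        max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

aucs = {}
for name, model in models.items():
    # Out-of-fold predicted risks from 5-fold cross-validation.
    risk = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    aucs[name] = roc_auc_score(y, risk)
    print(f"{name}: AUC = {aucs[name]:.3f}")
```

On real data, calibration would be assessed first, and AUCs compared only among the well-calibrated models, as the Methods describe.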

Results: LASSO and elastic-net consistently provided well-calibrated predicted risks for surveillance failure and benefit. The AUCs of LASSO and elastic-net were both 0.63 (95% CI, 0.60-0.66) for surveillance failure and 0.66 (95% CI, 0.64-0.68) for surveillance benefit, the highest among well-calibrated models.

Conclusions: For predicting breast cancer surveillance mammography outcomes, regularized regression outperformed other modeling approaches and balanced the trade-off between model flexibility and interpretability.

Impact: Regularized regression may be preferred for developing risk prediction models in other contexts with rare outcomes, similar training sample sizes, and low-dimensional features.


Conflict of interest statement

The following authors have potential conflicts of interest: Dr. Diana Buist: Athena WISDOM Study Data Safety and Monitoring Board (2015-present); Dr. Janie M Lee: research grant from GE Healthcare (11/15/2016-12/31/2020), consulting agreement with GE Healthcare (2017 only); Dr. Diana Miglioretti: honorarium from the Society for Breast Imaging for a keynote lecture in April 2019, royalties from Elsevier; Dr. Karla Kerlikowske: non-paid consultant for Grail on the STRIVE study (2017-present). No other disclosures were reported.

Figures

Figure 1. Assessment of overall model calibration.
We show overall model calibration measured by three metrics: the ratio of expected to observed events (E/O ratio), the calibration intercept, and the calibration slope, for surveillance failure (interval cancer; top panel) and surveillance benefit (surveillance-detected cancer; bottom panel) for each prediction modeling approach. Error bars show the 95% CI for each calibration measure. Vertical lines mark the ideal value for each metric (1 for E/O ratio, 0 for calibration intercept, and 1 for calibration slope).
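The three overall calibration metrics in Figure 1 can be computed from predicted risks and binary outcomes as sketched below. The definitions used here (E/O ratio; slope from a logistic refit on logit-transformed risks; intercept from a refit with the slope fixed at 1) follow standard calibration methodology and are an assumption about, not a copy of, the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def _nll(beta, X, y):
    # Negative log-likelihood of a logistic regression model.
    z = X @ beta
    return np.sum(np.logaddexp(0.0, z) - y * z)

def calibration_metrics(y, p):
    y = np.asarray(y, float)
    p = np.asarray(p, float)
    logit_p = np.log(p / (1 - p))
    # E/O ratio: expected (mean predicted) over observed event rate; ideal = 1.
    eo_ratio = p.mean() / y.mean()
    X = np.column_stack([np.ones_like(logit_p), logit_p])
    # Calibration slope: coefficient on logit(p) in a logistic refit; ideal = 1.
    slope = minimize(_nll, x0=[0.0, 1.0], args=(X, y)).x[1]
    # Calibration intercept: refit with the slope fixed at 1, i.e. logit(p)
    # acts as an offset; ideal = 0.
    intercept = minimize(lambda a: _nll(np.r_[a, 1.0], X, y), x0=[0.0]).x[0]
    return eo_ratio, intercept, slope

# Perfectly calibrated simulated risks should give roughly (1, 0, 1).
rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.3, size=20_000)
y = rng.binomial(1, p)
eo, intercept, slope = calibration_metrics(y, p)
print(f"E/O = {eo:.3f}, intercept = {intercept:.3f}, slope = {slope:.3f}")
```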
Figure 2. Weak calibration for 7 risk prediction models for surveillance failure (interval second breast cancer).
Each subfigure shows the weak calibration of an individual modeling approach by comparing the mean predicted risk (x-axis) to the observed risk of surveillance failure (y-axis) within 10 deciles of predicted risk. Vertical error bars show the 95% confidence interval of the observed risk of surveillance failure in each decile. A p-value from the Hosmer-Lemeshow (HL) test is also shown for each modeling approach.
Figure 3. Weak calibration for 7 risk prediction models for surveillance benefit (surveillance-detected cancer).
Each subfigure shows the weak calibration of an individual modeling approach by comparing the mean predicted risk (x-axis) to the observed risk of surveillance benefit (y-axis) within 10 deciles of predicted risk. Vertical error bars show the 95% confidence interval of the observed risk of surveillance benefit in each decile. A p-value from the Hosmer-Lemeshow (HL) test is also shown for each modeling approach.
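The decile-based Hosmer-Lemeshow check described in the legends of Figures 2 and 3 can be sketched as follows; this is a standard formulation of the test, not the authors' code.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    y = np.asarray(y, float)
    p = np.asarray(p, float)
    # Split observations into deciles of predicted risk.
    order = np.argsort(p)
    stat = 0.0
    for idx in np.array_split(order, groups):
        obs = y[idx].sum()   # observed events in the decile
        exp = p[idx].sum()   # expected events in the decile
        n = len(idx)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n))
    # Compared against a chi-square with groups - 2 degrees of freedom.
    return stat, chi2.sf(stat, groups - 2)

# Well-calibrated simulated risks: the HL statistic should be modest.
rng = np.random.default_rng(1)
p = rng.uniform(0.02, 0.4, size=5_000)
y = rng.binomial(1, p)
stat, pval = hosmer_lemeshow(y, p)
print(f"HL statistic = {stat:.2f}, p = {pval:.3f}")
```

A small HL p-value flags a decile where predicted and observed risks diverge, which is how the figures identify poorly calibrated modeling approaches.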
Figure 4. Receiver operating characteristic curves for surveillance failure (panel A) and surveillance benefit (panel B) for well-calibrated risk modeling approaches.
We show receiver operating characteristic curves for the 4 well-calibrated risk prediction models (the expert model, LASSO, elastic-net (EN), and gradient boosting machines (GBM)), along with their AUCs and corresponding 95% confidence intervals.
Figure 5. Race- and ethnicity-stratified model calibration for surveillance failures (top panel) and benefits (bottom panel) in non-Hispanic racial and ethnic groups.
We show race- and ethnicity-stratified model calibration assessed by three metrics: the ratio of expected to observed events (E/O ratio), the calibration intercept, and the calibration slope, within three non-Hispanic racial and ethnic groups: Asian and Pacific Islander, Black, and White. Error bars show the 95% CI for each calibration measure. Vertical lines mark the ideal value for each metric (1 for E/O ratio, 0 for calibration intercept, and 1 for calibration slope).
