Machine Learning Risk Stratification for Older Breast Cancer Survivors: Clinical Care Implications

Stephanie B Wheeler¹,², Jason Rotter³, Lisa P Spees¹,², Caitlin B Biddell³, Justin G Trogdon¹,², Catherine M Alfano⁴, Deborah K Mayer², Michaela A Dinan⁵,⁶, Larissa Nekhlyudov⁷, Sarah A Birken⁸

Affiliations

¹ Department of Health Policy and Management, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, USA.
² Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, USA.
³ Mathematica, Washington, District of Columbia, USA.
⁴ Northwell Health, New York, New York, USA.
⁵ Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut, USA.
⁶ Yale Cancer Outcomes, Public Policy, and Effectiveness Research Center, New Haven, Connecticut, USA.
⁷ Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA.
⁸ Department of Implementation Science, Wake Forest University School of Medicine, Winston-Salem, North Carolina, USA.

PMID: 40671264
DOI: 10.1111/1475-6773.70005

Stephanie B Wheeler et al. Health Serv Res. 2025 Jul 16:e70005. doi: 10.1111/1475-6773.70005. Online ahead of print.

Abstract

Objective: To develop and validate a clinical risk prediction algorithm to identify breast cancer survivors at high risk for adverse outcomes.

Study setting and design: Our national retrospective analysis used cross-validated random forest machine learning models to separately predict the risk of all-cause death, cancer-specific death, claims-derived risk of recurrence, and other adverse health outcomes within 3 and 5 years following treatment completion.

Data sources and analytic sample: Our study used linked data from the Surveillance, Epidemiology, and End Results (SEER) registry and the Consumer Assessment of Healthcare Providers and Systems (CAHPS) survey (SEER-CAHPS) for survivors diagnosed between 2003 and 2011, with follow-up claims data through 2017.

Principal findings: Within the 3-year follow-up period, of 4516 survivors in the primary cohort (mean age 75.1; 81.7% white), 372 (8.2%) died, including 111 (2.5%) from cancer; 665 (14.7%) experienced cancer recurrence; and 488 (10.8%) were hospitalized for adverse health outcomes. For all-cause death, the algorithm's predictions achieved 91.9% out-of-sample accuracy (the percent of observations classified correctly) and a Cohen's Kappa of 37.6% (i.e., improvement over an uninformed model). Out-of-sample accuracy was 97.5% (44% improvement) for predicting cancer-specific death, 85% (26% improvement) for recurrence, and 89% (28% improvement) for other adverse health outcomes. Important predictors across outcomes included geographic region, age, frailty, comorbidity, time since diagnosis, and out-of-pocket cost responsibility.
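Because these outcomes are rare, accuracy alone can look high even for a model that never predicts an event, which is why Cohen's Kappa is reported alongside it. The sketch below shows how the two metrics relate for a binary outcome. It is an illustration only: the function name `accuracy_and_kappa` and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a 2x2 confusion matrix.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives.
    """
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of observations classified correctly.
    observed = (tp + tn) / n

    # Agreement expected by chance, from the marginal rates of
    # predicted and actual positives and negatives.
    expected_pos = ((tp + fp) / n) * ((tp + fn) / n)
    expected_neg = ((fn + tn) / n) * ((fp + tn) / n)
    expected = expected_pos + expected_neg

    if expected == 1.0:
        # Degenerate case: only one class present and predicted.
        return observed, 0.0

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical counts: 10 events among 100 people.
    print(accuracy_and_kappa(tp=4, fp=2, fn=6, tn=88))
    # A model that never predicts an event on the same hypothetical data.
    print(accuracy_and_kappa(tp=0, fp=0, fn=10, tn=90))
```

In the second call the accuracy stays high while Kappa falls to zero, which is the sense in which Kappa measures improvement over an uninformed model.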

Conclusions: Machine learning models accurately predicted relevant adverse survivorship outcomes, driven primarily by non-cancer-specific factors. Breast cancer survivors at high risk for adverse outcomes may benefit from more intensive care, whereas those at low risk may be more appropriately managed by primary care.

Keywords: Machine learning; cancer survivorship; risk stratification.

Grants and funding

ACS5113568/American Cancer Society
