Qual Manag Health Care. 2025 Apr-Jun;34(2):173-180.
doi: 10.1097/QMH.0000000000000513. Epub 2025 Apr 1.

Prediction of Breast Cancer Remission

Vladimir Cardenas et al. Qual Manag Health Care. 2025 Apr-Jun.

Abstract

Background and objectives: This study uses electronic health record (EHR) and social determinants of health (SDOH) data to predict breast cancer remission. The emphasis is on easily accessible information that can improve predictive models, support early detection of high-risk patients, and enable targeted interventions and personalized care strategies.

Methods: This study identifies individuals who are unlikely to respond to standard treatment of breast cancer. We identified 1621 patients with breast cancer by selecting patients who received tamoxifen in the All of Us Research Database. The dependent variable, remission, was defined using tamoxifen exposure as a proxy. Data preprocessing involved creating dummy variables for diseases and for demographic and socioeconomic factors, and handling missing values to maintain data integrity. For feature selection, we first applied the strong rule for feature elimination and then fitted logistic least absolute shrinkage and selection operator (LASSO) regression with 5-fold cross-validation, retaining only predictors whose coefficients had an absolute value greater than 0.01. We then trained machine learning models using logistic regression, random forest, naïve Bayes, and extreme gradient boosting, and scored them with the area under the receiver operating characteristic curve (AUROC). This gave the race-neutral model performance. Finally, we evaluated the models on separate race and ethnicity test populations (Non-Hispanic White, Non-Hispanic Black, Hispanic, and Other Race or Ethnicity). This gave the race-specific model performance.
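The feature selection step described in the Methods can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, uses synthetic dummy variables rather than All of Us data, and omits the strong-rule screening step, which scikit-learn does not provide. The function name `select_features` and all data below are hypothetical; this is not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV


def select_features(X, y, threshold=0.01, seed=0):
    """Fit L1-penalized (LASSO) logistic regression with 5-fold CV and
    return the column indices whose |coefficient| exceeds `threshold`."""
    model = LogisticRegressionCV(
        Cs=10,               # grid of 10 inverse regularization strengths
        cv=5,                # 5-fold cross-validation, as in the abstract
        penalty="l1",        # LASSO penalty
        solver="liblinear",  # a solver that supports the L1 penalty
        scoring="roc_auc",   # assumption: tune the penalty on AUROC
        random_state=seed,
    )
    model.fit(X, y)
    coefs = model.coef_.ravel()
    keep = np.flatnonzero(np.abs(coefs) > threshold)
    return keep, coefs


# Synthetic stand-in data: 20 binary dummy variables, of which only
# columns 0 and 1 actually influence the outcome.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 20)).astype(float)
logits = 1.5 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(600) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

keep, coefs = select_features(X, y)
print("retained columns:", keep.tolist())
```

The 0.01 cutoff is applied to the coefficients of the model refit at the cross-validated penalty strength, so the retained set depends on both the penalty grid and the scoring metric chosen here.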

Results: The models achieved AUROC values between 0.68 and 0.75, with logistic regression and random forest trained on data without interaction terms performing best. Feature selection identified factors such as melanocytic nevus and bone disorders as significant predictors, highlighting their importance for predictive accuracy. Race-specific model performance was lower than race-neutral model performance for the Non-Hispanic Black and Other Race or Ethnicity groups.
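The comparison between race-neutral and race-specific performance amounts to computing AUROC once on the full test set and once per subgroup. The sketch below shows that calculation in plain Python using the pairwise definition of AUROC (the probability that a positive case is scored above a negative one, with ties counted as one half). The function names, group labels, and scores are hypothetical illustrations, not study data.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a group contains only one class
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_group(y_true, scores, groups):
    """Return overall AUROC plus AUROC within each subgroup."""
    result = {"overall": auroc(y_true, scores)}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = auroc([y_true[i] for i in idx], [scores[i] for i in idx])
    return result


# Toy test set: outcome labels, model scores, and a subgroup label per patient.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.7, 0.3, 0.5]
groups = ["group_a"] * 4 + ["group_b"] * 4

performance = auroc_by_group(y_true, scores, groups)
print(performance)
# {'overall': 0.71875, 'group_a': 1.0, 'group_b': 0.25}
```

In this toy example the pooled AUROC hides a large gap between the two subgroups, which is the kind of discrepancy a subgroup evaluation is meant to surface.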

Conclusions: Our research demonstrates the feasibility of predicting breast cancer non-remission from EHR and SDOH data, achieving acceptable performance without complex predictors. Addressing the data quality limitations and refining the remission indicators could further improve the models' utility for early treatment decisions, supporting better patient outcomes and support throughout the cancer journey.

Keywords: breast cancer; electronic health records; non-remission; predictive modeling; social determinants of health.

Conflict of interest statement

The authors declare no conflicts of interest.
