Proc SPIE Int Soc Opt Eng. 2018 Feb:10577:105770J.
doi: 10.1117/12.2293954. Epub 2018 Mar 7.

Quantifying predictive capability of electronic health records for the most harmful breast cancer

Yirong Wu et al. Proc SPIE Int Soc Opt Eng. 2018 Feb.

Abstract

Improved prediction of the "most harmful" breast cancers, those that cause the most substantive morbidity and mortality, would enable physicians to target more intensive screening and preventive measures at the women at highest risk; however, prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) are an underused data source with great research and clinical potential. Our goal was to quantify the value of EHR variables in predicting the "most harmful" breast cancer. From an existing personalized medicine data repository, we identified 794 subjects who had breast cancer with primary non-benign tumors and an earliest diagnosis on or after January 1, 2004, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprising six components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. The AUCs of the Ridge-LR and Lasso-LR models were 0.818 and 0.839, respectively. For both models, predictive performance using the full set of EHR variables was significantly higher than that using any individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, making it possible to personalize care in clinical practice for the women at highest risk.
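
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how an AUC can be computed from predicted risk scores using the rank-sum (Mann-Whitney U) identity. The function name `roc_auc`, the label coding (1 for "most harmful", 0 for "least harmful"), and the toy data are assumptions made for this example; they are not taken from the study, and the abstract does not say how the authors computed AUC.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for a "most harmful" case, 0 for a "least harmful" case
            (hypothetical coding chosen for this sketch).
    scores: predicted risk; higher means more likely "most harmful".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")

    # Sort indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    # U counts (positive, negative) pairs where the positive is ranked higher;
    # tied pairs contribute one half each.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")  # 8 of 9 case pairs ordered correctly: 0.889
```

Read this way, an AUC of 0.839 means that a randomly chosen "most harmful" case receives a higher predicted risk than a randomly chosen "least harmful" case about 84% of the time.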

Keywords: breast cancer; electronic health records (EHRs); least absolute shrinkage and selection operator (Lasso); regularized prediction model.

Figures

Figure 1. Specification of censor date. EHR data in the red oval were used to predict the “most harmful” breast cancer.
Figure 2. ROC curves for Ridge-LR models.
Figure 3. ROC curves for Lasso-LR models.
