Quantifying predictive capability of electronic health records for the most harmful breast cancer

Affiliations

¹ University of Wisconsin Madison, WI, USA.
² Marshfield Clinic, Marshfield, WI, USA.
³ Jiangbei People's Hospital, Jiangsu, China.
⁴ China Three Gorges University, Hubei, China.

PMID: 29706685
PMCID: PMC5914175
DOI: 10.1117/12.2293954

Yirong Wu et al. Proc SPIE Int Soc Opt Eng. 2018 Feb:10577:105770J. doi: 10.1117/12.2293954. Epub 2018 Mar 7.

Abstract

Improved prediction of the "most harmful" breast cancers, those that cause the most substantive morbidity and mortality, would enable physicians to target more intense screening and preventive measures at the women at highest risk; however, prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source with great research and clinical potential. Our goal was to quantify the value of EHR variables in predicting the risk of the "most harmful" breast cancer. From an existing personalized medicine data repository, we identified 794 subjects who had breast cancer with primary non-benign tumors and whose earliest diagnosis was on or after January 1, 2004, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprising six components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. The AUCs of the Ridge-LR and Lasso-LR models were 0.818 and 0.839, respectively. For both models, the predictive performance of the full set of EHR variables was significantly higher than that of each individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, offering the possibility of personalizing care in clinical practice for the women at highest risk.

Keywords: breast cancer; electronic health records (EHRs); least absolute shrinkage and selection operator (Lasso); regularized prediction model.
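The modeling approach named in the abstract (ridge- and lasso-penalized logistic regression scored by AUC) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the synthetic data, the helper names (`fit_logistic`, `soft_threshold`, `auc`, `make_synthetic`), and the hyperparameters (`lam`, `lr`, `epochs`) are all assumptions chosen for illustration, and the abstract does not describe the solver or the validation scheme that was actually used.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_logistic(X, y, penalty, lam, lr=0.5, epochs=200):
    """Fit logistic regression by (proximal) gradient descent.

    penalty="l2" gives a ridge fit, penalty="l1" a lasso fit.
    The intercept b is left unpenalized. All settings are illustrative.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(d):
            if penalty == "l2":
                w[j] -= lr * (grad_w[j] + lam * w[j])
            else:
                w[j] = soft_threshold(w[j] - lr * grad_w[j], lr * lam)
    return w, b

def predict_scores(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc(y_true, scores):
    # AUC as the probability that a random positive outranks a random
    # negative, with ties counted as one half.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def make_synthetic(n, d, seed):
    # Hypothetical data: only the first two features carry signal.
    rng = random.Random(seed)
    true_w = [1.5, -1.0] + [0.0] * (d - 2)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(d)]
        p = sigmoid(sum(wj * xj for wj, xj in zip(true_w, xi)))
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y

X_train, y_train = make_synthetic(300, 8, seed=0)
X_test, y_test = make_synthetic(300, 8, seed=1)

results = {}
for name, penalty in [("Ridge-LR", "l2"), ("Lasso-LR", "l1")]:
    w, b = fit_logistic(X_train, y_train, penalty, lam=0.05)
    test_auc = auc(y_test, predict_scores(X_test, w, b))
    n_zero = sum(1 for wj in w if wj == 0.0)
    results[name] = (test_auc, w, n_zero)
    print(name, "held-out AUC:", round(test_auc, 3), "zero weights:", n_zero)
```

The practical difference between the two penalties is visible in the fitted weights: the L1 soft-thresholding step can set coefficients exactly to zero, so a lasso fit doubles as variable selection, whereas the L2 penalty only shrinks coefficients toward zero.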

Figures

**Figure 1**
Specification of censor date. EHR data in the red oval were used to predict the “most harmful” breast cancer.

**Figure 2**
ROC curves for Ridge-LR models.

**Figure 3**
ROC curves for Lasso-LR models.
