Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review

Benjamin A Goldstein¹ ², Ann Marie Navar² ³, Michael J Pencina⁴ ², John P A Ioannidis⁵ ⁶

Affiliations

¹ Department of Biostatistics & Bioinformatics, Duke University, Durham, NC 27710, USA; ben.goldstein@duke.edu.
² Center for Predictive Medicine, Duke Clinical Research Institute, Duke University, Durham, NC 27710, USA.
³ Division of Cardiology at Duke University Medical Center, Durham, NC 27710, USA.
⁴ Department of Biostatistics & Bioinformatics, Duke University, Durham, NC 27710, USA.
⁵ Department of Medicine, Stanford University, Palo Alto, CA 94305, USA.
⁶ Department of Health Research and Policy, and Statistics and Meta-Research Innovation Center at Stanford, Stanford University, Palo Alto, CA 94305, USA.

PMID: 27189013
PMCID: PMC5201180
DOI: 10.1093/jamia/ocw042

Benjamin A Goldstein et al. J Am Med Inform Assoc. 2017 Jan;24(1):198-208. doi: 10.1093/jamia/ocw042. Epub 2016 May 17.

Abstract

Objective: Electronic health records (EHRs) are an increasingly common data source for clinical risk prediction, presenting both unique analytic opportunities and challenges. We sought to evaluate the current state of EHR-based risk prediction modeling through a systematic review of clinical prediction studies using EHR data.

Methods: We searched PubMed for articles that reported on the use of an EHR to develop a risk prediction model from 2009 to 2014. Articles were extracted by two reviewers, and we abstracted information on study design, use of EHR data, model building, and performance from each publication and supplementary documentation.

Results: We identified 107 articles from 15 different countries. Studies were generally very large (median sample size = 26 100) and utilized a diverse array of predictors. Most used validation techniques (n = 94 of 107) and reported model coefficients for reproducibility (n = 83). However, studies did not fully leverage the breadth of EHR data, as they uncommonly used longitudinal information (n = 37) and employed relatively few predictor variables (median = 27 variables). Fewer than half of the studies were multicenter (n = 50) and only 26 performed validation across sites. Many studies did not fully address biases of EHR data such as missing data or loss to follow-up. Average c-statistics for different outcomes were: mortality (0.84), clinical prediction (0.83), hospitalization (0.71), and service utilization (0.71).

Conclusions: EHR data present both opportunities and challenges for clinical risk prediction. There is room for improvement in designing such studies.

Keywords: Electronic Medical Record; Review; Risk Assessment.

Figures

**Figure 1.**
Leveraging of available data across different studies.

**Figure 2.**
Distribution of c-statistics across different outcomes. Thirteen modeled more than one outcome type.

Grants and funding

K25 DK097279/DK/NIDDK NIH HHS/United States
