BMC Med Res Methodol. 2020 Apr 6;20(1):77.

doi: 10.1186/s12874-020-00956-6.

Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city

Ryung S Kim¹, Viswanathan Shankar²

Affiliations

¹ Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY, 10461, USA. ryung.kim@einsteinmed.org.
² Department of Epidemiology and Population Health, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY, 10461, USA.

PMID: 32252642
PMCID: PMC7137316
DOI: 10.1186/s12874-020-00956-6

Ryung S Kim et al. BMC Med Res Methodol. 2020.

Abstract

Background: Electronic health records (EHR) are increasingly used as a tool to monitor population health. However, subject-level errors in the records can bias estimates of health indicators. Methods are urgently needed that estimate the prevalence of health indicators from large, real-time EHR while correcting for this potential bias.

Methods: We demonstrate joint analyses of EHR and a smaller gold-standard health survey. We first adopted Mosteller's method, which pools two estimators, one of which is potentially biased. It requires only the prevalence estimates from the two data sources and their standard errors. We then adopted the method of Schenker et al., which uses multiple imputation of the subject-level health outcomes that are missing for subjects in the EHR. This procedure requires information linking some subjects between the two sources, a model of the misclassification mechanism in the EHR, and models of the inclusion probabilities for both sources.
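To make the pooling idea concrete, the following is a minimal Python sketch of one common form of a pooled estimator for an unbiased and a potentially biased source. The abstract does not give the exact formula, so the weighting below is an assumption: the survey weight minimizes the mean squared error of the combination, with the unknown squared EHR bias replaced by the squared difference between the two estimates. The function name `pooled_prevalence` and its arguments are hypothetical.

```python
def pooled_prevalence(p_survey, se_survey, p_ehr, se_ehr):
    """Pool a survey estimate (treated as unbiased) with an EHR estimate.

    Assumption (not stated in the abstract): the squared EHR bias is
    estimated by the squared difference between the two estimates, and
    the two estimators are treated as independent.
    Returns (pooled estimate, weight given to the survey estimate).
    """
    if se_survey <= 0 or se_ehr <= 0:
        raise ValueError("standard errors must be positive")

    var_survey = se_survey ** 2
    bias_sq = (p_ehr - p_survey) ** 2   # plug-in estimate of squared bias
    mse_ehr = se_ehr ** 2 + bias_sq     # estimated MSE of the EHR estimator

    # MSE-minimizing weight on the survey estimate: the larger the
    # apparent EHR error, the more weight the survey receives.
    w_survey = mse_ehr / (mse_ehr + var_survey)
    pooled = w_survey * p_survey + (1.0 - w_survey) * p_ehr
    return pooled, w_survey


# Illustrative inputs only; these are not values from the study.
agree, w_agree = pooled_prevalence(0.30, 0.02, 0.30, 0.005)
disagree, w_disagree = pooled_prevalence(0.30, 0.02, 0.50, 0.005)
```

Under this weighting, the pooled estimate leans on the more precise EHR estimate when the two sources agree and falls back toward the survey estimate when they disagree sharply, which is the safeguard behavior the abstract describes.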

Results: In a simulation study, both estimators yielded negligible bias even when the EHR was biased. They performed as well as the health survey estimator when the EHR bias was large, and better than the health survey estimator when the EHR bias was moderate. For the subject-level imputation estimator, modeling the misclassification mechanism in real data may be challenging. We illustrated the methods by analyzing six health indicators from the 2013–14 NYC HANES, the 2013 NYC Macroscope, and a study that linked some subjects in both data sources.
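For the subject-level imputation estimator, the prevalence estimates from the individual imputed data sets have to be combined into one estimate and one standard error. The abstract does not say which combining rules were used; the sketch below assumes the standard multiple-imputation rules (Rubin's rules), and the function name `rubin_combine` is hypothetical.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine per-imputation estimates using Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (combined estimate, combined standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations of equal length")

    q_bar = mean(estimates)        # combined point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation (sample) variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total ** 0.5


# Illustrative inputs only; these are not values from the study.
est, se = rubin_combine([0.30, 0.32, 0.31], [0.0004, 0.0004, 0.0004])
```

The between-imputation term is what carries the extra uncertainty from imputing the outcomes that are missing for EHR subjects.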

Conclusions: When a small gold-standard health survey exists, it can serve as a safeguard against potential bias in EHR through the joint analysis of the two sources.

Keywords: Big data; Electronic health records; Measurement error; Multiple imputations; Population health surveillance; Selection bias.

Conflict of interest statement

Drs. Kim and Shankar have provided paid statistical consultation to NYC DOHMH on projects, including the joint analysis of Macroscope and NYC HANES data.

Figures

**Fig. 1**
Data elements in the 2013–14 NYC HANES, limited to the in-care population and stratified by whether the participant was in the chart review study, and 2013 NYC Macroscope
