Prevalence estimation by joint use of big data and health survey: a demonstration study using electronic health records in New York city
- PMID: 32252642
- PMCID: PMC7137316
- DOI: 10.1186/s12874-020-00956-6
Abstract
Background: Electronic health records (EHR) have been increasingly used as a tool to monitor population health. However, subject-level errors in the records can yield biased estimates of health indicators. There is an urgent need for methods that estimate the prevalence of health indicators from large, real-time EHR data while correcting for potential bias.
Methods: We demonstrate joint analyses of EHR and a smaller gold-standard health survey. We first adopted Mosteller's method, which pools two estimators, one of which is potentially biased. It requires only the prevalence estimates from the two data sources and their standard errors. We then adopted the method of Schenker et al., which uses multiple imputation of the subject-level health outcomes that are missing for subjects in the EHR. This procedure requires information linking some subjects between the two sources, a model of the misclassification mechanism in the EHR, and a model of the inclusion probabilities for both sources.
Results: In a simulation study, both estimators yielded negligible bias even when the EHR was biased. They performed as well as the health survey estimator when EHR bias was large and better than the health survey estimator when EHR bias was moderate. For the subject-level imputation estimator, modeling the misclassification mechanism in real data may be challenging. We illustrated the methods by analyzing six health indicators from the 2013-14 NYC HANES and the 2013 NYC Macroscope, together with a study that linked some subjects in both data sources.
Conclusions: When a small gold-standard health survey exists, it can serve as a safeguard against potential bias in EHR through the joint analysis of the two sources.
Keywords: Big data; Electronic health records; Measurement error; Multiple imputations; Population health surveillance; Selection bias.
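The pooling idea in Methods (combining an unbiased survey estimate with a possibly biased EHR estimate, using only the two point estimates and their standard errors) can be illustrated with a short sketch. The abstract does not give the exact weights, so the code below assumes one common form of mean-squared-error weighting: the two estimators are treated as independent, and the unknown EHR bias is replaced by the plug-in difference between the two estimates. The function name `pool_estimates` and the numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def pool_estimates(p_survey, se_survey, p_ehr, se_ehr):
    """Pool an unbiased survey prevalence with a possibly biased EHR prevalence.

    Assumptions (not taken from the abstract): the estimators are
    independent, and the squared EHR bias is approximated by the
    squared difference between the two point estimates.
    Returns (pooled_estimate, weight_on_survey).
    """
    var_survey = se_survey ** 2
    var_ehr = se_ehr ** 2
    bias_sq = (p_ehr - p_survey) ** 2  # plug-in estimate of squared bias

    denom = var_survey + var_ehr + bias_sq
    if denom == 0.0:
        # Both estimates are identical and error-free; nothing to pool.
        return p_survey, 1.0

    # Weight w on the survey that minimizes the MSE of
    # w * survey + (1 - w) * ehr, where MSE(ehr) = var_ehr + bias_sq.
    w_survey = (var_ehr + bias_sq) / denom
    pooled = w_survey * p_survey + (1.0 - w_survey) * p_ehr
    return pooled, w_survey


# Illustrative (made-up) inputs: a small survey and a large EHR sample.
agree, w_agree = pool_estimates(0.30, 0.02, 0.30, 0.005)     # sources agree
differ, w_differ = pool_estimates(0.30, 0.02, 0.50, 0.005)   # EHR far off
```

When the two sources agree, most of the weight goes to the more precise EHR estimate; when they disagree strongly, the weight shifts back to the survey, which is the safeguard behavior the Results and Conclusions describe.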
Conflict of interest statement
Drs. Kim and Shankar have provided paid statistical consultation to NYC DOHMH on projects, including the joint analysis of Macroscope and NYC HANES data.
Grants and funding
- The study began as the authors worked as paid statistical consultants for NYC DOHMH to analyze NYC Macroscope jointly with NYC HANES. When the work ended, the authors continued further simulation studies to write the manuscript./New York City Department of Health and Mental Hygiene (US)/International