BMJ Public Health. 2024;2(2):e001666.
doi: 10.1136/bmjph-2024-001666.

Addressing Selection Biases within Electronic Health Record Data for Estimation of Diabetes Prevalence among New York City Young Adults: A Cross-Sectional Study

Sarah Conderino et al. BMJ Public Health. 2024.

Abstract

Introduction: There is growing interest in using electronic health records (EHRs) for chronic disease surveillance. However, these data are convenience samples of in-care individuals and are not representative of the target populations for public health surveillance, which are generally defined as the resident populations of a city, state, or other jurisdiction over the relevant period. We focus on using EHR data to estimate diabetes prevalence among young adults in New York City, as the rising diabetes burden at younger ages calls for better surveillance capacity.

Methods: This article applies common methods for adjusting nonprobability samples, including raking, post-stratification, and multilevel regression with post-stratification, to real and simulated data for cross-sectional estimation of diabetes prevalence among adults aged 18–44 years. In the real data analyses, we externally validate city- and neighborhood-level EHR-based estimates against gold-standard estimates from a local health survey. In the data simulations, we probe the extent to which residual biases remain when selection into the EHR sample is non-ignorable.
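As a minimal sketch of two of the adjustment methods named above (an illustration, not the authors' code), the Python below computes a post-stratified prevalence and raking weights obtained by iterative proportional fitting. All cell labels, counts, and population margins are hypothetical; in practice the margins would come from population data such as the American Community Survey. Multilevel regression with post-stratification, which first fits a multilevel model of diabetes across cells before post-stratifying, is not shown.

```python
from collections import defaultdict


def poststratify(sample, pop_counts):
    """Post-stratified prevalence: each cell's sample prevalence is weighted
    by that cell's share of the target population.

    sample: list of (cell, has_diabetes) pairs from the EHR sample.
    pop_counts: dict mapping cell -> population count.
    Every cell in pop_counts must appear in the sample; empty cells would
    need collapsing or a model (e.g. MRP).
    """
    n = defaultdict(int)
    cases = defaultdict(int)
    for cell, dm in sample:
        n[cell] += 1
        cases[cell] += int(dm)
    total = sum(pop_counts.values())
    return sum(pop_counts[c] / total * cases[c] / n[c] for c in pop_counts)


def rake(cells, margins, n_iter=50):
    """Raking (iterative proportional fitting) weights.

    cells: list of tuples, one per sampled person, e.g. (sex, age_group).
    margins: list of dicts; margins[k] maps each level of variable k to its
    population total. Weights are rescaled one variable at a time until the
    weighted sample totals match every margin.
    """
    w = [1.0] * len(cells)
    for _ in range(n_iter):
        for k, target in enumerate(margins):
            current = defaultdict(float)
            for wi, cell in zip(w, cells):
                current[cell[k]] += wi
            w = [wi * target[cell[k]] / current[cell[k]]
                 for wi, cell in zip(w, cells)]
    return w


def weighted_prevalence(weights, dm_flags):
    """Weighted share of people with diabetes."""
    return sum(w for w, d in zip(weights, dm_flags) if d) / sum(weights)


# Hypothetical EHR sample: (sex, age group) and diabetes status per person.
cells = [("F", "18-29"), ("F", "30-44"), ("M", "18-29"),
         ("M", "18-29"), ("M", "30-44")]
dm = [False, True, False, True, True]
margins = [{"F": 500, "M": 500}, {"18-29": 600, "30-44": 400}]
weights = rake(cells, margins)
print(f"raked prevalence: {weighted_prevalence(weights, dm):.3f}")

ps_sample = [("18-29", True), ("18-29", False), ("18-29", False),
             ("18-29", False), ("30-44", True), ("30-44", False)]
ps_prev = poststratify(ps_sample, {"18-29": 600, "30-44": 400})
print(f"post-stratified prevalence: {ps_prev:.3f}")
```

Post-stratification needs the joint population distribution of the adjustment cells, whereas raking needs only one-way margins, which is why raking is often used when joint counts are unavailable.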

Results: In the real data analyses, these methods reduced selection bias in the citywide prevalence estimate, as judged against the gold standard. Residual biases remained at the neighborhood level, where prevalence tended to be overestimated, especially in neighborhoods where a higher proportion of residents were captured in the sample. Simulation results demonstrated that these methods may be sufficient except when selection into the EHR is non-ignorable, that is, when it depends on unmeasured factors or on diabetes status itself.

Conclusions: While EHRs offer potential to innovate on chronic disease surveillance, care is needed when estimating prevalence for small geographies or when selection is non-ignorable.

Keywords: diabetes mellitus; electronic health records; prevalence; selection bias; surveillance.

Conflict of interest statement

Competing Interests: The authors declare no competing interests.

Figures

Figure 1. Simulation study directed acyclic graph with baseline OR associations. Diabetes is observed only among those selected into the EHR sample. Scenario 1 (orange) modified the level of misclassification of the auxiliary variable W relative to the unobserved variable U, at 10%, 30%, 50%, 70% and 90% misclassification; scenario 2 (purple) modified the association between diabetes and selection, at OR levels of 0.33, 0.67, 1.0, 1.5 and 3.0. DM, diabetes mellitus; EHR, electronic health record; HIS, Hispanic; NHB, non-Hispanic Black; OTH, Other race.
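The mechanism varied in scenario 2 can be illustrated analytically. The sketch below (an illustration, not the study's simulation code) computes diabetes prevalence among people selected into an EHR when the odds of selection are multiplied by an odds ratio for people with diabetes. The population prevalence (5%) and baseline selection probability (20%) are hypothetical assumptions; only the OR levels are taken from the caption, and the covariates, W, and U in the graph are ignored.

```python
def selected_prevalence(true_prev, base_select, or_dm):
    """Diabetes prevalence among people selected into the EHR sample.

    true_prev: population prevalence, P(DM = 1) (hypothetical).
    base_select: P(selected | DM = 0) (hypothetical).
    or_dm: odds ratio for selection among people with vs without diabetes;
           or_dm = 1.0 means selection ignores diabetes status.
    """
    odds_no_dm = base_select / (1 - base_select)
    # P(selected | DM = 1), obtained by scaling the baseline selection odds
    p_sel_dm = or_dm * odds_no_dm / (1 + or_dm * odds_no_dm)
    selected_cases = true_prev * p_sel_dm
    selected_non_cases = (1 - true_prev) * base_select
    return selected_cases / (selected_cases + selected_non_cases)


for or_dm in (0.33, 0.67, 1.0, 1.5, 3.0):
    prev = selected_prevalence(0.05, 0.20, or_dm)
    print(f"OR_DM = {or_dm}: prevalence among selected = {prev:.4f} (true 0.0500)")
```

Only OR_DM = 1.0 recovers the true prevalence. Because this distortion would arise within any stratum of the measured covariates, reweighting to covariate margins alone cannot remove it, which is consistent with the residual bias reported for non-ignorable selection.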
Figure 2. Characterisation of the NYU Langone patient sample and comparison of NYU EHR-based to gold standard diabetes prevalence estimates for young adults aged 18–44 years by New York City PUMA neighbourhood. (A) Proportion of general population captured within the EHR sample by NYC PUMA, calculated by dividing NYU Langone patient counts by the total NYC PUMA population estimates from the American Community Survey 2019 5-year data, obtained through IPUMS USA. (B) Comparison of NYU EHR-based to gold standard diabetes prevalence estimates. Each point represents a PUMA neighbourhood. EHR estimates are defined using NYU Langone Health 2019 data. The gold standard estimate is defined using NYC CHS 2015–2020 data. (C) Comparison of relative bias in NYU EHR-based prevalence estimates versus proportion of the general population captured within the EHR sample. Relative bias is calculated as the per cent change between the gold standard and EHR-based prevalence estimate for each NYC PUMA neighbourhood. CHS, Community Health Survey; EHR, electronic health record; MLRP, multilevel regression with post-stratification; NYC, New York City; NYU, NYU Langone Health; PUMA, Public Use Microdata Area.
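Panels A and C rest on two simple quantities: the share of each PUMA's population captured in the EHR, and the relative bias of the EHR-based estimate. A minimal sketch follows; the caption states only "per cent change", so signing relative bias as (EHR - gold standard) / gold standard x 100, with positive values meaning overestimation, is an assumption. The PUMA labels and all numbers are hypothetical.

```python
def coverage(patient_count, population):
    """Share of a PUMA's population captured in the EHR sample (panel A)."""
    return patient_count / population


def relative_bias_pct(ehr_estimate, gold_standard):
    """Per cent change from the gold-standard to the EHR-based estimate
    (assumed sign convention: positive = EHR overestimates)."""
    return 100 * (ehr_estimate - gold_standard) / gold_standard


# Hypothetical rows: (puma_label, patients, population, ehr_prev, gold_prev)
pumas = [
    ("PUMA-A", 12_000, 150_000, 0.060, 0.050),
    ("PUMA-B", 3_000, 160_000, 0.041, 0.040),
]
for label, patients, pop, ehr, gold in pumas:
    print(label,
          f"coverage = {coverage(patients, pop):.1%}",
          f"relative bias = {relative_bias_pct(ehr, gold):+.1f}%")
```

Plotting the second quantity against the first for every PUMA gives the kind of comparison shown in panel C.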
Figure 3. Mean relative bias in the EHR-based estimates versus true diabetes prevalence by simulation scenario. Error bars represent SD in mean relative bias across simulations. (A) Scenario 1 modified the level of misclassification of the auxiliary variable W compared with the unobserved variable U; (B) scenario 2 modified the association between diabetes and selection (OR_DM). EHR, electronic health record; MLRP, multilevel regression with post-stratification.
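Each point in this figure summarizes many simulation replicates by their mean relative bias, with the SD as the error bar. The sketch below shows that summary step using a hypothetical Monte Carlo replicate in which selection depends on diabetes status only. The population size, prevalence, selection probabilities (3/7 corresponds to an odds ratio of 3 against a 0.2 baseline), seed, and number of replicates are illustrative assumptions, not the study's design, which also involves covariates, W, U, and the adjustment methods.

```python
import random
import statistics


def simulate_replicate(rng, n_pop, true_prev, p_sel_no_dm, p_sel_dm):
    """One hypothetical replicate: draw a population, select people into an
    'EHR' with probabilities that depend on diabetes status, and return the
    relative bias (%) of the unadjusted sample prevalence."""
    dm = [rng.random() < true_prev for _ in range(n_pop)]
    selected = [d for d in dm if rng.random() < (p_sel_dm if d else p_sel_no_dm)]
    pop_prev = sum(dm) / n_pop
    ehr_prev = sum(selected) / len(selected)
    return 100 * (ehr_prev - pop_prev) / pop_prev


def summarize(biases):
    """Mean relative bias and its SD across replicates (point and error bar)."""
    return statistics.mean(biases), statistics.stdev(biases)


rng = random.Random(2024)  # fixed seed so the replicates are reproducible
biases = [simulate_replicate(rng, 10_000, 0.05, 0.20, 3 / 7) for _ in range(30)]
mean_rb, sd_rb = summarize(biases)
print(f"mean relative bias = {mean_rb:+.1f}%, SD = {sd_rb:.1f}")
```

Using the sample SD (statistics.stdev) describes replicate-to-replicate variability; dividing it by the square root of the number of replicates would instead give the Monte Carlo standard error of the mean.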
