Clin Epidemiol. 2020 Feb 4;12:133-141.
doi: 10.2147/CLEP.S232540. eCollection 2020.

External Validation of an Algorithm to Identify Patients with High Data-Completeness in Electronic Health Records for Comparative Effectiveness Research


Kueiyu Joshua Lin et al. Clin Epidemiol. 2020.

Abstract

Objective: Electronic health record (EHR) data-discontinuity, i.e., receiving care outside of a particular EHR system, may cause misclassification of study variables. We aimed to validate an algorithm that identifies patients with high EHR data-continuity in order to reduce such bias.

Materials and methods: We analyzed data from two EHR systems linked with Medicare claims data from 2007 through 2014, one in Massachusetts (MA, n=80,588) and one in North Carolina (NC, n=33,207). We quantified EHR data-continuity as the Mean Proportion of Encounters Captured (MPEC) by the EHR system relative to complete recording in the claims data. The prediction model for MPEC was developed in the MA system and validated in the NC system. Stratified by predicted EHR data-continuity, we quantified misclassification of 40 key variables by the Mean Standardized Difference (MSD) between the proportions of these variables based on EHR data alone vs the linked claims-EHR data.
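The MPEC idea can be sketched in a few lines. The following is an illustrative reading of the metric, not the authors' code: for each patient, the proportion of encounters in the complete (claims) record that also appear in the EHR, averaged over the cohort. The encounter-date representation and the toy cohort are assumptions for demonstration.

```python
def proportion_captured(ehr_encounters, claims_encounters):
    """Fraction of a patient's claims-recorded encounters also found in the EHR."""
    claims = set(claims_encounters)
    if not claims:
        return None  # no claims encounters: the proportion is undefined
    captured = claims & set(ehr_encounters)
    return len(captured) / len(claims)

def mean_pec(patients):
    """MPEC: average per-patient proportion of encounters captured.

    `patients` is an iterable of (ehr_encounters, claims_encounters) pairs.
    """
    props = [proportion_captured(e, c) for e, c in patients]
    props = [p for p in props if p is not None]
    return sum(props) / len(props)

# Hypothetical two-patient cohort: encounters keyed by visit date.
cohort = [
    (["2010-01-05", "2010-03-02"],
     ["2010-01-05", "2010-03-02", "2010-06-10", "2010-09-01"]),  # 2 of 4 captured
    ([], ["2011-02-14", "2011-05-20"]),                          # 0 of 2 captured
]
print(mean_pec(cohort))  # (0.5 + 0.0) / 2 = 0.25
```

A per-patient proportion (rather than pooling all encounters) keeps heavy utilizers from dominating the cohort-level summary, which matches the "mean proportion" framing above.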

Results: The mean MPEC was 27% in the MA system and 26% in the NC system. Predicted and observed EHR data-continuity were highly correlated (Spearman correlation=0.78 and 0.73, respectively). The misclassification (MSD) of the 40 variables among patients with high predicted EHR data-continuity was significantly smaller (by 44%, 95% CI: 40-48%) than in the remaining population.
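The MSD comparison rests on the standard formula for the standardized difference between two proportions. The sketch below uses the commonly cited definition for binary variables; averaging it over the 40 study variables is our reading of the MSD, not the authors' code, and the example proportions are hypothetical.

```python
import math

def std_diff(p1, p2):
    """Standardized difference between two proportions of a binary variable."""
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2.0
    if pooled_var == 0:
        return 0.0  # both proportions are 0 or 1: no spread to standardize by
    return abs(p1 - p2) / math.sqrt(pooled_var)

def mean_std_diff(pairs):
    """MSD: mean standardized difference over (EHR-only, linked-data) pairs."""
    return sum(std_diff(p1, p2) for p1, p2 in pairs) / len(pairs)

# Hypothetical variable: a comorbidity recorded in 10% of patients using
# EHR data alone vs 15% using the linked claims-EHR data.
print(round(std_diff(0.10, 0.15), 3))
```

Because the denominator pools the variances of both sources, the measure is scale-free, which is what makes averaging across 40 heterogeneous variables into a single MSD meaningful.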

Discussion: Comorbidity profiles were similar in patients with high vs low EHR data-continuity. Therefore, restricting an analysis to patients with high EHR data-continuity may reduce information bias while preserving the representativeness of the study cohort.

Conclusion: We have successfully validated an algorithm that can identify a high EHR data-continuity cohort representative of the source population.

Keywords: comparative effectiveness research; continuity; data linkage; electronic medical records; external validation; information bias.


Conflict of interest statement

Dr Robert Glynn reports grants from Kowa, Novartis, Pfizer, and AstraZeneca, outside the submitted work; Dr Sebastian Schneeweiss reports personal fees from WHISCON, LLC and Aetion, Inc., outside the submitted work. The authors report no other conflicts of interest in this work.

Figures

Figure 1. Proportion of encounters captured by electronic health record systems.

Figure 2. Sensitivity and misclassification by predicted EHR data-continuity in the training (MA) and validation (NC) EHR systems. Abbreviations: EHR, electronic health records; NC, North Carolina; MA, Massachusetts.

Figure 3. Representativeness: comparison of combined comorbidity score in patients with high vs low predicted EHR data-continuity in the validation EHR system (NC). Notes: aPatients in the lower 8 deciles of predicted EHR data-continuity; bPatients in the top 2 deciles of predicted EHR data-continuity. Stand diff = standardized difference. The combined comorbidity score ranges from −2 to 26, with higher scores associated with higher mortality; cell sizes <10 are not presented.

