EGEMS (Wash DC). 2013 Dec 17;1(3):1035.
doi: 10.13063/2327-9214.1035. eCollection 2013.

Strategies for handling missing data in electronic health record derived data

Brian J Wells et al. EGEMS (Wash DC).

Abstract

Electronic health records (EHRs) present a wealth of data that are vital for improving patient-centered outcomes, but the data pose significant statistical challenges. In particular, EHR data contain substantial missing information that, if left unaddressed, could reduce the validity of the conclusions drawn. Properly addressing missing data in EHRs is complicated by the fact that it is sometimes difficult to differentiate between a missing value and a negative one. For example, a patient without a documented history of heart failure may truly not have the disease, or the clinician may simply not have documented the condition. Approaches for reducing missing data in EHR systems come from multiple angles, including increasing structured data documentation, reducing data input errors, and using text parsing or natural language processing. This paper focuses on analytical approaches for handling missing data, primarily multiple imputation. The broad range of variables available in typical EHR systems provides a wealth of information for mitigating potential biases caused by missing data. The probability of missing data may be linked to disease severity and healthcare utilization, since unhealthier patients are more likely to have comorbidities and each interaction with the healthcare system provides an opportunity for documentation. Therefore, any imputation routine should include predictor variables that assess overall health status (e.g., Charlson Comorbidity Index) and healthcare utilization (e.g., number of encounters), even when these comorbidities and patient encounters are unrelated to the disease of interest. Linking EHR data with other sources of information (e.g., the National Death Index and census data) can also provide less biased variables for imputation. Additional methodological research with EHR data and improved epidemiological training of clinical investigators are warranted.
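To make the recommendation about imputation predictors concrete, the following is a minimal sketch and not the authors' method. It assumes a bootstrap-based linear regression imputation (a common simplification of multiple imputation), pooled with Rubin's rules. The data are fully synthetic, and every name in it (`simulate`, `multiply_impute`, `pool_mean`, the `sbp` field, the coefficients) is hypothetical and chosen only for illustration. Missingness is simulated to depend on the number of encounters, so a complete-case mean is expected to be biased, while imputing from the Charlson score and encounter count should reduce that bias.

```python
import random
import statistics

def fit_ols(rows, targets):
    """Least-squares fit with intercept via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for row in range(k):
            if row != c:
                f = A[row][c] / A[c][c]
                A[row] = [a - f * b for a, b in zip(A[row], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    resid = [y - sum(b * v for b, v in zip(beta, x)) for x, y in zip(X, targets)]
    sd = (sum(e * e for e in resid) / max(len(X) - k, 1)) ** 0.5
    return beta, sd

def multiply_impute(records, target, predictors, m=5, seed=0):
    """Create m completed copies of the data (bootstrap + stochastic regression)."""
    rng = random.Random(seed)
    complete = [r for r in records if r[target] is not None]
    completed_sets = []
    for _ in range(m):
        # Refit on a bootstrap sample so parameter uncertainty is reflected
        boot = [rng.choice(complete) for _ in complete]
        beta, sd = fit_ols([[r[p] for p in predictors] for r in boot],
                           [r[target] for r in boot])
        filled = []
        for r in records:
            r2 = dict(r)
            if r2[target] is None:
                mean = beta[0] + sum(b * r[p] for b, p in zip(beta[1:], predictors))
                r2[target] = rng.gauss(mean, sd)  # add residual noise
            filled.append(r2)
        completed_sets.append(filled)
    return completed_sets

def pool_mean(completed_sets, target):
    """Rubin's rules for the mean: pooled estimate and total variance."""
    m = len(completed_sets)
    estimates = [statistics.fmean(r[target] for r in d) for d in completed_sets]
    within = statistics.fmean(
        statistics.variance([r[target] for r in d]) / len(d) for d in completed_sets)
    between = statistics.variance(estimates)
    return statistics.fmean(estimates), within + (1 + 1 / m) * between

def simulate(n=2000, seed=1):
    """Synthetic patients; the outcome is missing more often with few encounters."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        charlson = rng.randint(0, 6)
        encounters = rng.randint(1, 20)
        sbp = 110 + 3.0 * charlson + 2.0 * encounters + rng.gauss(0, 8)
        observed = rng.random() > 0.7 - 0.03 * encounters
        rows.append({"charlson": charlson, "encounters": encounters,
                     "sbp": sbp if observed else None, "sbp_true": sbp})
    return rows

records = simulate()
imputed = multiply_impute(records, "sbp", ["charlson", "encounters"], m=5)
pooled_mean, pooled_var = pool_mean(imputed, "sbp")
complete_case_mean = statistics.fmean(
    r["sbp"] for r in records if r["sbp"] is not None)
true_mean = statistics.fmean(r["sbp_true"] for r in records)
print(f"true={true_mean:.2f} complete-case={complete_case_mean:.2f} "
      f"pooled={pooled_mean:.2f} (variance {pooled_var:.3f})")
```

The `sbp_true` field exists only because the data are simulated; with real EHR data the true values are unknown, and a production analysis would more likely rely on an established multiple imputation package than on this hand-rolled regression.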

Figures

Figure 1. Overview of the missing data problem with electronic health records.
