Curr Protoc Hum Genet. 2019 Jan;100(1):e80.
doi: 10.1002/cphg.80. Epub 2018 Dec 5.

Using Electronic Health Records To Generate Phenotypes For Research


Sarah A Pendergrass et al. Curr Protoc Hum Genet. 2019 Jan.

Abstract

Electronic health records contain patient-level data collected during and for clinical care. Data within the electronic health record include diagnostic billing codes, procedure codes, vital signs, laboratory test results, clinical imaging, and physician notes. With repeated clinic visits, these data are longitudinal, providing important information on disease development, progression, and response to treatment or intervention strategies. The near universal adoption of electronic health records nationally has the potential to provide population-scale real-world clinical data accessible for biomedical research, including genetic association studies. For this research potential to be realized, high-quality research-grade variables must be extracted from these clinical data warehouses. We describe here common and emerging electronic phenotyping approaches applied to electronic health records, as well as current limitations of both the approaches and the biases associated with these clinically collected data that impact their use in research. © 2018 by John Wiley & Sons, Inc.

Keywords: computable phenotyping; electronic health records; electronic medical records; electronic phenotyping; precision medicine.

Figures

Figure 1. Anatomy of International Classification of Diseases (ICD) codes.
ICD-9-CM and ICD-10-CM codes are given for dry (nonexudative) age-related macular degeneration. ICD-10-CM codes have 3–7 characters, compared with 3–5 for ICD-9-CM codes; the additional characters allow both laterality (6th position) and staging (7th position) to be encoded. ICD-9-CM example: 362.51, “dry” (362.50 is “macular degeneration (senile), unspecified”). ICD-10-CM example: H35.3111, nonexudative (dry) age-related macular degeneration, right eye, early dry stage.
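The positional structure described in this caption can be illustrated with a small sketch. The function name `split_icd10cm` and the field names are hypothetical, chosen only for illustration. Treating position 6 as laterality and position 7 as staging follows the caption's example for this macular degeneration code and is not assumed to hold for every ICD-10-CM code.

```python
from typing import Optional


def split_icd10cm(code: str) -> dict[str, Optional[str]]:
    """Split an ICD-10-CM code into positional parts, ignoring the decimal point.

    Hypothetical helper: the field names are illustrative, not an official schema.
    """
    chars = code.replace(".", "").upper()
    if not 3 <= len(chars) <= 7:
        raise ValueError(f"expected 3-7 characters, got {len(chars)}: {code!r}")
    return {
        "category": chars[:3],            # first three characters, e.g. "H35"
        "subclassification": chars[3:5],  # positions 4-5, may be empty
        "position_6": chars[5] if len(chars) >= 6 else None,  # laterality in the caption's example
        "position_7": chars[6] if len(chars) >= 7 else None,  # staging in the caption's example
    }


parts = split_icd10cm("H35.3111")
print(parts)
```

For H35.3111, the caption reads the 6th character as right eye and the 7th as early dry stage; the sketch only separates the positions and leaves their interpretation to a code-specific lookup.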
Figure 2. Example rule-based algorithm flow chart used to assign case status of a generic disease.
In this simple rule-based example, disease status is first considered using the presence or absence of structured data such as ICD-9-CM/ICD-10-CM codes, procedure codes, or mentions of the disease in the problems list. If such evidence is present, the algorithm then requires a corroborating laboratory measure and threshold value associated with the disease or condition of interest. True cases of the disease may have the required code or problems list mention but might be missing the laboratory data. In these cases, the algorithm then asks whether the patient with the required code or problems list mention has an EHR mention of a medication associated with the disease or condition of interest. Rule-based algorithms in practice are more complex than shown here and can incorporate a combination of codes; temporal relationships among diagnoses, laboratory tests, imaging, and medication mentions; and natural language processing techniques to search the clinical free text for evidence of case status.
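The decision flow in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: the `PatientRecord` fields, the function name `assign_case_status`, and the threshold value are hypothetical. The caption does not state what happens when the laboratory value is present but fails the threshold, or when neither laboratory data nor a medication mention exists; the sketch assumes "not case" in both situations, and assumes that a value at or above the threshold supports case status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR evidence."""
    has_code_or_problem_mention: bool  # ICD/procedure code or problems list mention
    lab_value: Optional[float]         # None when the laboratory measure is missing
    has_medication_mention: bool       # medication associated with the disease


def assign_case_status(p: PatientRecord, lab_threshold: float) -> str:
    # Step 1: structured evidence (codes or problems list mention) is required.
    if not p.has_code_or_problem_mention:
        return "not case"
    # Step 2: corroborate with the laboratory measure when it is available.
    # Assumption: at or above the threshold supports case status.
    if p.lab_value is not None:
        return "case" if p.lab_value >= lab_threshold else "not case"
    # Step 3: laboratory data missing, so fall back to a medication mention.
    return "case" if p.has_medication_mention else "not case"


# Example: code present, laboratory value missing, medication mentioned.
print(assign_case_status(PatientRecord(True, None, True), lab_threshold=6.5))
```

A production algorithm would likely distinguish "not case" from "undetermined" and add the temporal and free-text criteria the caption mentions.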

