Genome Med. 2015 Apr 30;7(1):41.
doi: 10.1186/s13073-015-0166-y. eCollection 2015.

Extracting research-quality phenotypes from electronic health records to support precision medicine

Wei-Qi Wei et al. Genome Med. 2015.

Abstract

The convergence of two rapidly developing technologies, high-throughput genotyping and electronic health records (EHRs), gives scientists an unprecedented opportunity to use routine healthcare data to accelerate genomic discovery. Institutions and healthcare systems have been building EHR-linked DNA biobanks to realize this vision. However, precisely extracting the detailed disease and drug-response phenotype information hidden in EHRs is difficult. Even so, EHR-based studies have successfully replicated known associations, made new discoveries for diseases and drug-response traits, rapidly contributed cases and controls to large meta-analyses, and demonstrated the potential of EHRs for broad-based phenome-wide association studies. In this review, we summarize the advantages and challenges of repurposing EHR data for genetic research. We also highlight recent notable studies and novel approaches to provide an overview of advanced EHR-based phenotyping.

Figures

Figure 1
Algorithm for the identification of subjects with type 2 diabetes. Abnormal glucose values are random glucose >200 mg/dl or fasting glucose >125 mg/dl; an abnormal HbA1c is ≥6.5%. Dx, diagnosis; HbA1c, hemoglobin A1c; ICD-9, International Classification of Diseases, Ninth Revision; Rx, treatment; T1DM, type 1 diabetes mellitus; T2DM, type 2 diabetes mellitus. Figure reprinted with permission from Kho et al. [57].
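The laboratory criteria in the Figure 1 caption (random glucose >200 mg/dl, fasting glucose >125 mg/dl, HbA1c ≥6.5%) can be written as a small predicate. The sketch below is a minimal illustration, not the published algorithm: it covers only the lab step, not the ICD-9, medication, or type 1 diabetes exclusion branches. The test labels `random_glucose`, `fasting_glucose` and `hba1c` are hypothetical, and values are assumed to already be in mg/dl and percent.

```python
from typing import Iterable, Tuple

# Cutoffs from the Figure 1 caption. The test labels used below are
# hypothetical, not codes from any specific EHR or laboratory vocabulary.
RANDOM_GLUCOSE_MG_DL = 200.0   # abnormal if > 200 mg/dl
FASTING_GLUCOSE_MG_DL = 125.0  # abnormal if > 125 mg/dl
HBA1C_PERCENT = 6.5            # abnormal if >= 6.5 %


def is_abnormal(test: str, value: float) -> bool:
    """Return True if a single lab result meets the caption's cutoff."""
    if test == "random_glucose":
        return value > RANDOM_GLUCOSE_MG_DL
    if test == "fasting_glucose":
        return value > FASTING_GLUCOSE_MG_DL
    if test == "hba1c":
        return value >= HBA1C_PERCENT
    return False  # any other test is ignored by this step


def has_abnormal_lab(results: Iterable[Tuple[str, float]]) -> bool:
    """True if any result in a patient's lab history is abnormal."""
    return any(is_abnormal(test, value) for test, value in results)


if __name__ == "__main__":
    patient = [("fasting_glucose", 110.0), ("hba1c", 6.7)]
    print(has_abnormal_lab(patient))  # True: HbA1c 6.7% >= 6.5%
```

Note the mix of strict and non-strict comparisons: the glucose cutoffs use `>` while HbA1c uses `>=`, mirroring the caption.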
Figure 2
EHR data structure and accurate phenotyping. (a) Electronic health record (EHR) data can be structured or unstructured. Structured data are easy to retrieve, whereas unstructured data require additional tools, such as natural language processing (NLP), before they can be used for phenotyping. (b) Accurate phenotyping often requires extracting information from billing codes, prescriptions, laboratory tests and clinical notes, each of which can be either structured or unstructured. ICD-9, International Classification of Diseases, Ninth Revision.
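Panel (b) notes that accurate phenotyping often draws on several data sources at once. One illustrative way to combine them is to count how many independent sources support a diagnosis and require agreement between at least two. This is a minimal sketch under stated assumptions, not the review's method: `PatientRecord`, the medication set, and the two-source rule are hypothetical, and `note_mentions_disease` stands in for the output of an upstream NLP step. ICD-9 category 250 is the diabetes mellitus category.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary of structured and unstructured EHR data."""
    icd9_codes: List[str] = field(default_factory=list)   # structured: billing codes
    medications: List[str] = field(default_factory=list)  # structured: prescriptions
    abnormal_lab: bool = False                             # structured: lab results
    note_mentions_disease: bool = False                    # unstructured: assumed NLP output


# Illustrative inputs only; the medication set is not a validated value set.
DIABETES_ICD9_PREFIX = "250"  # ICD-9 category 250: diabetes mellitus
DIABETES_MEDS = {"metformin", "glipizide", "insulin"}


def evidence_sources(p: PatientRecord) -> int:
    """Count how many distinct data sources support the phenotype."""
    sources = [
        any(code.startswith(DIABETES_ICD9_PREFIX) for code in p.icd9_codes),
        any(med.lower() in DIABETES_MEDS for med in p.medications),
        p.abnormal_lab,
        p.note_mentions_disease,
    ]
    return sum(sources)  # booleans sum as 0/1


def is_case(p: PatientRecord, min_sources: int = 2) -> bool:
    """Hypothetical rule: call a case only when at least `min_sources` sources agree."""
    return evidence_sources(p) >= min_sources


if __name__ == "__main__":
    billing_only = PatientRecord(icd9_codes=["250.00"])
    corroborated = PatientRecord(icd9_codes=["250.00"], medications=["Metformin"])
    print(is_case(billing_only), is_case(corroborated))  # False True
```

Requiring corroboration trades some sensitivity for precision: a single billing code may, for example, be recorded for a diagnosis that was later ruled out, whereas a code plus a matching prescription is harder to explain away.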
Figure 3
Numbers of genome-wide association study (GWAS) papers and EHR-based genetic studies published per year. The horizontal axis shows publication year; the vertical axis shows the log of the number of publications. Data sources: National Human Genome Research Institute (NHGRI) GWAS Catalog and PubMed.

References

1. Manolio TA. Genomewide association studies and assessment of the risk of disease. N Engl J Med. 2010;363:166–176. doi: 10.1056/NEJMra0905980.
2. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–D1006. doi: 10.1093/nar/gkt1229.
3. Boycott KM, Vanstone MR, Bulman DE, MacKenzie AE. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet. 2013;14:681–691. doi: 10.1038/nrg3555.
4. SIGMA Type 2 Diabetes Consortium, Williams AL, Jacobs SB, Moreno-Macías H, Huerta-Chagoya A, Churchhouse C, et al. Sequence variants in SLC16A11 are a common risk factor for type 2 diabetes in Mexico. Nature. 2014;506:97–101.
5. Global Lipids Genetics Consortium, Willer CJ, Schmidt EM, Sengupta S, Peloso GM, Gustafsson S, et al. Discovery and refinement of loci associated with lipid levels. Nat Genet. 2013;45:1274–1283. doi: 10.1038/ng.2797.