J Am Med Inform Assoc. 2020 Nov 1;27(11):1675-1687.
doi: 10.1093/jamia/ocaa104.

PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records


Neil S Zheng et al. J Am Med Inform Assoc.

Abstract

Objective: Developing algorithms to extract phenotypes from electronic health records (EHRs) can be challenging and time-consuming. We developed PheMap, a high-throughput phenotyping approach that leverages multiple independent online resources to streamline the phenotyping process within EHRs.

Materials and methods: PheMap is a knowledge base of medical concepts with quantified relationships to phenotypes, extracted by natural language processing from publicly available resources. PheMap searches EHRs for each phenotype's quantified concepts and uses them to calculate an individual's probability of having that phenotype. We compared PheMap to clinician-validated phenotyping algorithms from the Electronic Medical Records and Genomics (eMERGE) network for type 2 diabetes mellitus (T2DM), dementia, and hypothyroidism using 84 821 individuals from Vanderbilt University Medical Center's BioVU DNA Biobank. We used PheMap-based phenotypes in genome-wide association studies (GWAS) for T2DM, dementia, and hypothyroidism, and in phenome-wide association studies (PheWAS) for variants in FTO, HLA-DRB1, and TCF7L2.
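The scoring step described above can be illustrated with a minimal Python sketch. It assumes each phenotype's knowledge-base entry is a mapping from concept names to numeric weights (for example, TF-IDF values) and that an individual's score is the sum of the weights of the concepts found in their record. The concept names, the weights, and the logistic score-to-probability mapping (with its hypothetical `midpoint` and `scale` parameters) are placeholders for illustration, not PheMap's actual data or calibration method.

```python
import math

# Hypothetical knowledge-base entry for one phenotype: concept -> weight.
# Names and values are illustrative placeholders, not PheMap data.
KB_T2DM = {"metformin": 2.1, "elevated_hba1c": 1.8, "insulin": 1.2}


def phenotype_score(record_concepts: set[str], kb: dict[str, float]) -> float:
    """Sum the weights of the knowledge-base concepts present in one record."""
    return sum(weight for concept, weight in kb.items() if concept in record_concepts)


def score_to_probability(score: float, midpoint: float, scale: float) -> float:
    """Map a score into (0, 1) with a logistic curve.

    This mapping is an assumption for illustration; midpoint and scale are
    hypothetical tuning parameters, not values reported for PheMap.
    """
    return 1.0 / (1.0 + math.exp(-(score - midpoint) / scale))


if __name__ == "__main__":
    # Concepts not in the knowledge base (here "hypertension") contribute nothing.
    record = {"metformin", "insulin", "hypertension"}
    score = phenotype_score(record, KB_T2DM)  # 2.1 + 1.2
    prob = score_to_probability(score, midpoint=2.0, scale=1.0)
    print(f"score={score:.2f} probability={prob:.3f}")
```

Because the score is additive, a record containing several heavily weighted concepts scores higher than one containing a single weakly weighted concept; the probability step only rescales that ordering.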

Results: In this initial iteration, the PheMap knowledge base contains quantified concepts for 841 disease phenotypes. For T2DM, dementia, and hypothyroidism, the accuracy of the PheMap phenotypes was >97% using a 50% probability threshold, with eMERGE case-control status as the reference standard. In the GWAS analyses, PheMap-derived phenotype probabilities replicated 43 of 51 previously reported disease-associated variants for the 3 phenotypes. For 9 of the 11 top associations, PheMap yielded a P value equivalent to or more significant than that of the eMERGE-based phenotypes. The PheMap-based PheWAS performed comparably to or better than a traditional phecode-based PheWAS. PheMap is publicly available online.
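The accuracy figure above comes from comparing thresholded phenotype probabilities with a reference case-control label. A minimal Python sketch of that comparison follows. It assumes one probability and one boolean reference label per individual, and it treats a probability at or above the threshold as a case; whether the boundary is inclusive is an assumption. The example values are illustrative, not study data.

```python
def accuracy_at_threshold(
    probabilities: list[float],
    reference_is_case: list[bool],
    threshold: float = 0.5,
) -> float:
    """Fraction of individuals whose thresholded call matches the reference label."""
    if len(probabilities) != len(reference_is_case):
        raise ValueError("probabilities and reference labels must have equal length")
    if not probabilities:
        raise ValueError("at least one individual is required")
    # Call an individual a case when the probability reaches the threshold (assumed inclusive).
    calls = [p >= threshold for p in probabilities]
    matches = sum(call == ref for call, ref in zip(calls, reference_is_case))
    return matches / len(probabilities)


if __name__ == "__main__":
    probs = [0.91, 0.12, 0.64, 0.38]  # illustrative values, not study data
    labels = [True, False, False, False]
    print(accuracy_at_threshold(probs, labels))
```

The sketch assumes every individual already has a definite case or control label in the reference standard.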

Conclusions: PheMap significantly streamlines the process of extracting research-quality phenotype information from EHRs, with performance comparable to or better than current phenotyping approaches.

Keywords: electronic health records; high-throughput phenotyping; natural language processing.


Figures

Figure 1.
(A) Venn diagram of the 841 unique phenotypes found in the 5 online medical resources. The phenotypes are represented by phecodes, which are manually aggregated diagnosis codes designed for PheWAS with EHRs. An overlap between 2 resources indicates that both resources describe those phenotypes. Of these, 774 (92%) phenotypes are covered by at least 2 resources. (B) Flowchart describing the process of constructing the PheMap knowledge base and calculating phenotype scores and phenotype probabilities. EHR: electronic health record; PheWAS: phenome-wide association studies; TF-IDF: term frequency–inverse document frequency; NLP: natural language processing.
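The TF-IDF weighting named in the flowchart can be sketched in Python as follows. This uses the textbook formulation (term frequency normalized by document length, inverse document frequency log(N/df)) and assumes that each phenotype's NLP-extracted concepts, pooled across resources, form one document. The exact TF-IDF variant and preprocessing used for PheMap are not specified here, and the concept lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(documents: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Textbook TF-IDF: tf = count / document length, idf = log(N / df)."""
    n_docs = len(documents)
    # Document frequency: number of phenotype documents mentioning each concept.
    doc_freq: Counter = Counter()
    for concepts in documents.values():
        doc_freq.update(set(concepts))

    weights: dict[str, dict[str, float]] = {}
    for phenotype, concepts in documents.items():
        if not concepts:
            weights[phenotype] = {}
            continue
        counts = Counter(concepts)
        length = len(concepts)
        weights[phenotype] = {
            concept: (count / length) * math.log(n_docs / doc_freq[concept])
            for concept, count in counts.items()
        }
    return weights


if __name__ == "__main__":
    # Hypothetical NLP output: concepts extracted per phenotype.
    docs = {
        "type 2 diabetes": ["metformin", "metformin", "obesity"],
        "hypothyroidism": ["levothyroxine", "obesity"],
    }
    for phenotype, w in tfidf_weights(docs).items():
        print(phenotype, {c: round(v, 3) for c, v in w.items()})
```

In this toy example, "obesity" appears in every phenotype's document, so its IDF is log(1) = 0 and it receives no weight; this is how TF-IDF favors concepts specific to one phenotype.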
Figure 2.
PheMap phenotype score distributions of (A) T2DM, (B) dementia, and (C) hypothyroidism as box plots (left) and density plots (right), stratified by case-control status defined with clinician-validated eMERGE phenotyping algorithms. For each box plot, the band indicates the median, the boxes indicate the IQR, and the whiskers indicate the minimum and maximum values within 1.5 × IQR from the first and third quartiles, respectively. The circles indicate individual outlier values. eMERGE: Electronic Medical Records and Genomics; IQR: interquartile range; T2DM: type 2 diabetes mellitus.
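The whisker rule in the caption (Tukey's 1.5 × IQR convention) can be reproduced with the Python standard library. This sketch assumes quartiles computed by linear interpolation (`statistics.quantiles` with `method="inclusive"`). Plotting packages may use other quartile definitions, so edge cases can differ slightly from the published figures. The example scores are illustrative.

```python
import statistics


def box_plot_summary(values: list[float]) -> dict:
    """Median, quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # illustrative phenotype scores
    print(box_plot_summary(scores))
```

Here the upper fence is 7 + 1.5 × 4 = 13, so the whisker ends at 8 and the value 100 is drawn as an outlier circle.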
Figure 3.
Manhattan plots of genome-wide association analyses with eMERGE case-control status (left) and PheMap phenotype probability (right) in (A) T2DM, (B) dementia, and (C) hypothyroidism. The red lines on Manhattan plots show the genome-wide significance level (5.0 × 10⁻⁸). eMERGE: Electronic Medical Records and Genomics; T2DM: type 2 diabetes mellitus.
Figure 4.
Manhattan plots of phenome-wide association analyses with phecodes (left) and PheMap phenotype probability (right) in (A) FTO (rs8050136), (B) HLA-DRB1 (rs3135388), and (C) TCF7L2 (rs7903146). The red lines on Manhattan plots show the Bonferroni level of significance (5.0 × 10⁻⁵). Only phenotypes that cross the Bonferroni level of significance are annotated.
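A Bonferroni cutoff divides the family-wise error rate α by the number of tests m. The caption does not say how its Bonferroni level was derived. In the Python sketch below, α = 0.05 and m = 1,000 are illustrative values that happen to give 5.0 × 10⁻⁵, and the phenotype p-values are hypothetical. The same filtering applies to a fixed genome-wide significance level such as the one used in the GWAS Manhattan plots.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_hits(p_values: dict[str, float], threshold: float) -> list[str]:
    """Names of tests whose p-value falls below the threshold, most significant first."""
    hits = [(p, name) for name, p in p_values.items() if p < threshold]
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    cutoff = bonferroni_threshold(0.05, 1000)  # illustrative m, not from the paper
    # Hypothetical PheWAS p-values for one variant.
    phewas = {"phenotype_a": 1e-9, "phenotype_b": 3e-4, "phenotype_c": 2e-6}
    print(cutoff, significant_hits(phewas, cutoff))
```

Only the phenotypes returned by `significant_hits` would be annotated under the caption's rule; `phenotype_b` falls short of the cutoff.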
