Am J Hum Genet. 2018 Jul 5;103(1):58-73.
doi: 10.1016/j.ajhg.2018.05.010. Epub 2018 Jun 28.

Deep Phenotyping on Electronic Health Records Facilitates Genetic Diagnosis by Clinical Exomes

Jung Hoon Son et al. Am J Hum Genet.

Abstract

Integration of detailed phenotype information with genetic data is well established to facilitate accurate diagnosis of hereditary disorders. As a rich source of phenotype information, electronic health records (EHRs) promise to empower diagnostic variant interpretation. However, how to accurately and efficiently extract phenotypes from heterogeneous EHR narratives remains a challenge. Here, we present EHR-Phenolyzer, a high-throughput EHR framework for extracting and analyzing phenotypes. EHR-Phenolyzer extracts and normalizes Human Phenotype Ontology (HPO) concepts from EHR narratives and then prioritizes genes with causal variants on the basis of the HPO-coded phenotype manifestations. We assessed EHR-Phenolyzer on 28 pediatric individuals with confirmed diagnoses of monogenic diseases and found that the genes with causal variants were ranked among the top 100 genes selected by EHR-Phenolyzer for 16/28 individuals (p < 2.2 × 10⁻¹⁶), supporting the value of phenotype-driven gene prioritization in diagnostic sequence interpretation. To assess the generalizability, we replicated this finding on an independent EHR dataset of ten individuals with a positive diagnosis from a different institution. We then assessed the broader utility by examining two additional EHR datasets, including 31 individuals who were suspected of having a Mendelian disease and underwent different types of genetic testing and 20 individuals with positive diagnoses of specific Mendelian etiologies of chronic kidney disease from exome sequencing. Finally, through several retrospective case studies, we demonstrated how combined analyses of genotype data and deep phenotype data from EHRs can expedite genetic diagnoses. In summary, EHR-Phenolyzer leverages EHR narratives to automate phenotype-driven analysis of clinical exomes or genomes, facilitating the broader implementation of genomic medicine.

Keywords: biomedical informatics; diagnosis; electronic health records; exome; genome; knowledge engineering; natural language processing; next-generation sequencing; phenotyping; precision medicine.
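
As a rough illustration of the top-100 evaluation described in the abstract, the following Python sketch counts how many cases have their causal gene ranked within the top k and computes a one-sided binomial tail probability against random ranking. The abstract does not state which statistical test produced the reported p value, so the binomial model, the function names, and the gene-universe size of 20,000 are all assumptions made for illustration only, not the authors' method.

```python
from math import comb

def top_k_hits(ranks, k=100):
    """Count cases whose causal-gene rank falls within the top k.

    `ranks` holds one rank per individual; None means the gene was not ranked.
    """
    return sum(1 for r in ranks if r is not None and r <= k)

def binomial_tail(hits, n, p):
    """Return P(X >= hits) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(hits, n + 1))

# ASSUMPTION: a universe of 20,000 candidate genes (hypothetical round number),
# so a randomly ranked causal gene lands in the top 100 with probability 0.005.
GENE_UNIVERSE = 20_000
chance = 100 / GENE_UNIVERSE

# 16 of 28 individuals is the figure reported in the abstract.
p_value = binomial_tail(16, 28, chance)
print(p_value < 2.2e-16)
```

Under these assumptions, 16 hits out of 28 is far beyond what random ranking would produce, which is consistent in direction with the reported significance; the exact value depends entirely on the assumed null model.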

Figures

Figure 1. Overview of the Comparative Analysis for Evaluating Different NLP Tools.

Figure 2. Illustration of How NLPs Work to Extract Phenotype Terms from Natural Language in Clinical Notes. The same clinical note was analyzed by MetaMap (A) and MedLEE (B) for the generation of HPO terms.

Figure 3. Comparison of Four Methods of Ranking Genes with Causal Variants. 28 individuals in the primary site (A) and ten individuals in the secondary site (B). For each individual, three methods were used to extract phenotype terms and then used in Phenolyzer or Phenomizer to find a ranked list of candidate genes. The MedLEE approach achieved the best performance in ranking the genes with causal variants within the top 100 of all genes in both datasets.

Figure 4. Detailed Analysis of Genetic Counselors’ Notes and Genetic Diagnostic Reports on 46 Affected Individuals from Cohort 3. (A) A breakdown of the affected individuals according to diagnostic genetic testing. (B) The distribution of various genetic tests that were used on this cohort. (C) The distribution of the types of phenotype information used in genetic diagnosis. (D) Performance of EHR-Phenolyzer in ranking the genes with causal variants among all candidate genes.

Figure 5. Phenolyzer Can Tolerate Inaccuracies in the Phenotype-Term Extraction of an Individual Affected by Schmid-type Metaphyseal Chondrodysplasia. (A) Only five phenotype terms were shared among three different phenotype-extraction methods. (B) All three methods ranked the gene with a causal mutation as #4. (C and D) The network of prioritized genes and phenotype terms, where the phenotypes were extracted by an expert (C) or by MedLEE (D). COL10A1 with a causal mutation is highlighted in the network. The size of each pie section is positively related to the Phenolyzer ranking.

Figure 6. Molecular Diagnosis of KBG Syndrome in an Individual with a Frameshift Mutation in ANKRD11 through Combined Genotype and Phenotype Analysis.
