Am J Hum Genet. 2015 Jul 2;97(1):111-24.
doi: 10.1016/j.ajhg.2015.05.020. Epub 2015 Jun 25.

The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease


Tudor Groza et al. Am J Hum Genet. 2015.

Abstract

The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we developed a concept-recognition (CR) procedure that analyzes the frequencies of HPO disease annotations identified in over five million PubMed abstracts, using an iterative procedure to optimize the precision and recall of the identified terms. We derived disease models for 3,145 common human diseases, comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used to examine the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.
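The abstract does not spell out how precision and recall steer the iterative procedure. As a hedged illustration only, the sketch below scores a candidate set of terms against a reference set and sweeps a minimum-count threshold, keeping the threshold with the best F1 score. The function names `precision_recall`, `f1_score`, and `best_threshold`, the reference ("gold") set, the toy counts, and the use of F1 as the objective are all assumptions, not the authors' procedure.

```python
from typing import Dict, List, Set, Tuple


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted terms against a reference set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def best_threshold(
    term_counts: Dict[str, int], gold: Set[str], thresholds: List[int]
) -> Tuple[int, float]:
    """Hypothetical sweep: keep terms seen at least t times; return the t with the highest F1."""
    best_t, best_f1 = thresholds[0], -1.0
    for t in thresholds:
        predicted = {term for term, count in term_counts.items() if count >= t}
        score = f1_score(*precision_recall(predicted, gold))
        if score > best_f1:  # strict '>' keeps the smallest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1


# Invented toy data: mention counts for one disease and a hypothetical reference set.
counts = {"xerostomia": 12, "antinuclear antibody positivity": 8, "fever": 2}
reference = {"xerostomia", "antinuclear antibody positivity"}
print(best_threshold(counts, reference, [1, 2, 3, 5, 10]))  # (3, 1.0)
```

A threshold that is too low admits noisy terms (lower precision); one that is too high drops true features (lower recall); F1 is simply one convenient way to balance the two.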


Figures

Figure 1
Algorithm 1. Summary of the algorithm used to identify a set of HPO terms annotated to diseases. See Material and Methods for explanations.
Figure 2
Overview of CR and Bioinformatic Analysis. The analysis was performed in several major steps. (1) Bio-LarK was used to analyze the PubMed-MEDLINE 2014 corpus, yielding a total of 5,136,645 abstracts annotated with MeSH terms and phenotypic features. (2) For each of the 3,145 resulting diseases, the frequency and specificity of the HPO terms found in its abstracts were used to infer phenotypic annotations. (3) These annotations were used to produce a disease model for each disease. (4) Medical validation of the annotations was performed using the disease, phenotype, and SNP annotations in GWAS Central to assess phenotype sharing among common diseases. (5) Validation with OMIM, Orphanet, and DO was used to assess phenotype sharing between rare and common diseases linked to the same locus.
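Step (2) combines how often a term appears with how specific it is to a disease. The caption does not give the formula, so the following is only a sketch under stated assumptions: it counts each HPO term once per abstract for every disease the abstract is indexed with, then weights each term with a TF-IDF-style score (relative frequency times the log of the inverse disease frequency). TF-IDF is a stand-in chosen for illustration, not necessarily the authors' scoring; the function names, input layout, and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input: (disease IDs an abstract is indexed with, HPO terms recognized in it).
Abstract = Tuple[Set[str], List[str]]


def count_terms(abstracts: Iterable[Abstract]) -> Dict[str, Counter]:
    """Per disease, count the number of abstracts mentioning each HPO term."""
    per_disease: Dict[str, Counter] = defaultdict(Counter)
    for diseases, terms in abstracts:
        unique_terms = set(terms)  # a term counts once per abstract
        for disease in diseases:
            per_disease[disease].update(unique_terms)
    return dict(per_disease)


def score_terms(per_disease: Dict[str, Counter]) -> Dict[str, Dict[str, float]]:
    """TF-IDF-style weight: relative frequency x log(#diseases / #diseases mentioning term)."""
    n_diseases = len(per_disease)
    disease_freq: Counter = Counter()
    for counts in per_disease.values():
        disease_freq.update(counts.keys())  # each disease contributes once per term
    scores: Dict[str, Dict[str, float]] = {}
    for disease, counts in per_disease.items():
        total = sum(counts.values())
        scores[disease] = {
            term: (count / total) * math.log(n_diseases / disease_freq[term])
            for term, count in counts.items()
        }
    return scores


# Invented toy corpus with two diseases, D1 and D2.
abstracts = [
    ({"D1"}, ["xerostomia", "fever"]),
    ({"D1"}, ["xerostomia"]),
    ({"D2"}, ["fever", "arthritis"]),
]
scores = score_terms(count_terms(abstracts))
print(scores["D1"])  # "fever" occurs for both diseases, so its weight is 0.0
```

Terms mentioned for every disease receive zero weight, which is one simple way to capture specificity; turning scores into annotations would still require a cutoff.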
Figure 3
Phenotypic Network of Common Disease. A total of 1,678 common diseases could be mapped to at least one of 13 top-level DO categories (Figures S5 and S6). Of these, 1,148 diseases were connected to at least one other disease with a phenotypic similarity score of at least 2.0. Each of these diseases is shown as a node in the graph, colored according to its membership in the upper-level disease categories. The thickness of the connections between the nodes reflects the degree of phenotypic similarity.
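The network in Figure 3 keeps a disease only if at least one of its pairwise similarity scores reaches 2.0. The sketch below reproduces that thresholding step under stated assumptions: the similarity function is supplied by the caller (the caption does not define the measure), `build_network` and `toy_similarity` are hypothetical names, and the toy scores are invented for illustration. Edge weights are kept so that they could drive line thickness, as in the figure.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Set, Tuple

Edge = Tuple[str, str, float]


def build_network(
    diseases: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float = 2.0,
) -> Tuple[Set[str], List[Edge]]:
    """Connect disease pairs whose similarity is >= threshold; drop isolated diseases."""
    edges: List[Edge] = []
    for a, b in combinations(sorted(set(diseases)), 2):
        score = similarity(a, b)
        if score >= threshold:
            edges.append((a, b, score))  # score could be rendered as edge thickness
    nodes = {node for a, b, _ in edges for node in (a, b)}
    return nodes, edges


# Invented pairwise scores; pairs not listed default to 0.0.
toy_scores = {
    frozenset({"A", "B"}): 2.5,
    frozenset({"A", "C"}): 1.0,
    frozenset({"B", "C"}): 2.0,
}


def toy_similarity(a: str, b: str) -> float:
    return toy_scores.get(frozenset({a, b}), 0.0)


nodes, edges = build_network(["A", "B", "C", "D"], toy_similarity)
print(sorted(nodes))  # ['A', 'B', 'C']; 'D' has no edge and is left out
```

Checking all pairs is quadratic in the number of diseases, which remains manageable at the scale of a few thousand diseases.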
Figure 4
Phenotype-SNP Network. To construct this network, individual HPO terms were connected to SNPs if the SNP was significantly associated with a disease characterized by the HPO term in question. For instance, the SNP rs5029939 is significantly associated with both Sjögren syndrome and systemic lupus erythematosus, and the two diseases share a number of phenotypic features, including “antinuclear antibody positivity” (HP:0003493) and “xerostomia” (HP:0000217). The figure shows a small, particularly dense subset of the network that was chosen manually; it is centered on ten HPO terms representing clinical features that are common in autoimmune diseases.
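The linking rule behind Figure 4 (connect an HPO term to a SNP when the SNP is significantly associated with a disease annotated with that term) amounts to a short join. In this sketch the input layout, the function name `phenotype_snp_edges`, and the assumption that associations were already filtered for significance are all hypothetical; the caption does not state the data format or cutoff used with GWAS Central. The toy input follows the caption's rs5029939 example, with each disease's annotations truncated to the two shared terms the caption names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def phenotype_snp_edges(
    associations: Iterable[Tuple[str, str]],
    disease_terms: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Link (HPO term, SNP) whenever the SNP is associated with a disease annotated with the term.

    `associations` is assumed to hold only (SNP, disease) pairs that already passed a
    significance filter. Each edge records the set of diseases supporting it.
    """
    edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for snp, disease in associations:
        for term in disease_terms.get(disease, set()):
            edges[(term, snp)].add(disease)
    return dict(edges)


# Toy input based on the caption's example (annotation sets truncated to two terms).
associations = [
    ("rs5029939", "Sjögren syndrome"),
    ("rs5029939", "systemic lupus erythematosus"),
]
disease_terms = {
    "Sjögren syndrome": {"HP:0003493", "HP:0000217"},
    "systemic lupus erythematosus": {"HP:0003493", "HP:0000217"},
}
edges = phenotype_snp_edges(associations, disease_terms)
for (term, snp), diseases in sorted(edges.items()):
    print(term, snp, sorted(diseases))
```

Recording the supporting diseases on each edge makes shared features stand out: both edges here are backed by two diseases, which is the kind of overlap the figure highlights.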
