Am J Hum Genet. 2009 Dec;85(6):801-8.
doi: 10.1016/j.ajhg.2009.10.026.

The biological coherence of human phenome databases

Martin Oti et al. Am J Hum Genet. 2009 Dec.

Abstract

Disease networks are increasingly explored as a complement to networks centered on interactions between genes and proteins. The quality of disease networks depends heavily on the amount and quality of phenotype information in phenotype databases of human genetic diseases. We explored which aspects of phenotype database architecture and content best reflect the underlying biology of disease. For this purpose we used the OMIM-based HPO, Orphanet, and POSSUM phenotype databases, and devised a biological coherence score based on shared Gene Ontology (GO) annotation to investigate the degree to which phenotype similarity in these databases reflects related pathobiology. Our analyses support the notion that a fine-grained phenotype ontology enhances the accuracy of phenome representation. In addition, we find that OMIM, the database most used by the human genetics community, is heavily underannotated. We show that this problem can easily be overcome by adding data available in the POSSUM database to the OMIM phenotype representations in the HPO. We also find that the use of feature frequency estimates (currently implemented only in the Orphanet database) significantly improves the quality of the phenome representation. Our data suggest that there is much to be gained by improving human phenome databases, and that some of the measures needed to achieve this are relatively easy to implement. More generally, we propose that curation and more systematic annotation of human phenome databases can greatly improve the power of the phenotype for genetic disease analysis.
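The abstract only summarizes the biological coherence score. The following is a minimal Python sketch of the idea under a simple assumed definition: coherence is the fraction of within-cluster disease pairs whose associated genes share at least one GO term, and enrichment is that fraction divided by its mean over data sets in which the disease-to-gene links are shuffled (30 permutations by default, echoing the n = 30 permutations mentioned for Figure 2). The function names, the permutation scheme, and the toy genes and GO labels are all hypothetical; the paper's exact scoring may differ.

```python
import itertools
import random
from statistics import mean


def go_terms(genes, go):
    """Union of GO terms annotated to any gene in `genes`."""
    return set().union(*(go.get(g, set()) for g in genes))


def shares_go_term(genes_a, genes_b, go):
    """True if the genes of two diseases share at least one GO term."""
    return bool(go_terms(genes_a, go) & go_terms(genes_b, go))


def cluster_coherence(clusters, disease_genes, go):
    """Fraction of within-cluster disease pairs whose genes share a GO term."""
    shared = total = 0
    for cluster in clusters:
        for a, b in itertools.combinations(cluster, 2):
            total += 1
            shared += shares_go_term(disease_genes[a], disease_genes[b], go)
    return shared / total if total else 0.0


def relative_enrichment(clusters, disease_genes, go, n_perm=30, seed=0):
    """Observed coherence divided by its mean over permuted disease-gene links.

    Assumption: keeping the phenotype clusters fixed and shuffling which gene
    set belongs to which disease stands in for permuting the phenotype data.
    """
    rng = random.Random(seed)
    observed = cluster_coherence(clusters, disease_genes, go)
    diseases = list(disease_genes)
    gene_sets = [disease_genes[d] for d in diseases]
    null = []
    for _ in range(n_perm):
        rng.shuffle(gene_sets)
        null.append(cluster_coherence(clusters, dict(zip(diseases, gene_sets)), go))
    baseline = mean(null)
    return observed / baseline if baseline else float("nan")


# Toy data (hypothetical genes and GO labels): two phenotype clusters whose
# member diseases share a GO term within, but not across, clusters.
go = {"g1": {"GO:x"}, "g2": {"GO:x"}, "g3": {"GO:y"}, "g4": {"GO:y"}}
disease_genes = {"d1": {"g1"}, "d2": {"g2"}, "d3": {"g3"}, "d4": {"g4"}}
clusters = [["d1", "d2"], ["d3", "d4"]]
print(cluster_coherence(clusters, disease_genes, go))  # 1.0
print(relative_enrichment(clusters, disease_genes, go))
```

With this toy data, roughly one shuffle in three keeps each matched pair of gene sets together, so the enrichment should come out near 3: the phenotype clusters are more coherent than chance.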

Figures

Figure 1
Comprehensively Annotated Syndromes Cluster Better than Sparsely Annotated Syndromes. (A) The Ehlers-Danlos and Charcot-Marie-Tooth syndrome families overlap in the phenome landscape of the OMIM data set but separate when the phenotype descriptions are supplemented with POSSUM annotation. The phenome landscapes were created by multidimensional scaling of HPO feature-based distance matrices for the original OMIM data set (left) and the POSSUM-supplemented OMIM data set (right). The more similar the annotations of two syndromes, the closer they lie on the landscape. Background colors indicate the density of syndromes in each region of the landscape; lighter colors represent higher densities. (B) Mean phenotypic similarity is consistently greater for the POSSUM-supplemented OMIM data set (red dashed lines) than for the original OMIM data set (blue dashed lines). Besides Ehlers-Danlos (n = 12) and Charcot-Marie-Tooth (n = 12), the more phenotypically diverse family of ciliopathies is also shown (n = 59; Table S2). Continuous lines show the distributions of mean distances for randomly composed syndrome families of equivalent size (n = 107) for the original and supplemented OMIM data sets.
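Panel (B) compares the mean phenotypic distance within a syndrome family against randomly composed families of the same size. The sketch below illustrates that comparison under stated assumptions: each syndrome is a set of HPO-style feature labels, the distance is Jaccard distance (an assumed metric; the paper's HPO feature-based measure may differ), and significance is summarized as an empirical one-sided p-value. All names and feature labels are hypothetical.

```python
import itertools
import random
from statistics import mean


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two feature sets (an assumed metric)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def mean_family_distance(family, features):
    """Mean pairwise distance among the syndromes in `family` (needs >= 2)."""
    return mean(jaccard_distance(features[x], features[y])
                for x, y in itertools.combinations(family, 2))


def empirical_p(family, features, n_draws=1000, seed=0):
    """One-sided p-value: how often a random family of the same size is at
    least as tight (mean distance <= observed) as the real family."""
    rng = random.Random(seed)
    names = list(features)
    observed = mean_family_distance(family, features)
    hits = sum(
        mean_family_distance(rng.sample(names, len(family)), features) <= observed
        for _ in range(n_draws)
    )
    return (1 + hits) / (1 + n_draws)


# Hypothetical feature labels: a sparse annotation vs. a supplemented one.
sparse = {"A1": {"joint_laxity"}, "A2": {"skin_fragility"},
          "B1": {"pes_cavus"}, "B2": {"distal_weakness"}}
supplemented = {"A1": {"joint_laxity", "skin_fragility", "easy_bruising"},
                "A2": {"skin_fragility", "easy_bruising", "atrophic_scars"},
                "B1": {"pes_cavus", "distal_weakness", "areflexia"},
                "B2": {"distal_weakness", "areflexia", "sensory_loss"}}
print(mean_family_distance(["A1", "A2"], sparse))        # 1.0
print(mean_family_distance(["A1", "A2"], supplemented))  # 0.5
```

In the sparse toy annotation the two "A" syndromes share no features and look unrelated; supplementing them with shared features halves their distance, mirroring the separation seen after adding POSSUM annotation.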
Figure 2
Biological Coherence of Phenotypic Clusters for Different Data Sets and Conditions. The box plots show the relative enrichment of shared GO terms among genes associated with diseases in the same cluster, compared with randomly permuted phenotype data sets (n = 30). Box limits show the 25th and 75th percentiles, whiskers extend up to 1.5× the box range (the interquartile range), and points outside this range are plotted individually. (A) The full HPO ontology yields biologically more coherent phenotype clusters than a simplified HPO ontology containing only more general features, but only when the OMIM phenotypes are supplemented with POSSUM annotation (purple boxes). (B) Artificial underannotation of the POSSUM and Orphanet databases by randomly halving the syndrome feature lists (“sparse”) strongly reduces cluster biological coherence. However, limiting the Orphanet syndrome descriptions to only the very frequent features has limited impact on cluster coherence, even though it reduces the average number of features per syndrome to 57% of the original. (C) Weighting Orphanet features according to their prevalence within affected patients improves the biological coherence of clustered phenotypes. Counter-weighting them, by assigning higher weights to less frequently occurring features, almost completely abolishes the biological coherence of the resulting phenotype clusters. (D) Weighting annotated features according to their specificity (the number of syndromes they occur in) with the inverse document frequency (IDF) weighting scheme diminishes cluster biological coherence for well-annotated POSSUM syndromes but improves it for underannotated syndromes.
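Panels (B) to (D) manipulate the feature lists in three ways: random halving, frequency weighting, and IDF weighting. One way to express these manipulations is sketched below, assuming each syndrome is stored as a `{feature: frequency}` dictionary and syndromes are compared by weighted cosine similarity. Cosine similarity is an assumed measure, not necessarily the paper's, and every function name and toy value is hypothetical.

```python
import math
import random


def sparsify(features, rng):
    """'Sparse' condition: keep a random half (rounded up) of the features."""
    keep = rng.sample(sorted(features), k=math.ceil(len(features) / 2))
    return {f: features[f] for f in keep}


def idf_weights(syndromes):
    """IDF per feature: log(N / number of syndromes annotated with it)."""
    df = {}
    for features in syndromes.values():
        for f in features:
            df[f] = df.get(f, 0) + 1
    n = len(syndromes)
    return {f: math.log(n / count) for f, count in df.items()}


def weight_vector(features, use_frequency=True, idf=None):
    """Turn {feature: frequency} into {feature: weight}.

    Frequency weighting uses the fraction of affected patients showing the
    feature; otherwise every feature counts as 1. IDF optionally multiplies in.
    """
    vec = {}
    for f, freq in features.items():
        w = freq if use_frequency else 1.0
        if idf is not None:
            w *= idf.get(f, 0.0)
        vec[f] = w
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical syndromes: {feature: fraction of patients showing it}.
syndromes = {
    "S1": {"a": 0.9, "b": 0.1},
    "S2": {"a": 0.8, "c": 0.1},
    "S3": {"b": 0.5, "d": 1.0},
}
idf = idf_weights(syndromes)
unweighted = cosine(weight_vector(syndromes["S1"], False),
                    weight_vector(syndromes["S2"], False))
frequency = cosine(weight_vector(syndromes["S1"]), weight_vector(syndromes["S2"]))
print(round(unweighted, 3), round(frequency, 3))  # 0.5 0.986
```

Frequency weighting lets the shared, frequent feature `a` dominate, so S1 and S2 look much more alike than under plain binary annotation, while the rare features `b` and `c` contribute little. IDF weighting, by contrast, boosts features found in few syndromes, which helps only when those rare features are themselves reliably annotated.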
