Brief Bioinform. 2011 Sep;12(5):423-35.
doi: 10.1093/bib/bbr034. Epub 2011 Jul 7.

Conceptual framework and pilot study to benchmark phylogenomic databases based on reference gene trees

Brigitte Boeckmann et al. Brief Bioinform. 2011 Sep.

Abstract

Phylogenomic databases provide orthology predictions for species with fully sequenced genomes. Although the goal seems well-defined, the content of these databases differs greatly. Seven ortholog databases (Ensembl Compara, eggNOG, HOGENOM, InParanoid, OMA, OrthoDB, Panther) were compared on the basis of reference trees. For three well-conserved protein families, we observed a generally high specificity of orthology assignments for these databases. We show that differences in the completeness of predicted gene relationships and in the phylogenetic information are, for the great majority, not due to the methods used, but to differences in the underlying database concepts. According to our metrics, none of the databases provides a fully correct and comprehensive protein classification. Our results provide a framework for meaningful and systematic comparisons of phylogenomic databases. In the future, a sustainable set of 'Gold standard' phylogenetic trees could provide a robust method for phylogenomic databases to assess their current quality status, measure changes following new database releases and diagnose improvements subsequent to an upgrade of the analysis procedure.
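
As a rough illustration of what comparing a database against a reference tree can look like, the sketch below scores predicted ortholog groups against reference groups at the level of gene pairs. It is not the metric used in the paper, which the abstract does not specify: the function names, the gene labels and the use of pairwise precision and recall as stand-ins for "specificity" and "completeness" are all assumptions made for this example. It also treats every pair of genes within a group as orthologous, which is a simplification.

```python
from itertools import combinations


def ortholog_pairs(groups):
    """Return the set of unordered gene pairs implied by a list of ortholog groups.

    Assumption: every pair of genes inside one group counts as an ortholog pair.
    """
    pairs = set()
    for group in groups:
        # Sorting makes (a, b) and (b, a) collapse to the same tuple.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))
    return pairs


def compare_to_reference(predicted_groups, reference_groups):
    """Return (precision, recall) of predicted pairs against reference pairs."""
    predicted = ortholog_pairs(predicted_groups)
    reference = ortholog_pairs(reference_groups)
    true_pos = predicted & reference
    precision = len(true_pos) / len(predicted) if predicted else 0.0
    recall = len(true_pos) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical gene labels, invented for illustration only.
reference = [["human_B1", "mouse_B1"], ["human_B2", "mouse_B2", "frog_B2"]]
predicted = [["human_B1", "mouse_B1"], ["human_B2", "mouse_B2"]]

precision, recall = compare_to_reference(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every predicted pair is correct, but two of the four reference pairs are absent. That is the pattern the abstract describes: high specificity combined with incomplete gene relationships.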

Figures

Figure 1:
Concepts of selected phylogenomic databases. Rows (from top to bottom) indicate the different database concepts, the structure of ortholog groups, the completeness of predicted gene relationships and the implied tree structures. The latter visualize the captured phylogenetic information.
Figure 2:
Reference tree for the V-type ATPase β-subunit subfamily and corresponding ortholog predictions from seven phylogenomic databases. The different grouping strategies are clearly reflected: OMA, InParanoid and the unlabeled trees of HOGENOM occur as mutually exclusive groups, while all other databases possess hierarchical grouping strategies. Most orthology predictions coincide with those of the reference tree, but none of the phylogenomic databases is in full agreement with all of them: OMA groups are split into more groups than necessary, which results in fewer predicted gene relationships; InParanoid predicts the B2 subunit of Ornithorhynchus anatinus to be an ortholog of the human B1 subunit and lacks some of the arthropod orthologs; OrthoDB assigns corresponding 1:1 orthologs only for closely related species such as primates or rodents; eggNOG gives contradictory information on the B2 subunit of Xenopus tropicalis; the tree topology of Panther suggests lineage-specific duplications for the paralogs of X. tropicalis, Caenorhabditis elegans and C. briggsae; the tree of Compara includes an additional duplication event within the vertebrate B2 clade; HOGENOM differs from the reference tree only by the inversion of a speciation node (data not shown) and lacks one of the expected orthologs in the data set. Missing orthologs are also observed for OMA, InParanoid and Panther. Explanation: the left block (headed ‘Ortholog hierarchies’) indicates the ortholog classification derived from the reference tree, with the largest homolog group given in the first column; different levels of orthologous hierarchies are shown as patterned cells in the right-hand columns. Corresponding groups defined by the phylogenomic databases are patterned accordingly, if relevant to the benchmarked ortholog classification. Triangle: gene duplication event. White cell: gene of a species that is not covered by the database. Plain gray cell: gene assigned to an unexpected ortholog group.
Descending diagonal: expected gene that was missing in an ortholog group. Ascending diagonal: false positive prediction. Black horizontal bar: groups of the same hierarchical level within the same column. For OrthoDB the black bar also separates the three taxonomic sections of the database (VeRTebrate, ARThropods, FUNgi). For more details, see Supplementary Figure S3.
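
The cell categories in the legend can be read as set operations between the genes expected from the reference tree and the genes a database places in the corresponding group. The sketch below is an interpretation of the legend, not code from the paper. The function name, the `covered` argument and the gene labels are hypothetical, and the "plain gray" category (gene assigned to an unexpected group) is left out because it requires comparing several groups at once.

```python
def classify_group(expected, predicted, covered):
    """Classify genes of one ortholog group into legend-style categories.

    expected:  genes the reference tree places in the group
    predicted: genes the database places in the corresponding group
    covered:   all genes from species the database includes (assumed input)
    """
    expected, predicted, covered = set(expected), set(predicted), set(covered)
    return {
        # White cell: expected gene from a species the database does not cover.
        "not_covered": expected - covered,
        # Descending diagonal: covered and expected, but absent from the group.
        "missing": (expected & covered) - predicted,
        # Ascending diagonal: in the predicted group, but not expected there.
        "false_positive": predicted - expected,
        # Agreement between database and reference tree.
        "correct": expected & predicted,
    }


# Hypothetical gene labels, invented for illustration only.
result = classify_group(
    expected={"gene_a", "gene_b", "gene_c", "gene_d"},
    predicted={"gene_a", "gene_b", "gene_x"},
    covered={"gene_a", "gene_b", "gene_c", "gene_x"},
)
for category, genes in result.items():
    print(category, sorted(genes))
```

Here `gene_d` lands in `not_covered` rather than `missing`. This matches the distinction the legend draws between a species absent from the database and an expected gene that the database failed to group.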
