Phylogenetic and functional assessment of orthologs inference projects and methods
- PMID: 19148271
- PMCID: PMC2612752
- DOI: 10.1371/journal.pcbi.1000262
Abstract
Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically assessed only in terms of function conservation, despite Fitch's original, phylogeny-based definition. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required mapping millions of sequences, handling hundreds of millions of predicted ortholog pairs, and computing tens of thousands of trees. In phylogenetic analysis, or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage, OrthoMCL outperforms Ensembl Compara and, to a lesser extent, Inparanoid. Lastly, the large coverage of the recent EggNOG can be useful for building broad functional groupings, but the method is not specific enough for phylogenetic or detailed functional analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. First, the present study provides guidance for the broad community of orthology data users as to which database best suits their needs. Second, it introduces new methodology to verify orthology. Third, it sets performance standards for current and future approaches.
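The bidirectional best-hit (BBH) baseline named in the abstract can be illustrated with a short sketch. This is a minimal illustration, not the implementation used in the study: it assumes similarity scores between the genes of two genomes have already been computed (for example, BLAST bit scores, where higher is better), and the function names and example gene identifiers are hypothetical. Two genes a and b are reported as orthologs when b is a's best hit in genome B and a is b's best hit in genome A.

```python
from typing import Dict, List, Tuple

# Assumed input format (hypothetical): for each query gene, a list of
# (subject_gene, score) hits against the other genome. Higher score = better hit.
Hits = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Map each query to its single best-scoring subject.

    Queries with no hits are omitted. Ties are broken by keeping the first
    maximal hit in the list (Python's max() returns the first maximum).
    """
    best: Dict[str, str] = {}
    for query, subjects in hits.items():
        if subjects:
            best[query] = max(subjects, key=lambda hit: hit[1])[0]
    return best


def bidirectional_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    # Keep only reciprocal pairs: the best hit of b, searched back against A, must be a.
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Toy scores for genes a1..a3 (genome A) and b1..b2 (genome B).
    a_vs_b = {"a1": [("b1", 250.0), ("b2", 90.0)], "a2": [("b1", 240.0)], "a3": []}
    b_vs_a = {"b1": [("a1", 245.0), ("a2", 238.0)], "b2": [("a2", 80.0)]}
    print(bidirectional_best_hits(a_vs_b, b_vs_a))  # prints [('a1', 'b1')]
```

In this toy example, a2 receives no partner: its best hit b1 prefers a1, and b2's best hit a2 is not reciprocated. By construction, BBH reports only one-to-one pairs, so it cannot represent one-to-many or many-to-many orthology arising from lineage-specific duplications (in-paralogs).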
Conflict of interest statement
The authors have declared that no competing interests exist.