Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits
- PMID: 16835308
- PMCID: PMC1500873
- DOI: 10.1093/nar/gkl433
Abstract
Correct orthology assignment is a critical prerequisite for numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for detecting non-orthologs that are mistakenly grouped together by current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates rather than computationally expensive and error-prone tree-building methods. Its accuracy is evaluated by verifying the distribution of predicted cases, by case-by-case phylogenetic analysis and by comparison with predictions from other projects that use independent methods. Our results show that a substantial fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analyses that are sensitive to correct orthology assignment will benefit greatly from these findings.
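To illustrate what "genome-specific best hits" refers to, the following minimal Python sketch computes symmetric (reciprocal) best hits between two genomes from a table of pairwise similarity scores. It is not the non-orthology detection algorithm presented in the paper, nor the COG construction procedure itself; the function names, the score table layout and the toy protein identifiers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical input: similarity score for each (protein in genome A, protein in genome B) pair.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, from_first: bool = True) -> Dict[str, str]:
    """Map each query protein to its highest-scoring hit in the other genome."""
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        query, target = (a, b) if from_first else (b, a)
        # Keep the first hit seen on ties; replace only on a strictly higher score.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> Set[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(scores, from_first=True)
    reverse = best_hits(scores, from_first=False)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


if __name__ == "__main__":
    # Toy scores (assumed values, not real data).
    toy_scores = {
        ("a1", "b1"): 90.0,
        ("a1", "b2"): 50.0,
        ("a2", "b1"): 70.0,
        ("a2", "b2"): 40.0,
    }
    print(reciprocal_best_hits(toy_scores))
```

In this toy example, `a2` has `b1` as its best hit, but `b1` prefers `a1`, so only the pair `(a1, b1)` is symmetric. Best-hit relations of this kind depend only on similarity ranks, which is one reason a separate check based on pairwise distance estimates can expose non-orthologs in the resulting groups.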