Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits
- PMID: 16835308
- PMCID: PMC1500873
- DOI: 10.1093/nar/gkl433
Abstract
Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
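To make the term "genome-specific best hits" concrete, the following minimal sketch shows how best hits between genomes can be computed from pairwise similarity scores, and how a one-directional best hit can differ from a reciprocal one. It is an illustration only, not the detection algorithm of the paper and not the COG construction procedure. The protein names, genome labels, scores, and the functions `best_hits` and `reciprocal_best_hits` are all hypothetical.

```python
# Toy illustration of genome-specific best hits (hypothetical data and names).

# Which genome each protein belongs to (assumed example data).
GENOME_OF = {"a1": "A", "a2": "A", "b1": "B"}

# Directed similarity scores (query, target) -> score; higher means more similar.
SCORES = {
    ("a1", "b1"): 200.0,
    ("a2", "b1"): 150.0,
    ("b1", "a1"): 200.0,
    ("b1", "a2"): 150.0,
}


def best_hits(scores, genome_of):
    """Map (query, target_genome) to the query's highest-scoring hit there."""
    best = {}
    for (query, target), score in scores.items():
        target_genome = genome_of[target]
        if genome_of[query] == target_genome:
            continue  # only hits in other genomes are considered
        key = (query, target_genome)
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit for key, (hit, _) in best.items()}


def reciprocal_best_hits(scores, genome_of):
    """Return the set of protein pairs that are each other's best hit."""
    best = best_hits(scores, genome_of)
    pairs = set()
    for (query, _target_genome), target in best.items():
        if best.get((target, genome_of[query])) == query:
            pairs.add(frozenset((query, target)))
    return pairs


if __name__ == "__main__":
    print(best_hits(SCORES, GENOME_OF))
    print(reciprocal_best_hits(SCORES, GENOME_OF))
```

In this toy data, `a2` has `b1` as its best hit in genome B, but `b1`'s best hit in genome A is `a1`, so only `a1` and `b1` form a reciprocal pair. A grouping that accepts one-directional best hits would also pull in `a2`, which is the kind of candidate non-ortholog a subsequent check would need to examine.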
