Automatic clustering of orthologs and in-paralogs from pairwise species comparisons
- PMID: 11743721
- DOI: 10.1006/jmbi.2000.5197
Abstract
Orthologs are genes in different species that originate from a single gene in the last common ancestor of these species. Such genes have often retained identical biological roles in the present-day organisms. It is hence important to identify orthologs for transferring functional information between genes in different organisms with a high degree of reliability. For example, orthologs of human proteins are often functionally characterized in model organisms. Unfortunately, orthology analysis between human and, e.g., invertebrates is often complex because of large numbers of paralogs within protein families. Paralogs that predate the species split, which we call out-paralogs, can easily be confused with true orthologs. Paralogs that arose after the species split, which we call in-paralogs, however, are bona fide orthologs by definition. Orthologs and in-paralogs are typically detected with phylogenetic methods, but these are slow and difficult to automate. Automatic clustering methods based on two-way best genome-wide matches, on the other hand, have so far not separated in-paralogs from out-paralogs effectively. We present a fully automatic method for finding orthologs and in-paralogs from two species. Ortholog clusters are seeded with a two-way best pairwise match, after which an algorithm for adding in-paralogs is applied. The method bypasses multiple alignments and phylogenetic trees, which can be slow and error-prone steps in classical ortholog detection. Still, it robustly detects complex orthologous relationships and assigns confidence values for both orthologs and in-paralogs. The program, called INPARANOID, was tested on all completely sequenced eukaryotic genomes. To assess the quality of INPARANOID results, ortholog clusters were generated from a dataset of worm and mammalian transmembrane proteins and compared to clusters derived by manual tree-based ortholog detection methods.
This study led to the identification, with a high degree of confidence, of over a dozen novel worm-mammalian ortholog assignments that were previously undetected because of shortcomings of phylogenetic methods. A WWW server that allows searching for orthologs between human and several fully sequenced genomes is installed at http://www.cgb.ki.se/inparanoid/. This is the first comprehensive resource with orthologs of all fully sequenced eukaryotic genomes. Programs and tables of orthology assignments are available from the same location.
Copyright 2001 Academic Press.
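The clustering procedure summarized in the abstract (two-way best matches as cluster seeds, followed by the addition of in-paralogs) can be illustrated with a minimal Python sketch. This is not the INPARANOID program. It assumes precomputed, symmetric similarity scores (for example BLAST bit scores) stored in dictionaries, and it uses a simplified in-paralog rule: a gene joins a seed's cluster when it scores higher against the seed gene of its own species than the two seed orthologs score against each other. The function names, the gene labels and the toy scores are hypothetical, and the confidence values and handling of overlapping clusters described in the paper are omitted.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
Cluster = Tuple[List[str], List[str]]


def sym_score(table: Dict[Pair, float], x: str, y: str) -> float:
    """Look up a symmetric pairwise score; a missing pair counts as 0 (no hit)."""
    return table.get((x, y), table.get((y, x), 0.0))


def two_way_best_hits(between: Dict[Pair, float]) -> List[Tuple[str, str, float]]:
    """Return (a, b, score) for genes a (species A) and b (species B) that are
    each other's best cross-species match, strongest pairs first."""
    best_a: Dict[str, Tuple[str, float]] = {}
    best_b: Dict[str, Tuple[str, float]] = {}
    for (a, b), s in between.items():
        if a not in best_a or s > best_a[a][1]:
            best_a[a] = (b, s)
        if b not in best_b or s > best_b[b][1]:
            best_b[b] = (a, s)
    seeds = [(a, b, s) for a, (b, s) in best_a.items() if best_b[b][0] == a]
    return sorted(seeds, key=lambda t: t[2], reverse=True)


def cluster(between, within_a, within_b, genes_a, genes_b) -> List[Cluster]:
    """Seed clusters with two-way best hits, then add in-paralogs.

    Simplified (assumed) rule: a gene joins a seed's cluster if it is more
    similar to the seed gene of its own species than the two seed orthologs
    are to each other. Confidence values and overlap resolution are omitted.
    """
    assigned: Set[str] = set()
    clusters: List[Cluster] = []
    for a, b, s in two_way_best_hits(between):
        if a in assigned or b in assigned:
            continue  # already claimed by a stronger seed
        members_a = {a} | {x for x in genes_a
                           if x != a and x not in assigned
                           and sym_score(within_a, a, x) > s}
        members_b = {b} | {y for y in genes_b
                           if y != b and y not in assigned
                           and sym_score(within_b, b, y) > s}
        assigned |= members_a | members_b
        clusters.append((sorted(members_a), sorted(members_b)))
    return clusters


if __name__ == "__main__":
    # Hypothetical toy scores: human genes h1-h3, worm genes w1-w2.
    between = {("h1", "w1"): 100, ("h2", "w1"): 90,
               ("h3", "w2"): 80, ("h3", "w1"): 30}
    within_a = {("h1", "h2"): 150, ("h1", "h3"): 20}
    within_b = {("w1", "w2"): 25}
    print(cluster(between, within_a, within_b, ["h1", "h2", "h3"], ["w1", "w2"]))
    # [(['h1', 'h2'], ['w1']), (['h3'], ['w2'])]
```

In the toy data, h2 is not itself a two-way best match (w1 prefers h1), but it is absorbed as an in-paralog of h1 because their within-species score (150) exceeds the h1-w1 seed score (100), while h3 and w2 form a separate one-to-one cluster. Processing seeds from strongest to weakest is a simple way, assumed here, to keep each gene in a single cluster.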