InParanoid 7: new algorithms and tools for eukaryotic orthology analysis

Gabriel Ostlund¹, Thomas Schmitt, Kristoffer Forslund, Tina Köstler, David N Messina, Sanjit Roopra, Oliver Frings, Erik L L Sonnhammer

Affiliations

Affiliation

¹ Department of Biochemistry and Biophysics, Stockholm Bioinformatics Centre, AlbaNova University Centre, Stockholm University, SE-10691 Stockholm, Sweden. gabriel.ostlund@sbc.su.se

PMID: 19892828
PMCID: PMC2808972
DOI: 10.1093/nar/gkp931

InParanoid 7: new algorithms and tools for eukaryotic orthology analysis

Gabriel Ostlund et al. Nucleic Acids Res. 2010 Jan.

. 2010 Jan;38(Database issue):D196-203.

doi: 10.1093/nar/gkp931. Epub 2009 Nov 5.

Authors

Gabriel Ostlund¹, Thomas Schmitt, Kristoffer Forslund, Tina Köstler, David N Messina, Sanjit Roopra, Oliver Frings, Erik L L Sonnhammer

Affiliation

¹ Department of Biochemistry and Biophysics, Stockholm Bioinformatics Centre, AlbaNova University Centre, Stockholm University, SE-10691 Stockholm, Sweden. gabriel.ostlund@sbc.su.se

PMID: 19892828
PMCID: PMC2808972
DOI: 10.1093/nar/gkp931

Abstract

The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment, but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, the response times have been improved and the site now sports a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare among these resources due to a lack of standardized source data and incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.

PubMed Disclaimer

Figures

**Figure 1.**
A diagram showing the use of XML in the InParanoid workflow. The InParanoid convert program starts with simple FASTA files that each have a different header line format. With the help of the species.xml file, it parses and converts them to SeqXML files, which can be easily processed and validated as input to the InParanoid algorithm. On the web site, the user can choose between different data formats; currently supported are SQL, TXT, HTML and OrthoXML.

**Figure 2.**
The new InParanoid web interface. The screenshot in the upper left corner shows the InParanoid clusters between *O. sativa* and *E. coli*. For every cluster, i.e. ortholog group, the members are listed with the identifiers of the proteome source and a description. The InParanoid score is shown for every cluster member and bootstrap values are given for the seed orthologs. The bootstrap value indicates the fraction of intracluster bootstrap runs that placed the seed ortholog as the best match. Clicking on the cluster number leads to the details page of the cluster (right), again listing the members and also presenting their domain annotations and a neighbor-joining bootstrap tree of them. In the tree, branches leading to sequences of the same species have the same color, and upon clicking a domain, one is redirected to its Pfam page. In addition, the details page provides a range of possibilities to further investigate the cluster. A multiple sequence alignment can be viewed in Kalignvu (37) or downloaded in various formats such as FASTA, Stockholm, MSF or SELEX. The protein tree can be can be downloaded as picture or in NH format, and it is possible to edit the tree interactively in the ATV tree viewer (38).

**Figure 3.**
Histogram of the average number of inparalogs/cluster per species for all species–species comparisons in InParanoid 7. Vertebrates and fungi generally have a lower number of inparalogs per clusters—always <3, whereas invertebrates, protists and plants can have as many as five inparalogs/cluster on average.

See this image and copyright information in PMC

References

1. Fitch WM. Distinguishing homologous from analogous proteins. Syst. Zool. 1970;19:99–113. - PubMed
1. Sonnhammer ELL, Koonin EV. Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet. 2002;18:619–620. - PubMed
1. Alexeyenko A, Lindberg J, Perez-Bercoff A, Sonnhammer ELL. Overview and comparison of ortholog databases. Drug Discov. Today Tech. 2006;3:137–143. - PubMed
1. Hulsen T, Huynen MA, de Vlieg J, Groenen PM. Benchmarking ortholog identification methods using functional genomics data. Genome Biol. 2006;7:R31. - PMC - PubMed
1. Dolinski K, Botstein D. Orthology and functional conservation in eukaryotes. Annu. Rev. Genet. 2007;41:465–507. - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

LinkOut - more resources

Full Text Sources
Research Materials
- NCI CPTC Antibody Characterization Program

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

InParanoid 7: new algorithms and tools for eukaryotic orthology analysis

Affiliation

InParanoid 7: new algorithms and tools for eukaryotic orthology analysis

Authors

Affiliation

Abstract

Figures

References

Publication types

MeSH terms

Substances

LinkOut - more resources

Full Text Sources

Research Materials