HaMStR: profile hidden markov model based search for orthologs in ESTs

Ingo Ebersberger¹, Sascha Strauss, Arndt von Haeseler

PMID: 19586527
PMCID: PMC2723089
DOI: 10.1186/1471-2148-9-157

BMC Evol Biol. 2009 Jul 8;9:157.

Affiliation

¹ Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, Vienna, Austria. ingo.ebersberger@univie.ac.at

Abstract

Background: EST sequencing is a versatile approach for rapidly gathering protein-coding sequences. ESTs provide direct access to an organism's gene repertoire, bypassing the still error-prone procedure of gene prediction from genomic data. Therefore, ESTs are often the only source of biological sequence data from taxa outside mainstream interest. However, the widespread use of ESTs in evolutionary studies, particularly in molecular systematics, is still hindered by the lack of efficient and reliable approaches for automated ortholog prediction in ESTs. Existing methods either depend on a known species tree or cannot cope with redundancy in EST data.

Results: We present a novel approach (HaMStR) to mine EST data for the presence of orthologs to a curated set of genes. HaMStR combines a profile hidden Markov model (pHMM) search with a subsequent BLAST search to extend existing ortholog clusters with sequences from further taxa. We show that the HaMStR results are consistent with those obtained with existing orthology prediction methods that require completely sequenced genomes. A case study on the phylogeny of 35 fungal taxa illustrates that HaMStR is well suited to compiling informative data sets for phylogenomic studies from ESTs and protein sequence data.

Conclusion: HaMStR extends a pre-defined set of orthologs with ESTs from further taxa in a standardized manner. In the same fashion, HaMStR can be applied to protein sequence data, and thus provides a comprehensive approach to compiling ortholog clusters from any protein-coding data. The resulting orthology predictions serve as the data basis for a variety of evolutionary studies. Here, we have demonstrated the application of HaMStR in a molecular systematics study. However, we envision that studies tracing the evolutionary fate of individual genes or functional complexes of genes will greatly benefit from HaMStR orthology predictions as well.

Figures

**Figure 1**
**Workflow of the HaMStR approach**. Standard orthology prediction tools are used to identify orthologous groups, the so-called core-orthologs, for a set of completely sequenced primer taxa (Proteome A - F). The sequences in a core-ortholog are aligned and converted into a profile HMM (pHMM). A compilation of protein sequences or translated ESTs from a taxon not included in the primer taxa (Protein set G) is searched for hits with the pHMM. The resulting candidates display features that are characteristic of the protein modelled by the pHMM. To determine the orthology status of the candidates, we introduce a reciprocity criterion. Each candidate is compared by BLASTP with the proteome of one of the primer taxa, the so-called reference taxon (Proteome F). If the best BLASTP hit from the reference taxon corresponds to the protein that contributed to the pHMM, the candidate is called a candidate-ortholog; otherwise it is discarded.
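
The reciprocity criterion described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the HaMStR implementation: the function name `reciprocity_filter`, the sequence identifiers, and the toy lookup table are all hypothetical, and the BLASTP search against the reference proteome is replaced by a plain dictionary lookup so that the example runs without external tools.

```python
from typing import Callable, Dict, Iterable, List, Optional


def reciprocity_filter(
    candidates: Iterable[str],
    best_reference_hit: Callable[[str], Optional[str]],
    core_reference_protein: str,
) -> List[str]:
    """Keep candidates whose best hit in the reference proteome is the
    reference-taxon protein that contributed to the pHMM."""
    accepted = []
    for candidate in candidates:
        # In a real workflow this lookup would be a BLASTP search of the
        # candidate against the reference-taxon proteome (assumption: here
        # it is any callable returning the best-hit identifier or None).
        if best_reference_hit(candidate) == core_reference_protein:
            accepted.append(candidate)  # retained as candidate-ortholog
        # any other candidate is discarded
    return accepted


# Toy best-hit table standing in for BLASTP results (hypothetical IDs).
toy_best_hits: Dict[str, str] = {
    "est_001": "refF_prot_17",
    "est_002": "refF_prot_42",
    "est_003": "refF_prot_17",
}

orthologs = reciprocity_filter(
    candidates=["est_001", "est_002", "est_003", "est_004"],
    best_reference_hit=toy_best_hits.get,
    core_reference_protein="refF_prot_17",
)
print(orthologs)  # ['est_001', 'est_003']
```

In this toy run, `est_002` is discarded because its best reference hit is a different protein, and `est_004` is discarded because it has no hit in the reference proteome at all.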

**Figure 2**
**Sensitivity of HaMStR as a function of CDS coverage**. Fraction of the coding sequence (CDS) covered by the ESTs that have been correctly annotated and missed by HaMStR, respectively.

**Figure 3**
**A maximum likelihood phylogeny of 35 fungi based on 178 genes**. Unless otherwise stated, all splits in the tree have bootstrap support values of 100. For taxa in all upper case letters the annotated proteome was used. For the remaining taxa orthologs were predicted from ESTs.
