A new generation of homology search tools based on probabilistic inference
PMID: 20180275
Abstract
Many theoretical advances have been made in applying probabilistic inference methods to improve the power of sequence homology searches, yet the BLAST suite of programs is still the workhorse for most of the field. The main reason for this is practical: the BLAST programs are about 100-fold faster than the fastest competing implementations of probabilistic inference methods. I describe recent work on the HMMER software suite for protein sequence analysis, which implements probabilistic inference using profile hidden Markov models. Our aim in HMMER3 is to achieve BLAST's speed while further improving the power of probabilistic inference-based methods. HMMER3 implements a new probabilistic model of local sequence alignment and a new heuristic acceleration algorithm; combined with efficient vector-parallel implementations on modern processors, these improvements act synergistically. HMMER3 uses more powerful log-odds likelihood scores (scores summed over alignment uncertainty, rather than scoring a single optimal alignment); it calculates accurate expectation values (E-values) for those scores without simulation, using a generalization of Karlin/Altschul theory; it computes posterior distributions over the ensemble of possible alignments and returns posterior probabilities (confidences) for each aligned residue; and it does all this at an overall speed comparable to BLAST. The HMMER project aims to usher in a new generation of more powerful homology search tools based on probabilistic inference methods.
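The abstract contrasts scoring a single optimal alignment with log-odds scores summed over alignment uncertainty, and it mentions per-residue posterior probabilities and analytically derived E-values. The Python sketch below illustrates those ideas under stated assumptions. It uses a toy two-state HMM with made-up parameters, not a profile HMM, and it is not HMMER3's model, code, or calibration procedure. It computes a Viterbi bit score (best single path), a Forward bit score (sum over all paths), and per-position posterior state probabilities from Forward-Backward. It also computes an E-value under an assumed exponential score tail with slope λ = ln 2 for bit scores and a hypothetical, pre-calibrated location parameter `tau`.

```python
import math

# Toy two-state HMM (hypothetical parameters, for illustration only;
# this is not a HMMER profile HMM).
STATES = ["A_rich", "GC_rich"]
START = {"A_rich": 0.5, "GC_rich": 0.5}
TRANS = {
    "A_rich": {"A_rich": 0.8, "GC_rich": 0.2},
    "GC_rich": {"A_rich": 0.2, "GC_rich": 0.8},
}
EMIT = {
    "A_rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
NULL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # i.i.d. background (null) model


def null_prob(seq):
    """P(seq | null): product of background residue frequencies."""
    p = 1.0
    for x in seq:
        p *= NULL[x]
    return p


def viterbi_prob(seq):
    """Probability of the single most likely state path (max over paths)."""
    v = {k: START[k] * EMIT[k][seq[0]] for k in STATES}
    for x in seq[1:]:
        v = {k: max(v[j] * TRANS[j][k] for j in STATES) * EMIT[k][x] for k in STATES}
    return max(v.values())


def forward(seq):
    """Forward matrix plus P(seq | model), summed over all state paths."""
    alpha = [{k: START[k] * EMIT[k][seq[0]] for k in STATES}]
    for x in seq[1:]:
        prev = alpha[-1]
        alpha.append({k: sum(prev[j] * TRANS[j][k] for j in STATES) * EMIT[k][x]
                      for k in STATES})
    return alpha, sum(alpha[-1].values())


def backward(seq):
    """Backward matrix: beta[t][j] = P(seq[t+1:] | state j at position t)."""
    beta = [{k: 1.0 for k in STATES}]
    for x in reversed(seq[1:]):
        nxt = beta[0]
        beta.insert(0, {j: sum(TRANS[j][k] * EMIT[k][x] * nxt[k] for k in STATES)
                        for j in STATES})
    return beta


def posteriors(seq):
    """Posterior P(state k at position t | seq) from Forward-Backward."""
    alpha, total = forward(seq)
    beta = backward(seq)
    return [{k: alpha[t][k] * beta[t][k] / total for k in STATES} for t in range(len(seq))]


def bits(p_model, p_null):
    """Log-odds score in bits."""
    return math.log2(p_model / p_null)


def evalue(score_bits, tau, n_targets):
    """Assumed exponential tail: E = N * exp(-lambda * (S - tau)), lambda = ln 2."""
    return n_targets * math.exp(-math.log(2) * (score_bits - tau))


seq = "GCGCATAT"
_, p_fwd = forward(seq)
print("Viterbi bits:", bits(viterbi_prob(seq), null_prob(seq)))
print("Forward bits:", bits(p_fwd, null_prob(seq)))
for t, row in enumerate(posteriors(seq)):
    print(t, seq[t], {k: round(p, 3) for k, p in row.items()})
```

The Forward sum includes the Viterbi path as one of its terms, so the Forward score is never lower than the Viterbi score. The gap grows when many near-optimal paths share the probability mass, which is the alignment uncertainty the abstract refers to.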
Similar articles
- Toward an accurate statistics of gapped alignments. Bull Math Biol. 2005 Jan;67(1):169-91. doi: 10.1016/j.bulm.2004.07.001. PMID: 15691544
- Calibrating E-values for hidden Markov models using reverse-sequence null models. Bioinformatics. 2005 Nov 15;21(22):4107-15. doi: 10.1093/bioinformatics/bti629. Epub 2005 Aug 25. PMID: 16123115
- Accelerated Profile HMM Searches. PLoS Comput Biol. 2011 Oct;7(10):e1002195. doi: 10.1371/journal.pcbi.1002195. Epub 2011 Oct 20. PMID: 22039361. Free PMC article.
- Fast model-based protein homology detection without alignment. Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8. PMID: 17488755
- Sequence comparison and protein structure prediction. Curr Opin Struct Biol. 2006 Jun;16(3):374-84. doi: 10.1016/j.sbi.2006.05.006. Epub 2006 May 19. PMID: 16713709. Review.
Cited by
- Mapping and analysis of Caenorhabditis elegans transcription factor sequence specificities. Elife. 2015 Apr 23;4:e06967. doi: 10.7554/eLife.06967. PMID: 25905672. Free PMC article.
- Pan-phylum Comparison of Nematode Metabolic Potential. PLoS Negl Trop Dis. 2015 May 22;9(5):e0003788. doi: 10.1371/journal.pntd.0003788. eCollection 2015 May. PMID: 26000881. Free PMC article.
- Evolutionarily consistent families in SCOP: sequence, structure and function. BMC Struct Biol. 2012 Oct 18;12:27. doi: 10.1186/1472-6807-12-27. PMID: 23078280. Free PMC article.
- CRISPR-Cas systems target a diverse collection of invasive mobile genetic elements in human microbiomes. Genome Biol. 2013 Apr 29;14(4):R40. doi: 10.1186/gb-2013-14-4-r40. PMID: 23628424. Free PMC article.
- The dispanins: a novel gene family of ancient origin that contains 14 human members. PLoS One. 2012;7(2):e31961. doi: 10.1371/journal.pone.0031961. Epub 2012 Feb 20. PMID: 22363774. Free PMC article.