On subset seeds for protein alignment
- PMID: 19644175
- DOI: 10.1109/TCBB.2009.4
Abstract
We apply the concept of subset seeds to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard Blastp seeding method, as well as with the family of vector seeds. While the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several main databases of protein alignments. Here again, the results show a comparable or better performance of our seeds versus Blastp.
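To make the notion of a subset seed concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each seed letter accepts a set of aligned amino acid pairs: `#` accepts only identical residues, `@` accepts residues from the same group of a reduced alphabet, and `_` accepts any pair. The grouping in `GROUPS`, the letter symbols, and the function names are all hypothetical choices made for illustration; they are not the alphabets designed in the paper.

```python
# Illustrative reduced amino acid grouping (an assumption for this sketch,
# NOT one of the seed alphabets constructed in the paper).
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
GROUP_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def letter_accepts(letter, a, b):
    """Return True if the seed letter accepts the aligned residue pair (a, b)."""
    if letter == "#":          # exact match only
        return a == b
    if letter == "@":          # same group in the reduced alphabet
        return GROUP_OF[a] == GROUP_OF[b]
    if letter == "_":          # don't-care position
        return True
    raise ValueError(f"unknown seed letter: {letter!r}")


def seed_hits(seed, s, t):
    """Offsets at which the seed matches the ungapped alignment of s and t."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length (ungapped alignment)")
    k = len(seed)
    return [
        i
        for i in range(len(s) - k + 1)
        if all(letter_accepts(seed[j], s[i + j], t[i + j]) for j in range(k))
    ]


def seed_key(seed, window):
    """Hash key of a window under the seed.

    Because every letter in this sketch is defined by a partition of the
    amino acids, two windows are matched by the seed exactly when their
    keys are equal, so hits can be found with a plain hash table lookup.
    """
    key = []
    for letter, aa in zip(seed, window):
        if letter == "#":
            key.append(aa)
        elif letter == "@":
            key.append(GROUP_OF[aa])
        elif letter == "_":
            key.append(None)
        else:
            raise ValueError(f"unknown seed letter: {letter!r}")
    return tuple(key)


hits = seed_hits("#@_#", "MKVLA", "MRELA")
```

In this example the seed `#@_#` matches at offset 0 only: M/M is an identity, K/R fall in the same illustrative group, V/E sits under a don't-care position, and L/L is an identity. The `seed_key` helper hints at why such seeds are cheap to implement compared with a cumulative score threshold: under the partition assumption made here, hit detection reduces to key equality.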