A unifying framework for seed sensitivity and its application to subset seeds
- PMID: 16819802
- PMCID: PMC2824148
- DOI: 10.1142/s0219720006001977
Abstract
We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats three components of the seed sensitivity problem separately (a set of target alignments, an associated probability distribution, and a seed model), each specified by a distinct finite automaton. The approach is then applied to a new concept, subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be designed efficiently using our approach, and that they can then be used in similarity search to produce better results than ordinary spaced seeds.
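To make the notion of seed sensitivity concrete, here is a minimal Python sketch for the simplest case only. It is not the automaton construction described in the paper. It assumes an ordinary spaced seed written with `#` (must match) and `-` (don't care), target alignments that are all binary strings of a fixed length (`1` = match, `0` = mismatch), and a Bernoulli distribution with match probability `p`. The function names `sensitivity` and `brute_force_sensitivity` are hypothetical and chosen for illustration.

```python
from itertools import product


def sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that a random binary alignment of `length` is hit by `seed`.

    Assumptions (not taken from the paper): '#' is a must-match position,
    any other character is a don't-care; each alignment column is a match
    with independent probability p (Bernoulli model).
    """
    span = len(seed)
    required = [j for j, s in enumerate(seed) if s == "#"]
    # miss[state] = probability of reaching `state` without any seed hit so far,
    # where `state` holds the last (span - 1) alignment symbols read.
    miss = {(): 1.0}
    for _ in range(length):
        nxt = {}
        for state, prob in miss.items():
            for bit, weight in ((1, p), (0, 1.0 - p)):
                window = state + (bit,)
                if len(window) == span and all(window[j] for j in required):
                    continue  # the seed hits here: this mass leaves the "miss" set
                new_state = window[-(span - 1):] if span > 1 else ()
                nxt[new_state] = nxt.get(new_state, 0.0) + prob * weight
        miss = nxt
    return 1.0 - sum(miss.values())


def brute_force_sensitivity(seed: str, length: int, p: float) -> float:
    """Same quantity by enumerating every alignment; only usable for small lengths."""
    span = len(seed)
    required = [j for j, s in enumerate(seed) if s == "#"]
    total = 0.0
    for alignment in product((0, 1), repeat=length):
        hit = any(
            all(alignment[i + j] for j in required)
            for i in range(length - span + 1)
        )
        if hit:
            matches = sum(alignment)
            total += p ** matches * (1.0 - p) ** (length - matches)
    return total


if __name__ == "__main__":
    # Compare a spaced seed with a contiguous seed of the same weight.
    for seed in ("###", "##-#"):
        print(seed, sensitivity(seed, 20, 0.7))
```

The dynamic program tracks only the most recent `span - 1` symbols, so its cost grows with the seed span rather than with the number of alignments. The three hard-coded choices here (all binary strings, Bernoulli probabilities, a spaced seed) correspond loosely to the three components that the abstract says are specified separately.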
Similar articles
- Choosing the best heuristic for seeded alignment of DNA sequences. BMC Bioinformatics. 2006 Mar 13;7:133. doi: 10.1186/1471-2105-7-133. PMID: 16533404.
- All hits all the time: parameter-free calculation of spaced seed sensitivity. Bioinformatics. 2009 Feb 1;25(3):302-8. doi: 10.1093/bioinformatics/btn643. Epub 2008 Dec 18. PMID: 19095701.
- Designing multiple simultaneous seeds for DNA similarity search. J Comput Biol. 2005 Jul-Aug;12(6):847-61. doi: 10.1089/cmb.2005.12.847. PMID: 16108721.
- How does DNA sequence motif discovery work? Nat Biotechnol. 2006 Aug;24(8):959-61. doi: 10.1038/nbt0806-959. PMID: 16900144. Review.
- The many faces of sequence alignment. Brief Bioinform. 2005 Mar;6(1):6-22. doi: 10.1093/bib/6.1.6. PMID: 15826353. Review.
Cited by
- BOND: Basic OligoNucleotide Design. BMC Bioinformatics. 2013 Feb 27;14:69. doi: 10.1186/1471-2105-14-69. PMID: 23444904.
- A mostly traditional approach improves alignment of bisulfite-converted DNA. Nucleic Acids Res. 2012 Jul;40(13):e100. doi: 10.1093/nar/gks275. Epub 2012 Mar 28. PMID: 22457070.
- YOC, A new strategy for pairwise alignment of collinear genomes. BMC Bioinformatics. 2015 Apr 2;16(1):111. doi: 10.1186/s12859-015-0530-3. PMID: 25885358.
- Multiple pattern matching: a Markov chain approach. J Math Biol. 2008 Jan;56(1-2):51-92. doi: 10.1007/s00285-007-0109-3. Epub 2007 Aug 1. PMID: 17668213.
- A coverage criterion for spaced seeds and its applications to support vector machine string kernels and k-mer distances. J Comput Biol. 2014 Dec;21(12):947-63. doi: 10.1089/cmb.2014.0173. PMID: 25393923.