SeqAn: an efficient, generic C++ library for sequence analysis
- PMID: 18184432
- PMCID: PMC2246154
- DOI: 10.1186/1471-2105-9-11
Abstract
Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.
Results: To remedy this trend, we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use with two examples. In the first example, we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile.
Conclusion: We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This not only facilitates the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.
References
- Venter JC, Reinert K, et al. The Sequence of the Human Genome. Science. 2001;291:1145–1434.
- Myers EW. A fast bit-vector algorithm for approximate string matching based on dynamic programming. Journal of the ACM. 1999;46:395–415.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology. 1990;215:403–410.
- Manber U, Myers E. Suffix arrays: a new method for on-line string searches. In: SODA'90: Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics; 1990. pp. 319–327.
- Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KHJ, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC. A Whole-Genome Assembly of Drosophila. Science. 2000;287:2196–2204.