Fast and accurate long-read alignment with Burrows-Wheeler transform
- PMID: 20080505
- PMCID: PMC2828108
- DOI: 10.1093/bioinformatics/btp698
Abstract
Motivation: Many programs for aligning short sequencing reads to a reference genome have been developed in the last 2 years. Most of them are very efficient for short reads but inefficient or not applicable for reads >200 bp because the algorithms are heavily and specifically tuned for short queries with a low sequencing error rate. However, some sequencing platforms already produce longer reads and others are expected to become available soon. For longer reads, hashing-based software such as BLAT and SSAHA2 remains the only choice. Nonetheless, these methods are substantially slower than short-read aligners in terms of aligned bases per unit time.
Results: We designed and implemented a new algorithm, Burrows-Wheeler Aligner's Smith-Waterman Alignment (BWA-SW), to align long sequences up to 1 Mb against a large sequence database (e.g. the human genome) with a few gigabytes of memory. The algorithm is as accurate as SSAHA2, more accurate than BLAT, and is several to tens of times faster than both.
Availability: http://bio-bwa.sourceforge.net
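
Illustration: the abstract names the Burrows-Wheeler transform but does not show it. The following is a minimal Python sketch of the transform and of exact-match counting by backward search over it. It is not the BWA-SW algorithm, which additionally performs Smith-Waterman-style alignment and is not reproduced here. The function names (`bwt`, `build_tables`, `count_occurrences`) and the `$` sentinel are assumptions made for this example, and the naive rotation sort is only practical for toy inputs.

```python
def bwt(text):
    """Burrows-Wheeler transform of `text` (assumed not to contain '$')."""
    s = text + "$"  # '$' is a sentinel that sorts before A, C, G, T and letters
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_tables(bwt_str):
    """Build the two lookup tables used by backward search.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt_str[:i]
    """
    alphabet = sorted(set(bwt_str))
    C = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)
    occ = {c: [0] * (len(bwt_str) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_str):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Count exact occurrences of `pattern`, scanning it right to left."""
    lo, hi = 0, n  # half-open range of sorted rotations matching the suffix so far
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    C, occ = build_tables(transformed)
    print(transformed)
    print(count_occurrences("ana", C, occ, len(transformed)))
```

Each step of the search costs two table lookups regardless of text length, which is why index-based matching of this kind scales to large references once the tables are built.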