Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing
- PMID: 21088030
- DOI: 10.1093/bioinformatics/btq648
Abstract
Motivation: Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and are therefore very efficient for short queries, but these optimizations make them inefficient or inapplicable for reads longer than 200 bp. However, many sequencers already generate longer reads, and more are expected to follow. For long-read sequence mapping the options are limited; BLAT, SSAHA2, FANGS and BWA-SW are among the popular ones. Resequencing and personalized medicine, however, need much faster software to map these long sequencing reads to a reference genome to identify SNPs or rare transcripts.
Results: We present AGILE (AliGnIng Long rEads), a hash-table-based, high-throughput sequence mapping algorithm for longer 454 reads. Among other heuristics, it uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach to optimize every step of the mapping process. In our experiments, AGILE is more accurate than BLAT and comparable to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. In the remaining cases, AGILE is comparable in speed to BWA-SW and several times faster than BLAT and SSAHA2.
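The abstract names the main ingredients of AGILE (a hash table over the reference, a multiple seed-match criterion on diagonals, and q-gram filtering) without giving their details. The following is a generic, hypothetical sketch of how these ideas commonly fit together, not AGILE's actual implementation. It assumes exact k-mer seeds that double as the q-grams (q = k), allows substitutions only, and uses invented function names (`build_index`, `qgram_threshold`, `candidate_diagonals`, `map_read`).

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash table: every k-mer of the reference -> sorted list of its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def qgram_threshold(read_len: int, q: int, max_errors: int) -> int:
    """q-gram lemma: a locus within max_errors edits of the read shares at least
    read_len + 1 - q * (max_errors + 1) q-grams with it (<= 0 means no guarantee)."""
    return read_len + 1 - q * (max_errors + 1)


def candidate_diagonals(read: str, index: dict[str, list[int]], k: int,
                        min_seeds: int) -> list[int]:
    """Count seed hits per diagonal d = ref_pos - read_pos and keep the diagonals
    supported by at least min_seeds hits (a multiple seed-match criterion)."""
    hits: dict[int, int] = defaultdict(int)
    for i in range(len(read) - k + 1):
        for ref_pos in index.get(read[i:i + k], ()):
            hits[ref_pos - i] += 1
    return sorted(d for d, n in hits.items() if n >= min_seeds)


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_errors: int) -> list[tuple[int, int]]:
    """Filter candidate diagonals with the q-gram threshold, then verify each one
    by counting mismatches (substitution-only assumption)."""
    # Requiring at least one seed when the lemma gives no guarantee is a heuristic.
    min_seeds = max(1, qgram_threshold(len(read), k, max_errors))
    results = []
    for d in candidate_diagonals(read, index, k, min_seeds):
        # Keep only diagonals where the whole read fits inside the reference.
        if 0 <= d <= len(reference) - len(read):
            window = reference[d:d + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_errors:
                results.append((d, mismatches))
    return results


if __name__ == "__main__":
    ref = "GATTACACGTTAGCCATG"
    idx = build_index(ref, k=3)
    # Read copied from ref[4:14] with one substitution (G -> T).
    print(map_read("ACACTTTAGC", ref, idx, k=3, max_errors=1))
```

With substitutions only, every surviving seed of the true locus lies on a single diagonal, so the q-gram bound can be checked by counting hits per diagonal; in the example the true diagonal keeps exactly 5 of 8 seeds, matching the threshold 10 + 1 - 3 * 2. Indels shift seeds onto neighboring diagonals, so a practical mapper would count hits over a band of diagonals and verify with a gapped alignment instead of a mismatch count.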
Availability: http://www.ece.northwestern.edu/~smi539/agile.html.