Bioinformatics. 2012 Sep 15;28(18):2366-73.
doi: 10.1093/bioinformatics/bts450. Epub 2012 Jul 18.

Fast and accurate read alignment for resequencing

John C Mu et al. Bioinformatics.

Abstract

Motivation: Next-generation sequence analysis has become an important task in both laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. Accurately aligning reads that contain large indels is computationally challenging.

Results: We introduce SeqAlto, a new algorithm for read alignment. For reads of 100 bp or longer, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important for the analysis of future sequencing data, where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data.

Availability: Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto

Contact: whwong@stanford.edu.

Figures

Fig. 1.
100-bp single-end reads with a deletion at a random location. Reported results are for MAPQ ≥ the value in parentheses. Data points with < 30% of the reads aligned correctly were removed due to erratic behavior of the percent-correct value. (a) Percent of reads aligned. (b) Percent of aligned reads correct.
Fig. 2.
100-bp single-end reads with an insertion at a random location. Reported results are for MAPQ ≥ the value in parentheses. Data points with < 30% of the reads aligned correctly were removed due to erratic behavior of the percent-correct value. (a) Percent of reads aligned. (b) Percent of aligned reads correct.
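
The two quantities plotted in the figures, percent of reads aligned and percent of aligned reads correct at a MAPQ cutoff, can be illustrated with a small sketch. This is not the authors' evaluation code. The `SimulatedRead` record, the `alignment_metrics` function, and the `tolerance` parameter are hypothetical names introduced here, and the sketch assumes a read counts as correct when its reported position lies within `tolerance` bases of the simulated position, a criterion the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class SimulatedRead:
    """Hypothetical record for one simulated read (not from the paper)."""
    true_pos: int                # position the read was simulated from
    aligned_pos: Optional[int]   # position reported by the aligner, None if unaligned
    mapq: int                    # mapping quality reported by the aligner


def alignment_metrics(
    reads: Sequence[SimulatedRead], min_mapq: int, tolerance: int = 0
) -> Tuple[float, Optional[float]]:
    """Return (percent of reads aligned, percent of aligned reads correct).

    Only alignments with MAPQ >= min_mapq count as aligned. Correctness is
    an assumed criterion: reported position within `tolerance` of the truth.
    """
    if not reads:
        raise ValueError("at least one read is required")
    kept = [r for r in reads if r.aligned_pos is not None and r.mapq >= min_mapq]
    pct_aligned = 100.0 * len(kept) / len(reads)
    if not kept:
        # No alignment passes the cutoff, so percent correct is undefined.
        return pct_aligned, None
    correct = sum(1 for r in kept if abs(r.aligned_pos - r.true_pos) <= tolerance)
    return pct_aligned, 100.0 * correct / len(kept)


reads = [
    SimulatedRead(true_pos=100, aligned_pos=100, mapq=60),
    SimulatedRead(true_pos=200, aligned_pos=205, mapq=30),
    SimulatedRead(true_pos=300, aligned_pos=None, mapq=0),
    SimulatedRead(true_pos=400, aligned_pos=400, mapq=5),
]
pct_aligned, pct_correct = alignment_metrics(reads, min_mapq=10)
```

Raising `min_mapq` typically trades a lower percent aligned for a higher percent correct, which is why the captions report results per MAPQ cutoff.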
