Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2017 Aug 1;33(15):2281-2287.
doi: 10.1093/bioinformatics/btx189.

Kart: a divide-and-conquer algorithm for NGS read alignment

Affiliations

Kart: a divide-and-conquer algorithm for NGS read alignment

Hsin-Nan Lin et al. Bioinformatics. .

Abstract

Motivation: Next-generation sequencing (NGS) provides a great opportunity to investigate genome-wide variation at nucleotide resolution. Due to the huge amount of data, NGS applications require very fast and accurate alignment algorithms. Most existing algorithms for read mapping basically adopt seed-and-extend strategy, which is sequential in nature and takes much longer time on longer reads.

Results: We develop a divide-and-conquer algorithm, called Kart, which can process long reads as fast as short reads by dividing a read into small fragments that can be aligned independently. Our experiment result indicates that the average size of fragments requiring the more time-consuming gapped alignment is around 20 bp regardless of the original read length. Furthermore, it can tolerate much higher error rates. The experiments show that Kart spends much less time on longer reads than other aligners and still produce reliable alignments even when the error rate is as high as 15%.

Availability and implementation: Kart is available at https://github.com/hsinnan75/Kart/ .

Contact: hsu@iis.sinica.edu.tw.

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1
The algorithm to explore all LMEMs with length ≥ k. BWT_search is the function to search for the occurrences of the maximal exact match for R[start, stop] on the given BWT array. It returns desirable LMEMs as well as their occurrences on the reference genome
Fig. 2
Fig. 2
Simple pair A overlaps with simple pair B. Kart removes the overlap by shrinking the size of the smaller simple pair
Fig. 3
Fig. 3
Simple pairs and normal pairs. A read sequence can be decomposed into different parts according to the alignment with the genome sequence. A simple pair represents a pair of identical sequence fragments; a normal pair represents a pair of sequence fragments which contains some sequence variations in the alignment

References

    1. Altschul S.F. et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410. - PubMed
    1. Chaisson M.J., Tesler G. (2012) Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics, 13, 238. - PMC - PubMed
    1. Frith M.C. et al. (2010) Incorporating sequence quality data into alignment improves DNA read mapping. Nucleic Acids Res., 38, e100. - PMC - PubMed
    1. Hoffmann S. et al. (2009) Fast mapping of short sequences with mismatches, insertions and deletions using index structures. Plos Comput. Biol., 5, e1000502. - PMC - PubMed
    1. Homer N. et al. (2009) BFAST: an alignment tool for large scale genome resequencing. Plos One, 4, A95–A106. - PMC - PubMed