Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2013;14 Suppl 11(Suppl 11):S4.
doi: 10.1186/1471-2105-14-S11-S4. Epub 2013 Nov 4.

libgapmis: extending short-read alignments

libgapmis: extending short-read alignments

Nikolaos Alachiotis et al. BMC Bioinformatics. 2013.

Abstract

Background: A wide variety of short-read alignment programmes have been published recently to tackle the problem of mapping millions of short reads to a reference genome, focusing on different aspects of the procedure such as time and memory efficiency, sensitivity, and accuracy. These tools allow for a small number of mismatches in the alignment; however, their ability to allow for gaps varies greatly, with many performing poorly or not allowing them at all. The seed-and-extend strategy is applied in most short-read alignment programmes. After aligning a substring of the reference sequence against the high-quality prefix of a short read--the seed--an important problem is to find the best possible alignment between a substring of the reference sequence succeeding and the remaining suffix of low quality of the read--extend. The fact that the reads are rather short and that the gap occurrence frequency observed in various studies is rather low suggest that aligning (parts of) those reads with a single gap is in fact desirable.

Results: In this article, we present libgapmis, a library for extending pairwise short-read alignments. Apart from the standard CPU version, it includes ultrafast SSE- and GPU-based implementations. libgapmis is based on an algorithm computing a modified version of the traditional dynamic-programming matrix for sequence alignment. Extensive experimental results demonstrate that the functions of the CPU version provided in this library accelerate the computations by a factor of 20 compared to other programmes. The analogous SSE- and GPU-based implementations accelerate the computations by a factor of 6 and 11, respectively, compared to the CPU version. The library also provides the user the flexibility to split the read into fragments, based on the observed gap occurrence frequency and the length of the read, thereby allowing for a variable, but bounded, number of gaps in the alignment.

Conclusions: We present libgapmis, a library for extending pairwise short-read alignments. We show that libgapmis is better-suited and more efficient than existing algorithms for this task. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any short-read alignment pipeline. The open-source code of libgapmis is available at http://www.exelixis-lab.org/gapmis.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Seed-and-extend strategy. The alignment between the fragment of the reference sequence, starting at position 1 and ending at position 9, and the read with one mismatch at position 8 and a gap of length two inserted in the read after position 4. This figure was taken from [21].
Figure 2
Figure 2
Global, local, and semi-global alignment. The global, local, and semi-global alignments between t = CGTCCGAAGTG and x = TACGAA. This figure was taken from [21].
Figure 3
Figure 3
Distribution of gap lengths in exome sequencing. The distribution of gap lengths in exome sequencing. The data were generated by the Exome Sequencing Programme at the NIHR Biomedical Research Centre at Guy's and St Thomas' NHS Foundation Trust in partnership with King's College London. This figure was taken from [21].
Figure 4
Figure 4
Dynamic-programming matrices. The matrices G, H, GP, and HP for t = AGGTCAT, x = GGGTA, and β = 2. This figure was taken from [21].
Figure 5
Figure 5
Single-gap alignment. The single-gap alignment between t = AGGTCAT and x = GGGTA for k = 1, α = 1, and β = 1. This figure was taken from [21].
Figure 6
Figure 6
Inter-sequence GPU memory organisation. The inter-sequence GPU memory organisation. This figure was taken from [21].
Figure 7
Figure 7
Correct alignments. The correct alignments of Tables 1-3.
Figure 8
Figure 8
Processing times of needle and gapmis. The processing times of needle and gapmis for aligning 10, 000 pairs of sequences.
Figure 9
Figure 9
Processing times of gapmis_one_to_many. The processing times of gapmis_one_to_many for aligning a query sequence and 4, 639, 576 target sequences.
Figure 10
Figure 10
Processing times of gapmis_many_to_many. The processing times of gapmis_many_to_many for aligning 1, 000, 000 query sequences and 200 target sequences.

References

    1. Levenshtein VI. Tech Rep 8. Soviet Physics Doklady; 1966. Binary codes capable of correcting deletions, insertions, and reversals.
    1. Wagner RA, Fischer MJ. The String-to-String Correction Problem. Journal of the ACM. 1974;21:168–173. doi: 10.1145/321796.321811. - DOI
    1. Sellers PH. On the Theory and Computation of Evolutionary Distances. SIAM Journal on Applied Mathematics. 1974;26(4):787–793. doi: 10.1137/0126070. - DOI
    1. Heckel P. A technique for isolating differences between files. Communications of the ACM. 1978;21(4):264–268. doi: 10.1145/359460.359467. - DOI
    1. Peterson JL. Computer programs for detecting and correcting spelling errors. Communications of the ACM. 1980;23(12):676–687. doi: 10.1145/359038.359041. - DOI

Publication types