PLoS One. 2013 May 31;8(5):e65632.
doi: 10.1371/journal.pone.0065632. Print 2013.

SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner

Ruibang Luo et al. PLoS One.

Erratum in

  • PLoS One. 2013;8(8). doi:10.1371/annotation/823f3670-ed17-41ec-ba51-b50281651915

Abstract

To cope with the exponentially increasing throughput of Next-Generation Sequencing (NGS), most existing short-read aligners can be configured to favor speed at the expense of accuracy and sensitivity. SOAP3-dp, by leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2 and GEM, and the GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads of different lengths. Unlike its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Evaluation on real human genome data demonstrates that SOAP3-dp enables more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports the BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Alignment workflow.
For each read, the alignment is decided in at most three steps (the description here is for paired-end reads; single-end reads use only step 1 and step 3). In step 1, SOAP3-dp aligns both ends of a read pair to the reference genome using the GPU version of the 2way-BWT algorithm (Methods). Pairs with only one end aligned proceed to step 2, a GPU-accelerated dynamic programming alignment (Methods) at candidate regions inferred from the aligned end. Pairs with both ends unaligned in step 1, and ends that failed in step 2, proceed to step 3, a more comprehensive alignment across the whole genome that continues until all seed hits (substrings from the read) are examined or until a sufficient number of alignments are examined.
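The routing described in the Figure 1 caption can be sketched as a small dispatcher. This is only an illustration of the control flow, not SOAP3-dp's implementation: the function `align_pair` and the three callables `step1`, `step2` and `step3` are hypothetical stand-ins for the GPU 2way-BWT alignment, the dynamic programming rescue near the aligned mate, and the whole-genome seed-based alignment. Each callable is assumed to return an alignment record, or `None` when the read cannot be placed.

```python
from typing import Callable, Optional, Tuple

# Hypothetical minimal alignment record: (reference name, leftmost position).
Alignment = Tuple[str, int]


def align_pair(
    read1: str,
    read2: str,
    step1: Callable[[str], Optional[Alignment]],
    step2: Callable[[str, Alignment], Optional[Alignment]],
    step3: Callable[[str], Optional[Alignment]],
) -> Tuple[Optional[Alignment], Optional[Alignment], int]:
    """Route one read pair through at most three steps.

    Returns both alignments (None if still unaligned) and the last step used.
    """
    # Step 1: try to align both ends independently.
    a1, a2 = step1(read1), step1(read2)
    last_step = 1

    # Step 2: exactly one end aligned, so try to rescue the other end
    # in candidate regions inferred from the aligned mate.
    if (a1 is None) != (a2 is None):
        last_step = 2
        if a1 is None:
            a1 = step2(read1, a2)
        else:
            a2 = step2(read2, a1)

    # Step 3: any end still unaligned (both failed step 1, or the
    # rescue in step 2 failed) gets the more comprehensive search.
    if a1 is None or a2 is None:
        last_step = 3
        if a1 is None:
            a1 = step3(read1)
        if a2 is None:
            a2 = step3(read2)

    return a1, a2, last_step
```

Passing the three steps in as callables keeps the sketch independent of any particular aligner; the stopping rule of step 3 (all seed hits examined, or enough alignments examined) would live inside `step3`.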
Figure 2. Speed and sensitivity of alignment using simulated paired-end reads.
We recorded the number of correct and incorrect alignments stratified by reported mapping quality for each dataset. We then calculated the cumulative number of correct and incorrect alignments from high to low mapping quality. Following the earlier Bowtie2 study, we considered an alignment correct only if its leftmost position was within 50 bp of the position assigned by the simulator and on the same strand; the 50 bp tolerance avoids soft-clipping artifacts.
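The evaluation described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation script: the names `is_correct` and `cumulative_by_mapq` are hypothetical, "within 50 bp" is assumed to be inclusive, and reads are assumed to be compared on a single reference sequence (a real evaluation would also compare the chromosome).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def is_correct(aln_pos: int, aln_strand: str,
               true_pos: int, true_strand: str,
               tolerance: int = 50) -> bool:
    """Correct if on the same strand and the leftmost position is
    within `tolerance` bp of the simulated position (inclusive: an assumption)."""
    return aln_strand == true_strand and abs(aln_pos - true_pos) <= tolerance


def cumulative_by_mapq(
    records: Iterable[Tuple[int, bool]]
) -> List[Tuple[int, int, int]]:
    """Take (mapping quality, is_correct) pairs and return rows of
    (mapq, cumulative correct, cumulative incorrect), from high to low mapq."""
    correct: Counter = Counter()
    incorrect: Counter = Counter()
    for mapq, ok in records:
        if ok:
            correct[mapq] += 1
        else:
            incorrect[mapq] += 1

    rows = []
    cum_correct = cum_incorrect = 0
    # Walk the observed quality values from the highest to the lowest.
    for mapq in sorted(set(correct) | set(incorrect), reverse=True):
        cum_correct += correct[mapq]
        cum_incorrect += incorrect[mapq]
        rows.append((mapq, cum_correct, cum_incorrect))
    return rows
```

Plotting cumulative incorrect against cumulative correct alignments from these rows gives one curve per aligner, which is how such results are typically compared across mapping quality thresholds.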
Figure 3. The accumulated number of incorrectly aligned reads categorized at different mapping quality scores by the five aligners.
Figure 4. Alignment time using the GPU card “GTX680” and the previous-generation GPU card “Tesla C2070”, respectively.
Figure 5. The length distribution of Indels identified by SOAP3-dp and BWA, respectively, using the full set of 100 bp paired-end YH sample reads.
a. Indels of 20 bp or shorter; b. Indels longer than 20 bp.

