. 2013 May 31;8(5):e65632.

doi: 10.1371/journal.pone.0065632. Print 2013.

SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner

Ruibang Luo¹, Thomas Wong, Jianqiao Zhu, Chi-Man Liu, Xiaoqian Zhu, Edward Wu, Lap-Kei Lee, Haoxiang Lin, Wenjuan Zhu, David W Cheung, Hing-Fung Ting, Siu-Ming Yiu, Shaoliang Peng, Chang Yu, Yingrui Li, Ruiqiang Li, Tak-Wah Lam

Affiliations

PMID: 23741504
PMCID: PMC3669295
DOI: 10.1371/journal.pone.0065632

SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner

Ruibang Luo et al. PLoS One. 2013.

. 2013 May 31;8(5):e65632.

doi: 10.1371/journal.pone.0065632. Print 2013.

Authors

Affiliation

¹ HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory & Department of Computer Science, University of Hong Kong, Hong Kong, China.

PMID: 23741504
PMCID: PMC3669295
DOI: 10.1371/journal.pone.0065632

Erratum in

PLoS One. 2013;8(8). doi:10.1371/annotation/823f3670-ed17-41ec-ba51-b50281651915

Abstract

To tackle the exponentially increasing throughput of Next-Generation Sequencing (NGS), most of the existing short-read aligners can be configured to favor speed in trade of accuracy and sensitivity. SOAP3-dp, through leveraging the computational power of both CPU and GPU with optimized algorithms, delivers high speed and sensitivity simultaneously. Compared with widely adopted aligners including BWA, Bowtie2, SeqAlto, CUSHAW2, GEM and GPU-based aligners BarraCUDA and CUSHAW, SOAP3-dp was found to be two to tens of times faster, while maintaining the highest sensitivity and lowest false discovery rate (FDR) on Illumina reads with different lengths. Transcending its predecessor SOAP3, which does not allow gapped alignment, SOAP3-dp by default tolerates alignment similarity as low as 60%. Real data evaluation using human genome demonstrates SOAP3-dp's power to enable more authentic variants and longer Indels to be discovered. Fosmid sequencing shows a 9.1% FDR on newly discovered deletions. SOAP3-dp natively supports BAM file format and provides the same scoring scheme as BWA, which enables it to be integrated into existing analysis pipelines. SOAP3-dp has been deployed on Amazon-EC2, NIH-Biowulf and Tianhe-1A.

PubMed Disclaimer

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Alignment workflow.**
For each read (paired-end specifically, single-end is only with step 1 and step 3), the alignment would be decided in at most three steps. In step 1, SOAP3-dp aligns both ends of a read-pair to the reference genome by using GPU version 2way-BWT algorithm (Methods). Pairs with only one end aligned proceed to step 2 for a GPU accelerated dynamic programming (Methods) alignment at candidate regions inferred from the aligned end. Pairs with both ends unaligned in step 1 and those ends failed in step 2 proceed to step 3 to perform a more comprehensive alignment across the whole genome until all seed hits (substrings from the read) are examined or until a sufficient number of alignments are examined.

**Figure 2. Speed and sensitivity of alignment using simulated paired-end reads.**
We recorded the number of correct and incorrect alignments stratified by reported mapping quality for each dataset. We then calculated the cumulative number of correct and incorrect alignments from high to low mapping quality. We considered an alignment correct only if the leftmost position was within 50 bp of the position assigned by the simulator on the same strand according to the previous study of Bowtie2 to avoid soft-clipping artifacts.

**Figure 3. The accumulated number of incorrectly aligned reads categorized at different mapping quality scores by the five aligners.**

**Figure 4. Alignment time consumption of using GPU card “GTX680” and previous generation GPU card “Tesla C2070” respectively.**

**Figure 5. The length distribution of Indels identified by SOAP3-dp and BWA respectively using full set of 100 bp paired-end YH sample reads.**
a. Indels smaller than or equal to 20 bp, b. larger than 20 bp.

See this image and copyright information in PMC

References

1. The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073. - PMC - PubMed
1. Li H, Ruan J, Durbin R (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18: 1851–1858. - PMC - PubMed
1. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome biology 10: R25. - PMC - PubMed
1. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760. - PMC - PubMed
1. Li R, Yu C, Li Y, Lam TW, Yiu SM, et al. (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25: 1966–1967. - PubMed

Publication types

Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database
- scite Smart Citations
Research Materials
- NCI CPTC Antibody Characterization Program
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner

Affiliation

SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner

Authors

Affiliation

Erratum in

Abstract

Conflict of interest statement

Figures

References

Publication types

MeSH terms

LinkOut - more resources

Full Text Sources

Other Literature Sources

Research Materials

Miscellaneous