Nat Biotechnol. 2016 Apr;34(4):374-6.
doi: 10.1038/nbt.3511.

Compressive mapping for next-generation sequencing

Deniz Yorukoglu et al. Nat Biotechnol. 2016 Apr.
No abstract available

Figures

Figure 1
(a) Run-time comparison between conventional read mapping methods and CORA for whole-genome gapped and ungapped all-mapping of 1000 Genomes Phase 1 Illumina 2 × 108 bp paired-end read data sets of one, two, and four Finnish individuals (FIN1, FIN2 and FIN4, with approximately 4×, 8× and 16× read depth coverage, respectively; graph at left). The mapping similarity threshold is a Levenshtein (edit) distance of 4 for each 108 bp read end. For the FIN4 data set, we additionally performed ungapped mapping experiments with the similarity threshold set to a Hamming distance of 4 for each end (graph at right). We compared all-mapping run-times of Bowtie2 v2.1.0 (with the '-a' parameter) and BWA aln v0.7.5a (with the '-N' parameter) against compressively accelerated versions of each (CORA-Bowtie2 and CORA-BWA); for the ungapped mapping experiment, we also compared against mrsFAST-Ultra v3.3, which does not perform gapped mapping. We included read data set compression in the run-time for CORA mappers, but not the time to build the homology table; similarly, we did not include genome indexing for the other mappers. To ensure consistency across run-time comparisons, we assumed that all paired-end mappings of a read should be reported individually and consecutively, so that a downstream method can directly use the mapping output. Both CORA mappers and Bowtie2 readily satisfied these criteria; the additional computation needed to ensure this for BWA and mrsFAST-Ultra is indicated with a lighter shade (Supplementary Text).

(b) Sensitivity comparisons indicate that CORA mappers are substantially more sensitive than BWA and Bowtie2 for both gapped (lower) and ungapped (upper) all-mapping. Although CORA does not reach the 100% sensitivity of mrsFAST-Ultra, it reports mapping results with near-perfect sensitivity (~99.7%) for ungapped all-mapping. Color key as in (a).

(c) CORA's compressive framework achieves speed gains inversely related to the sequencing error rate. The graph shows the run-time of full and coarse ungapped mappings of CORA-BWA when aligning 20 million simulated paired-end reads (100 bp) onto hg19 chromosome 20 at varying sequencing error rates and a fixed mutation rate of 0.1%. The 'Coarse only' run-time is the time required to run BWA within the CORA-BWA pipeline.
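
The two similarity thresholds named in panel (a) can be illustrated with a minimal Python sketch. It is not CORA's implementation: the function names (`hamming_distance`, `edit_distance`, `within_threshold`) and the short example sequences are hypothetical, and the comparison is simplified to two plain strings rather than a read end against a reference genome window. Only the threshold value of 4 comes from the caption.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def within_threshold(read_end: str, target: str, gapped: bool, max_dist: int = 4) -> bool:
    """Hypothetical helper: gapped mapping uses edit distance, ungapped uses Hamming."""
    if gapped:
        return edit_distance(read_end, target) <= max_dist
    return hamming_distance(read_end, target) <= max_dist


# Illustrative sequences (assumed, not from the study): the target is the
# read shifted by one base, which a gapped comparison absorbs as two edits.
read_end = "ACGTACGTAC"
shifted = "CGTACGTACA"
print(within_threshold(read_end, shifted, gapped=True))
print(within_threshold(read_end, shifted, gapped=False))
```

A one-base shift makes every position mismatch under Hamming distance while costing only one deletion plus one insertion under edit distance, which is why gapped and ungapped mapping are evaluated against separate thresholds.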
