PLoS One. 2017 Oct 26;12(10):e0185056.
doi: 10.1371/journal.pone.0185056. eCollection 2017.

BBMerge - Accurate paired shotgun read merging via overlap

Brian Bushnell et al. PLoS One. 2017.

Abstract

Merging paired-end shotgun reads generated on high-throughput sequencing platforms can substantially improve various subsequent bioinformatics processes, including genome assembly, binning, mapping, annotation, and clustering for taxonomic analysis. With the inexorable growth of sequence data volume and CPU core counts, the speed and scalability of read-processing tools become ever more important. The accuracy of shotgun read merging is crucial as well, as errors introduced by incorrect merging percolate through to reduce the quality of downstream analysis. Thus, we designed a new tool to maximize accuracy and minimize processing time, allowing the use of read merging on larger datasets and in analyses highly sensitive to errors. We present BBMerge, a new merging tool for paired-end shotgun sequence data. We benchmark BBMerge by comparison with eight other widely used merging tools, assessing speed, accuracy, and scalability. Evaluations of both synthetic and real-world datasets demonstrate that BBMerge produces merged shotgun reads with greater accuracy and at higher speed than any existing merging tool examined. BBMerge also provides the ability to merge non-overlapping shotgun read pairs by using k-mer frequency information to assemble the unsequenced gap between reads, achieving a significantly higher merge rate while maintaining or increasing accuracy.
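
To illustrate the general idea of merging a read pair via overlap, here is a minimal Python sketch. It is not BBMerge's algorithm: the function names, the `min_overlap` and `max_mismatch_rate` parameters, and the rule of taking read 1's bases in the overlap are all assumptions made for illustration. The sketch ignores quality scores and does not handle inserts shorter than the read length.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge a read pair by overlap; return the merged string or None.

    Hypothetical helper, not part of BBMerge. read2 is expected in the
    orientation it comes off the sequencer (reverse strand).
    """
    r2 = reverse_complement(read2)  # bring read 2 onto read 1's strand
    best = None  # (mismatches, overlap length) of the best candidate so far

    # Compare a suffix of read1 with a prefix of r2, longest overlap first.
    for overlap in range(min(len(read1), len(r2)), min_overlap - 1, -1):
        suffix = read1[-overlap:]
        prefix = r2[:overlap]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * overlap:
            # Strict '<' keeps the longest overlap among equal mismatch counts.
            if best is None or mismatches < best[0]:
                best = (mismatches, overlap)

    if best is None:
        return None  # no acceptable overlap: leave the pair unmerged

    # Assumption: keep read 1's bases in the overlap, then append the rest of r2.
    return read1 + r2[best[1]:]
```

Calling `merge_pair(read1, read2)` on a pair whose ends overlap by at least `min_overlap` bases returns a single sequence spanning the fragment; a pair with no acceptable overlap returns `None`, which corresponds to leaving the pair unmerged.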

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Merging scenarios in BBMerge modes: default (A-B), REM (C-F), and RSEM (G-I). The left column (Fig 1A,C,D,F) displays scenarios resulting in successfully merged reads, while the right column (Fig 1B,E,G,H) displays scenarios resulting in discarded unmerged pairs.
Fig 2. Relationship between % merged reads and genome coverage.
Fig 3. Comparison of merging accuracy by program using synthetic (A) and shotgun metagenome sequences (B). Correctly merged reads are defined as % of total input pairs. Program performance at default sensitivity is indicated by a triangle.
Fig 4. Speed comparison by program of shotgun metagenome sequences.
Fig 5. Scalability of each program, determined by measuring speed using various numbers of threads.
Fig 6. NA50 length and misassembly rates for a SPAdes assembly of each program’s output at default settings.
