Bioinformatics. 2016 Apr 1;32(7):1009-15.
doi: 10.1093/bioinformatics/btv688. Epub 2015 Nov 20.

hybridSPAdes: an algorithm for hybrid assembly of short and long reads


Dmitry Antipov et al. Bioinformatics. 2016.

Abstract

Motivation: Recent advances in single-molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long but inaccurate reads. However, these approaches require high long-read coverage and remain expensive. On the other hand, inexpensive short-read technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (at low coverage) together with short reads has the potential to generate high-quality assemblies at reduced cost.

Results: We describe the hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low long-read coverage), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads.

Availability and implementation: hybridSPAdes is implemented in C++ as part of the SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades

Contact: d.antipov@spbu.ru

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Three pairs of long edges in the assembly graph (corresponding to unique regions in the genome and shown as colored edges) separated by short edges that represent repeats in the genome (shown in black). The genome path traverses edges of the same color consecutively. The two dotted paths represent two different ways in which a long read (with a fixed length and fixed alignments to the long edges) could traverse this repetitive region. The goal is to figure out which of these dotted paths is correct.
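The path-selection problem in Fig. 1 can be illustrated with a small sketch. It assumes a simplified graph model in which every edge has a positive length in base pairs and k-mer overlaps between adjacent edges are ignored. The `Edge` class, the `candidate_paths` function, and the length-tolerance criterion are hypothetical illustrations, not the hybridSPAdes implementation. Given a long read aligned to the end of one unique edge and the start of another, the read implies an approximate gap length, and the sketch enumerates every path of repeat edges between the two unique edges whose total length falls within a tolerance of that gap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A directed assembly-graph edge (hypothetical, simplified model)."""
    name: str
    start: str   # source vertex
    end: str     # target vertex
    length: int  # length in bp; assumed positive, k-mer overlaps ignored


def candidate_paths(edges, source_edge, target_edge, gap, tolerance):
    """Enumerate paths of edges leading from the end of `source_edge` to the
    start of `target_edge` whose total length is within `tolerance` of `gap`,
    the distance implied by a long read aligned to both unique edges."""
    # Index outgoing edges by their source vertex.
    out_edges = {}
    for e in edges.values():
        out_edges.setdefault(e.start, []).append(e)

    max_len = gap + tolerance
    target_start = edges[target_edge].start
    results = []

    def dfs(vertex, path, length):
        # Record the current path if it ends where the target edge begins
        # and its length is compatible with the read-implied gap.
        if vertex == target_start and abs(length - gap) <= tolerance:
            results.append(list(path))
        for e in out_edges.get(vertex, []):
            # The unique edges themselves are not part of the repeat path.
            if e.name in (source_edge, target_edge):
                continue
            new_len = length + e.length
            # Edge lengths are positive, so longer extensions can be pruned.
            if new_len <= max_len:
                path.append(e.name)
                dfs(e.end, path, new_len)
                path.pop()

    dfs(edges[source_edge].end, [], 0)
    return results


# Toy graph: two unique edges A and B separated by a small repeat region.
edges = {e.name: e for e in [
    Edge("A", "u0", "v1", 1000),   # unique edge (left)
    Edge("B", "v2", "u3", 1000),   # unique edge (right)
    Edge("R1", "v1", "x", 50),     # short repeat edges in between
    Edge("R2", "x", "v2", 50),
    Edge("R3", "v1", "v2", 300),
    Edge("R4", "x", "x", 100),     # a short loop that can be traversed repeatedly
]}

print(candidate_paths(edges, "A", "B", gap=300, tolerance=20))
# → [['R1', 'R4', 'R4', 'R2'], ['R3']]
```

In this toy graph, both the direct repeat edge `R3` and the route that traverses the loop `R4` twice have a total length of 300 bp, so a length check alone leaves two candidates, which mirrors the ambiguity depicted in Fig. 1. Choosing between such candidates would require further evidence, for example comparing the long read's sequence against the sequence spelled by each candidate path.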
