Genome Res. 2017 May;27(5):737-746.
doi: 10.1101/gr.214270.116. Epub 2017 Jan 18.

Fast and accurate de novo genome assembly from long uncorrected reads

Robert Vaser et al. Genome Res. 2017 May.

Abstract

The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.

Figures

Figure 1.
Racon's speedup compared with FALCON and Canu.
Figure 2.
Scalability of Racon as a function of genome size. Reads were subsampled to 81× coverage (the maximum available in the Caenorhabditis elegans data set), and the figure shows results for one iteration of Racon.
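Read coverage is the total number of sequenced bases divided by the genome size. The sketch below shows one way to subsample a read set to a target depth. It assumes random selection without replacement, which the caption does not specify. The helper names `coverage` and `subsample_to_coverage` are hypothetical.

```python
import random

def coverage(read_lengths, genome_size):
    """Mean depth: total sequenced bases divided by genome size."""
    return sum(read_lengths) / genome_size

def subsample_to_coverage(reads, genome_size, target, seed=0):
    """Keep randomly chosen reads until their total length reaches target * genome_size.

    `reads` is a list of (name, sequence) pairs. Hypothetical helper: the
    subsampling procedure used for the figure is not described on this page.
    """
    rng = random.Random(seed)
    order = list(range(len(reads)))
    rng.shuffle(order)                 # random order, no read picked twice
    budget = target * genome_size      # number of bases needed for the target depth
    kept, total = [], 0
    for i in order:
        if total >= budget:
            break
        kept.append(reads[i])
        total += len(reads[i][1])
    return kept

reads = [(f"read{i}", "ACGT" * 250) for i in range(100)]   # 100 reads of 1 kb each
genome_size = 10_000
print(coverage([len(s) for _, s in reads], genome_size))    # 10.0
kept = subsample_to_coverage(reads, genome_size, target=5)
print(len(kept))                                            # 50
```

If the target exceeds the available depth, the sketch returns every read. That is the situation the caption describes, where the C. elegans data set caps coverage at 81×.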
Figure 3.
Overview of the Racon consensus process.
Algorithm 1.
The Racon algorithm for consensus generation.
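Algorithm 1 is not reproduced on this page. The sketch below shows one way to read a consensus out of a partial-order alignment graph. It chooses the path with the largest total edge weight, where an edge's weight counts the reads that traverse it. This is a simplified heaviest-path rule in the spirit of the heaviest-bundle method used by POA tools, not necessarily the exact scoring that Racon or SPOA applies. The graph encoding and the function names are assumptions.

```python
from collections import defaultdict, deque

def topological_order(num_nodes, edges):
    """Kahn's algorithm. `edges` maps (u, v) -> weight for a DAG on nodes 0..num_nodes-1."""
    indegree = [0] * num_nodes
    successors = defaultdict(list)
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(i for i in range(num_nodes) if indegree[i] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order

def heaviest_path_consensus(bases, edges):
    """Return the bases along the path whose edge weights sum to the maximum."""
    n = len(bases)
    incoming = defaultdict(list)
    for (u, v), w in edges.items():
        incoming[v].append((u, w))
    score = [0] * n      # best path weight ending at each node
    back = [-1] * n      # predecessor on that best path (-1 means the path starts here)
    for v in topological_order(n, edges):
        for u, w in incoming[v]:
            if score[u] + w > score[v]:
                score[v] = score[u] + w
                back[v] = u
    node = max(range(n), key=score.__getitem__)   # end of the heaviest path
    path = []
    while node != -1:
        path.append(bases[node])
        node = back[node]
    return "".join(reversed(path))

# Reads ACGT, ACGT and AGGT merged into one graph; node 4 is the alternative G of AGGT.
bases = "ACGTG"
edges = {(0, 1): 2, (1, 2): 2, (2, 3): 3, (0, 4): 1, (4, 2): 1}
print(heaviest_path_consensus(bases, edges))  # ACGT
```

The C branch is supported by two reads and the alternative G by one, so the heavier branch wins.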
Algorithm 2.
Functions for filtering mappings/overlaps in Racon.
Figure 4.
Depiction of the SIMD vectorization approach used in SPOA.
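Figure 4 is not reproduced here. This kind of dynamic program is commonly vectorized by computing one whole row of the score matrix at once, that is, one graph node against every sequence position. The sketch below shows that idea with NumPy arrays standing in for SIMD vectors:

- The diagonal and vertical terms do not depend on each other across positions, so they can be computed in one step.
- The horizontal (insertion) term depends on the position to its left. With a linear gap penalty, this prefix dependency can be resolved with a running maximum, because H[j] = max over k ≤ j of (V[k] + (j − k)·gap).

This is an illustration under a linear-gap Needleman-Wunsch assumption, not SPOA's actual register layout. Function names and scores are hypothetical.

```python
import numpy as np

def row_scalar(pred_rows, base, seq, match=1, mismatch=-1, gap=-1):
    """Reference: one DP row for a graph node, computed cell by cell."""
    m = len(seq)
    H = [0] * (m + 1)
    H[0] = max(r[0] for r in pred_rows) + gap
    for j in range(1, m + 1):
        s = match if seq[j - 1] == base else mismatch
        diag = max(r[j - 1] for r in pred_rows) + s   # align base to seq[j-1]
        up = max(r[j] for r in pred_rows) + gap       # graph base left unaligned
        H[j] = max(diag, up, H[j - 1] + gap)          # seq[j-1] left unaligned
    return H

def row_vectorized(pred_rows, base, seq, match=1, mismatch=-1, gap=-1):
    """Same row computed with whole-array operations (needs at least one predecessor row)."""
    codes = np.frombuffer(seq.encode(), dtype=np.uint8)
    sub = np.where(codes == ord(base), match, mismatch)   # substitution scores for all j
    P = np.max(np.stack(pred_rows), axis=0)               # element-wise max over predecessors
    V = np.empty(len(seq) + 1, dtype=np.int64)
    V[0] = P[0] + gap
    V[1:] = np.maximum(P[:-1] + sub, P[1:] + gap)         # diagonal and vertical terms
    k = np.arange(len(seq) + 1)
    # Horizontal term: H[j] = j*gap + max_{k<=j}(V[k] - k*gap), a prefix maximum.
    return np.maximum.accumulate(V - k * gap) + k * gap

start_row = np.array([0, -1, -2, -3, -4])   # virtual start node against "ACGT"
print(row_vectorized([start_row], "A", "ACGT").tolist())  # [-1, 1, 0, -1, -2]
```

The scalar version is kept alongside the vectorized one so that the two can be checked against each other.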
Algorithm 3.
Pseudocode for the SPOA algorithm. The displayed function aligns a sequence to a preconstructed POA graph using SIMD intrinsics. Capitalized variables are SIMD vectors. Alignment mode is Needleman-Wunsch.
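The sketch below is a plain (non-SIMD) reference for the kind of computation the Algorithm 3 function performs: Needleman-Wunsch alignment of a sequence to a POA graph. Each node's cell takes the best score over all of that node's predecessors, rather than over a single previous row.

The sketch makes these assumptions, none of which come from SPOA's code:

- a linear gap penalty;
- graph nodes numbered in topological order;
- a virtual start node feeding every source node;
- hypothetical function names and scoring values.

```python
def align_to_graph(bases, preds, seq, match=1, mismatch=-1, gap=-1):
    """Global alignment of `seq` to a POA graph.

    bases[v] is the base at node v and preds[v] lists v's predecessors; nodes are
    assumed to be numbered in topological order. Returns (score, alignment), where
    alignment is a list of (node or None, seq_index or None) pairs.
    """
    n, m = len(bases), len(seq)
    # Row 0 is a virtual start node; graph node v uses row v + 1.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        H[0][j] = j * gap

    def pred_rows(v):
        return [p + 1 for p in preds[v]] or [0]   # source nodes hang off the start row

    for v in range(n):
        rows, r = pred_rows(v), v + 1
        H[r][0] = max(H[q][0] for q in rows) + gap
        for j in range(1, m + 1):
            s = match if bases[v] == seq[j - 1] else mismatch
            H[r][j] = max(
                max(H[q][j - 1] for q in rows) + s,   # match or mismatch
                max(H[q][j] for q in rows) + gap,     # graph base unaligned (deletion)
                H[r][j - 1] + gap,                    # sequence base unaligned (insertion)
            )

    # Global mode: the alignment ends at a sink node with the whole sequence consumed.
    has_successor = {p for ps in preds for p in ps}
    end = max((v for v in range(n) if v not in has_successor), key=lambda v: H[v + 1][m])
    score = H[end + 1][m]

    alignment, r, j = [], end + 1, m
    while r != 0 or j != 0:
        if r == 0:                                    # only insertions remain
            alignment.append((None, j - 1)); j -= 1; continue
        node, rows = r - 1, pred_rows(r - 1)
        if j > 0:
            s = match if bases[node] == seq[j - 1] else mismatch
            q = next((q for q in rows if H[q][j - 1] + s == H[r][j]), None)
            if q is not None:
                alignment.append((node, j - 1)); r, j = q, j - 1; continue
        q = next((q for q in rows if H[q][j] + gap == H[r][j]), None)
        if q is not None:
            alignment.append((node, None)); r = q; continue
        alignment.append((None, j - 1)); j -= 1
    alignment.reverse()
    return score, alignment

# Graph for ACGT/AGGT: node 1 (C) and node 2 (G) are alternative branches after node 0.
print(align_to_graph("ACGGT", [[], [0], [0], [1, 2], [3]], "AGGT"))
# (4, [(0, 0), (2, 1), (3, 2), (4, 3)])
```

The sequence follows the G branch, so every base matches. After alignment, a POA implementation would merge the aligned pairs into the graph before aligning the next read.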
