Bioinformatics. 2013 Nov 1;29(21):2669-77.
doi: 10.1093/bioinformatics/btt476. Epub 2013 Aug 29.

The MaSuRCA genome assembler


Aleksey V Zimin et al. Bioinformatics. 2013.

Abstract

Motivation: Second-generation sequencing technologies produce high coverage of the genome by short reads at a low cost, which has prompted development of new assembly methods. In particular, multiple algorithms based on de Bruijn graphs have been shown to be effective for the assembly problem. In this article, we describe a new hybrid approach that has the computational efficiency of de Bruijn graph methods and the flexibility of overlap-based assembly strategies, and which allows variable read lengths while tolerating a significant level of sequencing error. Our method transforms large numbers of paired-end reads into a much smaller number of longer 'super-reads'. The use of super-reads allows us to assemble combinations of Illumina reads of differing lengths together with longer reads from 454 and Sanger sequencing technologies, making it one of the few assemblers capable of handling such mixtures. We call our system the Maryland Super-Read Celera Assembler (abbreviated MaSuRCA and pronounced 'mazurka').

Results: We evaluate the performance of MaSuRCA against two of the most widely used assemblers for Illumina data, Allpaths-LG and SOAPdenovo2, on two datasets from organisms for which high-quality assemblies are available: the bacterium Rhodobacter sphaeroides and chromosome 16 of the mouse genome. We show that MaSuRCA performs on par with or better than Allpaths-LG and significantly better than SOAPdenovo2 on these data, when evaluated against the finished sequence. We then show that MaSuRCA can significantly improve its assemblies when the original data are augmented with long reads.

Availability: MaSuRCA is available as open-source code at ftp://ftp.genome.umd.edu/pub/MaSuRCA/. Previous (pre-publication) releases have been publicly available for over a year.

Contact: alekseyz@ipst.umd.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Reads 1, 2 and 3 yield the same super-read. Reads are depicted by the black solid lines. Dashed lines represent the k-mer extensions starting from k-mers in the reads 1, 2 and 3. The super-read is depicted by the thick solid line. All reads that extend to the same super-read are replaced by that super-read.
Fig. 2.
An example of a read whose super-read has two k-unitigs. Read R contains k-mers M1 and M2 on its ends. M1 and M2 belong to k-unitigs K1 and K2, respectively. K-unitigs K1 and K2 are shown in blue, and the matching k-mers M1 and M2 are shown in red and green. K1 and K2 overlap by k-1 bases. We extend read R on both ends, producing a super-read, also depicted in blue. A super-read can consist of one k-unitig or can contain many k-unitigs.
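
The two captions above can be illustrated with a toy sketch in Python. It is not the MaSuRCA implementation: it assumes error-free reads on a single strand, ignores reverse complements, k-mer counts and mate pairs, and every name in it (`build_kmer_set`, `extend_right`, `extend_left`, `super_read`, `join_k_unitigs`, the example sequence) is hypothetical. A read is extended base by base for as long as exactly one k-mer in the table continues it, so reads that extend to the same sequence collapse into one super-read; a separate helper joins k-unitigs that overlap by k-1 bases.

```python
def kmers(seq, k):
    """Return all k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_set(reads, k):
    """Collect every k-mer seen in any read (toy stand-in for a k-mer table)."""
    table = set()
    for read in reads:
        table.update(kmers(read, k))
    return table


def extend_right(seq, table, k, max_steps=10_000):
    """Append bases while exactly one k-mer continues the last k-1 bases."""
    for _ in range(max_steps):  # step limit guards against cycles
        tail = seq[-(k - 1):]
        choices = [b for b in "ACGT" if tail + b in table]
        if len(choices) != 1:  # dead end or ambiguous branch: stop
            break
        seq += choices[0]
    return seq


def extend_left(seq, table, k, max_steps=10_000):
    """Prepend bases while exactly one k-mer precedes the first k-1 bases."""
    for _ in range(max_steps):
        head = seq[:k - 1]
        choices = [b for b in "ACGT" if b + head in table]
        if len(choices) != 1:
            break
        seq = choices[0] + seq
    return seq


def super_read(read, table, k):
    """Extend a read uniquely in both directions (assumes len(read) >= k >= 2)."""
    return extend_left(extend_right(read, table, k), table, k)


def join_k_unitigs(unitigs, k):
    """Concatenate consecutive k-unitigs that overlap by exactly k-1 bases."""
    joined = unitigs[0]
    for nxt in unitigs[1:]:
        if joined[-(k - 1):] != nxt[:k - 1]:
            raise ValueError("adjacent k-unitigs must overlap by k-1 bases")
        joined += nxt[k - 1:]
    return joined


# Hypothetical example: three overlapping reads drawn from one short sequence.
K = 5
GENOME = "ATGCGTACCTGAAGTCA"
READS = [GENOME[0:9], GENOME[3:12], GENOME[7:17]]
TABLE = build_kmer_set(READS, K)

# All three reads extend to the same sequence, so one super-read replaces them.
SUPER_READS = {super_read(r, TABLE, K) for r in READS}
print(len(READS), "reads ->", len(SUPER_READS), "super-read(s)")

# Two k-unitigs sharing k-1 = 4 bases ("GTAC") merge into one longer sequence.
print(join_k_unitigs(["ATGCGTAC", "GTACCTGA"], K))
```

In this toy input every (k-1)-mer of the source sequence is unique, so extension never meets a branch. With repeats or sequencing errors the extension would stop at the first ambiguous k-mer.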
