Sci Rep. 2020 Jul 1;10(1):10689.
doi: 10.1038/s41598-020-67416-5.

Terabase-scale metagenome coassembly with MetaHipMer

Steven Hofmeyr et al. Sci Rep. 2020.

Abstract

Metagenome sequence datasets can contain terabytes of reads, too many to be coassembled together on a single shared-memory computer; consequently, they have only been assembled sample by sample (multiassembly) and combining the results is challenging. We can now perform coassembly of the largest datasets using MetaHipMer, a metagenome assembler designed to run on supercomputers and large clusters of compute nodes. We have reported on the implementation of MetaHipMer previously; in this paper we focus on analyzing the impact of very large coassembly. In particular, we show that coassembly recovers a larger genome fraction than multiassembly and enables the discovery of more complete genomes, with lower error rates, whereas multiassembly recovers more dominant strain variation. Being able to coassemble a large dataset does not preclude one from multiassembly; rather, having a fast, scalable metagenome assembler enables a user to more easily perform coassembly and multiassembly, and assemble both abundant, high strain variation genomes, and low-abundance, rare genomes. We present several assemblies of terabyte datasets that could never be coassembled before, demonstrating MetaHipMer's scaling power. MetaHipMer is available for public use under an open source license and all datasets used in the paper are available for public download.
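
The abstract compares coassembly and multiassembly by genome fraction. As a minimal sketch, assuming genome fraction means the share of reference bases covered by at least one aligned contig, the metric can be computed by merging alignment intervals and dividing the covered length by the reference length. The function names, the half-open interval convention, and the toy coordinates below are illustrative assumptions; they are not taken from the paper or from MetaHipMer.

```python
from typing import Iterable, List, Tuple

# Assumption: each alignment is a half-open interval [start, end) on the reference.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def genome_fraction(alignments: Iterable[Interval], reference_length: int) -> float:
    """Fraction of reference bases covered by at least one aligned contig."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(alignments))
    return covered / reference_length


# Hypothetical toy alignments against a 1000-base reference (not real data).
coassembly_alignments = [(0, 400), (350, 900)]
multiassembly_alignments = [(0, 300), (600, 800)]

print(genome_fraction(coassembly_alignments, 1000))
print(genome_fraction(multiassembly_alignments, 1000))
```

Merging before summing matters because contigs from different assemblies can overlap on the reference; summing raw alignment lengths would overstate the covered fraction.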

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Genome fractions for references from MarRef found in the WA assemblies.
Figure 2. Genome fractions for strains of P. ubique found in the WA assemblies.
Figure 3. Genome fraction vs depth for synthetic reference genomes within WAmix.
Figure 4. Cumulative lengths for contigs aligned to synthetic reference genomes within WAmix.
Figure 5. Genome fraction vs depth for assemblies of the ArcticSynth dataset.
Figure 6. Cumulative lengths for contigs for assemblies of the ArcticSynth dataset.
Figure 7. Iterative contig generation workflow in MetaHipMer. Image source: Georganas et al. Reproduced under a CC BY 4.0 open access license by permission of E. Georganas.
