Front Genet. 2022 May 13;13:868280.
doi: 10.3389/fgene.2022.868280. eCollection 2022.

Enhancing Long-Read-Based Strain-Aware Metagenome Assembly

Xiao Luo et al. Front Genet.

Abstract

Microbial communities are usually highly diverse and, because microorganisms evolve rapidly, often contain multiple strains of the same species. In such a complex microecosystem, different strains may perform different biological functions. Although reconstructing individual genomes at the strain level is vital for accurately deciphering the composition of microbial communities, the problem has remained largely unresolved. Next-generation sequencing is routinely used in metagenome assembly, but its short read length makes it difficult to generate strain-specific genome sequences. Long-read sequencing technologies, by contrast, have recently provided unprecedented opportunities for haplotype- or strain-resolved genome assembly. Here, we propose MetaBooster and MetaBooster-HiFi, two pipelines for strain-aware metagenome assembly from PacBio CLR and Oxford Nanopore long-read sequencing data. Benchmarking experiments on both simulated and real sequencing data demonstrate that both the MetaBooster and the MetaBooster-HiFi pipelines drastically outperform state-of-the-art de novo metagenome assemblers on all relevant metagenome assembly criteria, including genome fraction, contig length, and error rates.

Keywords: genome assembly; haplotype; long reads; metagenome; strain.
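
The abstract names genome fraction, contig length, and error rates as evaluation criteria but does not define them. The following is a minimal sketch of how such metrics are commonly computed, assuming N50 as the contig-length summary and errors per 100 kbp as the error-rate unit; these choices and all function names are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least
    half of all assembled bases. Returns 0 for an empty assembly.
    (Assumption: N50 stands in for the abstract's "contig length".)"""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def genome_fraction(covered_bases: int, reference_length: int) -> float:
    """Percentage of reference bases covered by aligned contigs."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * covered_bases / reference_length


def errors_per_100kbp(n_errors: int, aligned_bases: int) -> float:
    """Mismatches plus indels, normalized per 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return 1e5 * n_errors / aligned_bases


if __name__ == "__main__":
    # Hypothetical toy numbers, not results from the paper.
    print(n50([100, 200, 300, 400]))
    print(genome_fraction(900, 1000))
    print(errors_per_100kbp(5, 1_000_000))
```

In a strain-aware setting these values would typically be computed per reference strain, so that an assembler that collapses two strains into one consensus shows a reduced genome fraction for at least one of them.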

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Workflow overview of MetaBooster and MetaBooster-HiFi. MetaBooster takes raw reads as input and outputs contigs using Canu (blue), whereas MetaBooster-HiFi takes raw reads as input and outputs contigs using HiCanu (red). Dotted arrows indicate that running Strainberry is optional. "Canu (correction)" denotes running only Canu's correction and trimming modules; "Canu (assembly)" denotes running only Canu's assembly module.
