Review
Nat Methods. 2024 Jun;21(6):954-966.
doi: 10.1038/s41592-024-02262-1. Epub 2024 Apr 30.

Unveiling microbial diversity: harnessing long-read sequencing technology

Daniel P Agustinho et al. Nat Methods. 2024 Jun.

Abstract

Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advances stem not only from improvements in sequencing accuracy but also from rapidly evolving analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of the analytical methods available to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads compared with short reads and into their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field, such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.

Conflict of interest statement

Competing interests

F.J.S. has received research funding from Illumina, PacBio, Genentech and Oxford Nanopore. V.K.M. is an employee of Genentech. The remaining authors declare no competing interests.

Figures

Fig. 1 | Overview of long reads in metagenomics.
a, Differences between short-read and long-read technologies. Long reads have several advantages over short-read technologies: they enable less fragmented genome assemblies, finer (species- or strain-level) taxonomic characterization, identification of DNA/RNA methylation patterns and detection of large structural variants (SVs), and some long-read sequencers are highly portable. In contrast, short-read technologies still offer lower overall sequencing costs and require less input DNA because of the amplification step in library preparation. Image credit: Oxford Nanopore Technologies plc. b, The growth of long-read-related submissions to the Sequence Read Archive (SRA) in recent years. Long-read platforms (ONT and PacBio SMRT) are being used more widely each year. The plot shows the cumulative number of SRA data submissions associated with each tag, by year.
Fig. 2 | A generalized decision tree for metagenomic studies.
When embarking on a metagenomic study, one of the first decisions a researcher must make is whether to use a targeted or an untargeted approach. A targeted approach involves sequencing a specific organism or group of organisms, which often requires enriching the target organism’s genetic material from the microbiome sample using tiling amplicon panels, adaptive sampling, or capture panels and probe designs. In contrast, an untargeted approach involves sequencing the entire population without prior selection. An alternative to untargeted metagenomics is the 16S/18S rRNA approach, which sequences amplicons of a set of marker genes, using primers specific to conserved regions of these genes, to capture the maximum number of organisms possible. The choice of approach considerably affects the analyses that are available and the hypotheses that can be tested in each experiment. In this Review, we describe three main metagenomic analysis pipelines: mapping, de novo assembly and taxonomic characterization. Each pipeline suits different studies, depending on their objectives. Finally, we discuss post-sequencing steps that can be taken based on the different designs proposed. Dashed arrows represent indirect processes that require other intermediate steps.
Fig. 3 | Graph representation of a metagenome assembly.
Data were generated using HiFi reads and hifiasm-meta for long-read-only assembly. The figures represent all the contigs arranged by decreasing length, with each color representing a single contig. The sample is a commercially available pooled human gut reference (ZymoBIOMICS D6323). The dataset was generated with four SMRT Cells (8 M) on the PacBio Sequel IIe system, which yielded 11.9 million HiFi reads and 88.3 Gb of total data. There are 56 large circular contigs visible in the graph, ranging from 1.5 Mb to 6 Mb in size, along with numerous circular plasmids.
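The yield figures quoted in the Fig. 3 caption imply an average HiFi read length and an average per-cell output. The short sketch below works through that arithmetic. It assumes that "Gb" means gigabases (10^9 bases); the function and variable names are illustrative and do not come from the Review, and the per-cell values are simple averages rather than reported measurements.

```python
# Values quoted in the Fig. 3 caption.
TOTAL_BASES = 88.3e9   # 88.3 Gb of HiFi data (assumption: Gb = 10^9 bases)
TOTAL_READS = 11.9e6   # 11.9 million HiFi reads
SMRT_CELLS = 4         # four SMRT Cells (8 M)


def mean_read_length(total_bases: float, total_reads: float) -> float:
    """Average read length in bases: total yield divided by read count."""
    return total_bases / total_reads


def per_cell(total: float, cells: int) -> float:
    """Average share of a run-level total attributable to one SMRT Cell."""
    return total / cells


mean_len = mean_read_length(TOTAL_BASES, TOTAL_READS)
gb_per_cell = per_cell(TOTAL_BASES, SMRT_CELLS) / 1e9
reads_per_cell = per_cell(TOTAL_READS, SMRT_CELLS)

print(f"Mean HiFi read length: ~{mean_len / 1e3:.1f} kb")
print(f"Average yield per SMRT Cell: ~{gb_per_cell:.1f} Gb")
print(f"Average reads per SMRT Cell: ~{reads_per_cell / 1e6:.2f} million")
```

Under these assumptions the mean read length comes out at roughly 7.4 kb, with an average of about 22 Gb per SMRT Cell.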
