Nat Rev Methods Primers. 2025;5:5.
doi: 10.1038/s43586-024-00376-6. Epub 2025 Jan 23.

Analysis of metagenomic data

Shaopeng Liu et al. Nat Rev Methods Primers. 2025.

Abstract

Metagenomics has revolutionized our understanding of microbial communities, offering unprecedented insights into their genetic and functional diversity across Earth's diverse ecosystems. Beyond their roles as environmental constituents, microbiomes act as symbionts, profoundly influencing the health and function of their host organisms. Given the inherent complexity of these communities and the diverse environments where they reside, the components of a metagenomics study must be carefully tailored to yield accurate results that are representative of the populations of interest. This Primer article examines the methodological advancements and current practices that have shaped the field, from initial stages of sample collection and DNA extraction to the advanced bioinformatics tools employed for data analysis, with a particular focus on the profound impact of next-generation sequencing (NGS) on the scale and accuracy of metagenomics studies. We critically assess the challenges and limitations inherent in metagenomics experimentation, available technologies and computational analysis methods. Beyond technical methodologies, we explore the application of metagenomics across various domains, including human health, agriculture and environmental monitoring. Looking ahead, we advocate for the development of more robust computational frameworks and enhanced interdisciplinary collaborations. This Primer serves as a comprehensive guide for advancing the precision and applicability of metagenomic studies, positioning them to address the complexities of microbial ecology and their broader implications for human health and environmental sustainability.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. A timeline of microbial discovery and the development of metagenomic analysis.
Building on the findings of early microbiology and on advances in sequencing technology, the field of metagenomics has grown to encompass metagenomic databases, analysis tools, and organizations dedicated to microbial and metagenomic discovery.
Figure 2. Experimental protocol for metagenomics experiments.
a, Metagenomic samples, such as environmental samples from soil or water or samples from the microbiome of host organisms, are collected and either stored or processed immediately. b, DNA is extracted from the sample using physical or enzymatic lysis. c, A DNA library is constructed. Multiple samples can be sequenced together (multiplexed) by labelling each sample with a DNA barcode, and the bulk DNA is sequenced by whole-genome shotgun sequencing. d, Sequencing reads undergo quality control and preprocessing. e, Demultiplexing separates the reads by DNA barcode, and the reads are then processed by bioinformatics analyses.
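Steps d and e can be illustrated with a minimal Python sketch. It assumes a hypothetical layout in which each sample's barcode is an exact-match prefix of the read sequence and quality strings use Phred+33 encoding; real runs often carry barcodes in separate index reads and tolerate mismatches, and the threshold `min_q=20` is an arbitrary illustrative choice. `trim_3prime` and `demultiplex` are hypothetical names, not functions of any existing tool.

```python
from collections import defaultdict


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def demultiplex(reads, barcodes, min_q=20):
    """Quality-trim each (name, seq, qual) read, then bin it by barcode prefix.

    Assumes (hypothetically) that barcodes are exact-match prefixes of the
    read; reads matching no barcode go to the "undetermined" bin.
    """
    bins = defaultdict(list)
    for name, seq, qual in reads:
        seq, qual = trim_3prime(seq, qual, min_q)      # step d: quality control
        for sample, barcode in barcodes.items():       # step e: demultiplexing
            if seq.startswith(barcode):
                k = len(barcode)
                bins[sample].append((name, seq[k:], qual[k:]))  # strip barcode
                break
        else:
            bins["undetermined"].append((name, seq, qual))
    return dict(bins)


reads = [
    ("r1", "ACGTAAAA", "IIIIIII#"),  # last base has Phred 2 and is trimmed
    ("r2", "TTAGCCCC", "IIIIIIII"),
    ("r3", "GGGGGGGG", "IIIIIIII"),  # carries no known barcode
]
print(demultiplex(reads, {"s1": "ACGT", "s2": "TTAG"}))
```

Trimming only the 3' end keeps the 5' barcode intact, so the two steps can run in either order in this toy layout.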
Figure 3. Metagenomic functional analysis.
Metagenomic sequences are a common starting point for functional profiling and annotation. Reads can be aligned directly to a reference of known genes for alignment-based quantification (top row). Alternatively, ab initio gene finders can use metagenomic reads to train models that identify open reading frames (ORFs) in the input data, after which reads are quantified (middle row). Both alignment-based and ab initio methods yield functional profiles, which can be analyzed further to explore biological insights (right side). Metagenomic sequences can also be assembled into contigs to facilitate gene discovery by identifying novel ORFs and predicting gene function from existing knowledge (bottom row).
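The ORF-identification idea can be illustrated with a deliberately naive scanner. Unlike the ab initio gene finders in the figure, it trains no model: it simply reports ATG-to-stop stretches in all six reading frames, assuming the standard genetic code and an arbitrary minimum length (`min_codons`). `find_orfs` and `revcomp` are hypothetical helpers written only for this sketch.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=2):
    """Report ATG...stop ORFs in all six reading frames.

    Returns (strand, start, end) tuples with 0-based, end-exclusive
    coordinates on the scanned strand ("+" = input, "-" = reverse complement).
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))  # include stop codon
                    start = None                   # close it and keep scanning
    return orfs


print(find_orfs("CCATGAAATTTTAGCC"))  # → [('+', 2, 14)]
```

Real gene finders additionally score codon usage, start-site context and partial genes at read or contig edges, which is why they learn models from the input data rather than relying on a fixed rule like this one.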
Figure 4. Metagenomic taxonomic characterization using alignment-based and alignment-free methods.
In both alignment-based and alignment-free profiling approaches, a public reference database is chosen against which the metagenomic data are compared. A, Alignment-free profiling tools generally have a lower computational burden. Aa, k-mers are generated from both the chosen reference database and the metagenomic sample of interest. Ab, k-mers from the references and the sample are compared to identify matches. Ac, Matching k-mers are quantified to generate a taxonomic profile. B, Alignment-based methods can be more sensitive, at the cost of a greater computational burden. Ba, Metagenomic sequencing reads are aligned to a prebuilt marker-gene-based reference database. Bb, Similarities between reads and each reference genome are calculated using indexing, dynamic programming or k-mer matching. Bc, Reads are aligned to the reference database and Bd, quantified to produce a taxonomic profile.
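Steps Aa to Ac can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm of any particular profiler: only k-mers found in exactly one reference taxon are counted, each read is assigned to the taxon with the most such hits, reads without hits stay unassigned, and assigned counts are normalized to relative abundances. Tools such as Kraken instead resolve shared k-mers with a lowest-common-ancestor rule. `build_index` and `profile` are hypothetical names.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (step Aa)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each reference k-mer to the set of taxa containing it."""
    index = {}
    for taxon, genome in references.items():
        for km in kmers(genome, k):
            index.setdefault(km, set()).add(taxon)
    return index


def profile(reads, references, k=4):
    """Return relative abundances of taxa, assigning one taxon per read."""
    index = build_index(references, k)
    counts = Counter()
    for read in reads:
        hits = Counter()
        for km in kmers(read, k):               # step Ab: match sample k-mers
            taxa = index.get(km)
            if taxa is not None and len(taxa) == 1:  # taxon-specific k-mers only
                hits[next(iter(taxa))] += 1
        if hits:                                # unmatched reads stay unassigned
            taxon, _ = hits.most_common(1)[0]
            counts[taxon] += 1
    total = sum(counts.values())                # step Ac: quantify matches
    return {t: n / total for t, n in counts.items()} if total else {}


refs = {"A": "AAAACCCC", "B": "GGGGTTTT"}  # toy, hypothetical "genomes"
print(profile(["AAAACC", "GGGGTT", "AAACCC", "ACGTACGT"], refs, k=4))
```

In this toy run, two of the three assigned reads come from taxon A and one from B; the fourth read shares no k-mer with either reference and is left out of the profile. Real tools use much larger k (for example around 31) so that exact k-mer matches are informative.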
