CAMP: a modular metagenomics analysis system for integrated multistep data exploration

Lauren Mak et al. NAR Genom Bioinform.

Abstract

Computational analysis of large-scale metagenomics sequencing datasets provides valuable isolate-level taxonomic and functional insights from complex microbial communities. However, the ever-expanding ecosystem of metagenomics-specific methods and file formats makes designing scalable workflows and seamlessly exploring output data increasingly challenging. Although one-click bioinformatics pipelines can help organize these tools into workflows, they face compatibility and maintainability challenges that can prevent replication. To address the gap in easily extensible yet robustly distributable metagenomics workflows, we have developed the Core Analysis Modular Pipeline (CAMP), a module-based metagenomics analysis system written in Snakemake, with a standardized module and directory architecture. Each module can run independently or in sequence to produce target data formats (e.g. short-read preprocessing alone or followed by de novo assembly), and provides output summary statistics reports and Jupyter notebook-based visualizations. We applied CAMP to a set of 10 metagenomics samples, demonstrating how a modular analysis system with built-in data visualization facilitates rich, seamless communication between outputs generated for different analytical purposes. The CAMP ecosystem (module template and analysis modules) can be found at https://github.com/Meta-CAMP.
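
The idea that each module can run on its own or be chained with others can be illustrated with a minimal Python sketch. This is not CAMP's actual code or interface (CAMP is written in Snakemake); the `Module` class, the `run_modules` helper, and the format names `raw_reads`, `clean_reads`, and `contigs` are hypothetical and exist only to show how declaring one input and one output format per module lets modules be composed safely.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Module:
    """A hypothetical analysis module with one input and one output format."""
    name: str
    consumes: str                  # data format the module expects
    produces: str                  # data format the module emits
    run: Callable[[str], str]      # stand-in for the real analysis step


def run_modules(modules: List[Module], start_format: str, payload: str) -> str:
    """Run modules in order, checking that each one receives the format it expects."""
    current_format = start_format
    for module in modules:
        if module.consumes != current_format:
            raise ValueError(
                f"{module.name} expects {module.consumes!r}, got {current_format!r}"
            )
        payload = module.run(payload)
        current_format = module.produces
    return payload


# Two illustrative modules: preprocessing, then de novo assembly.
preprocess = Module("short-read-qc", "raw_reads", "clean_reads", lambda p: p + " -> qc")
assembly = Module("assembly", "clean_reads", "contigs", lambda p: p + " -> assembly")

# Preprocessing alone, or preprocessing followed by assembly.
print(run_modules([preprocess], "raw_reads", "sample1"))
print(run_modules([preprocess, assembly], "raw_reads", "sample1"))
```

Checking the declared formats before each step is one simple way to catch a module being run on the wrong kind of input, for example assembly on unprocessed reads.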

Conflict of interest statement

None declared.

Figures

Figure 1.
An overview of the available metagenomics analysis modules in CAMP. All modules share the same internal architecture, but each wraps a different set of algorithms (shown to the left of each box) customized to its particular analysis goals. Modules that are typically the beginning of analysis projects contain light red boxes, modules that are typically intermediate steps contain medium red boxes, and modules that are typically terminal analysis steps contain dark red boxes. Typical first inputs are indicated by the boxed DNA symbols, while terminal outputs that report some form of quality metric or taxonomic classification information are indicated by the boxed graphic bacteria. Functional profiling outputs are indicated with a boxed wrench and screwdriver. Modules that contain built-in dataviz notebooks contain a bar graph symbol, and modules that require database downloads contain a cylinder.
Figure 2.
Quality control and preprocessing of short-read sequencing data. Counts of (A) reads and (B) bases retained after each preprocessing step across all samples. The steps include low-quality base trimming, adapter removal, host genome removal, and error correction.
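
As a rough illustration of how per-step read and base counts like these could be tallied, the sketch below counts reads and bases in FASTQ text. It assumes the common four-lines-per-record FASTQ layout and is not taken from CAMP; the function names `count_reads_and_bases` and `fraction_retained` are hypothetical.

```python
import io
from typing import TextIO, Tuple


def count_reads_and_bases(handle: TextIO) -> Tuple[int, int]:
    """Count reads and bases, assuming exactly four lines per FASTQ record."""
    reads = 0
    bases = 0
    for index, line in enumerate(handle):
        if index % 4 == 1:          # the second line of each record is the sequence
            reads += 1
            bases += len(line.strip())
    return reads, bases


def fraction_retained(before: int, after: int) -> float:
    """Fraction of reads (or bases) surviving a preprocessing step."""
    return after / before if before else 0.0


# Toy data: two reads before trimming, one shorter read after trimming.
raw = io.StringIO("@r1\nACGTAC\n+\nIIIIII\n@r2\nGGCC\n+\nIIII\n")
trimmed = io.StringIO("@r1\nACGT\n+\nIIII\n")

raw_reads, raw_bases = count_reads_and_bases(raw)
kept_reads, kept_bases = count_reads_and_bases(trimmed)
print(raw_reads, raw_bases, kept_reads, kept_bases)
print(fraction_retained(raw_reads, kept_reads), fraction_retained(raw_bases, kept_bases))
```

Running the same counter on the output of every step gives the per-step series plotted in a figure of this kind.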
Figure 3.
De novo assembly sizes generally correlate with short-read sequencing dataset sizes. (A) The number of contigs and (B) number of sequencing bases in each sample’s assembly, as well as the (C) mean contig size.
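
The three assembly summaries named in this caption (number of contigs, total bases, and mean contig size) can be derived from a FASTA file in a few lines. The sketch below is an assumption-labeled illustration, not CAMP's reporting code; `contig_lengths` and `assembly_summary` are hypothetical helper names.

```python
import io
from typing import Dict, List, TextIO


def contig_lengths(handle: TextIO) -> List[int]:
    """Return the length of every sequence in FASTA-formatted text."""
    lengths: List[int] = []
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:      # close the previous record
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)         # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_summary(lengths: List[int]) -> Dict[str, float]:
    """Number of contigs, total bases, and mean contig size."""
    total = sum(lengths)
    count = len(lengths)
    return {
        "contigs": count,
        "bases": total,
        "mean_contig_size": total / count if count else 0.0,
    }


fasta = io.StringIO(">contig_a\nACGT\nAC\n>contig_b\nGG\n")
print(assembly_summary(contig_lengths(fasta)))
```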
Figure 4.
Most MAG binning algorithms infer a consistent number of MAGs, with the exceptions of samples 4 and 7–9, and of VAMB. (A) Number of MAGs inferred by each binning algorithm across samples. (B) The numbers of contigs per. (C) Total MAG size inferred for each sample and comparison across binners (bars indicate standard deviation).
Figure 5.
Quality assessment metrics of DAS Tool-inferred MAGs across all samples. The median of each plot’s metric across all of the MAGs inferred from that sample is indicated with a line across the box. (A) Completeness, (B) contamination, (H) GC, (I) N50, and (J) MAG size were reported by CheckM2. (C) Clade separation score (CSS) was reported by gunc. (D) Strain heterogeneity and (E) relative abundance were reported by and calculated from CheckM1, respectively. (F) MAG coverage by the GTDB-Tk classified reference genome and (G) ANI to that genome were reported by dnadiff. Metrics (K–Q) were reported by QUAST comparing each MAG to the GTDB-Tk classified reference genome. (T) was calculated from the MAG’s completeness, contamination, and N50.
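
For readers unfamiliar with N50, one of the metrics listed in this caption, the sketch below computes it under the standard definition: the length of the contig at which the cumulative length, counted from the longest contig downward, first reaches half of the total. It is an independent illustration and not how CheckM2 computes the value internally; the function name `n50` is hypothetical.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:     # integer comparison avoids rounding issues
            return length
    return 0                         # empty input


# Total is 20 bases; the two longest contigs (6 + 5 = 11) pass the halfway mark.
print(n50([2, 3, 4, 5, 6]))  # → 5
```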
Figure 6.
Reference-free metrics are sometimes correlated with reference-based metrics. (A) Completeness is highly correlated with QUAST-estimated genome fraction (the fraction of the reference genome that the MAG aligned to). (B, C) Contamination and CSS do not correlate with the proportion of unaligned sequence material. (D) Strain heterogeneity is mildly inversely correlated with the proportion of unaligned material.
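
The caption does not say which correlation statistic was used. As one common choice, and purely as an assumption, the sketch below computes the Pearson correlation coefficient between a reference-free metric and a reference-based one. The function name `pearson` and the toy values are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only: completeness vs. genome fraction per MAG.
completeness = [55.0, 70.0, 85.0, 98.0]
genome_fraction = [50.0, 66.0, 80.0, 95.0]
print(round(pearson(completeness, genome_fraction), 3))
```

A rank-based statistic such as Spearman's would be a reasonable alternative when the relationship is monotonic but not linear.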
Figure 7.
Each classifier detects different numbers of taxa across all ranks. (A) Species. (B) Genus. (C) Family. (D) Order. (E) Class. (F) Phylum.
Figure 8.
Taxonomic profiles are classifier-specific, not sample-specific, at the taxonomic rank of species but do not present any discernible patterns at the rank of phylum. Bray–Curtis and Jaccard distances between each replicate’s species (A, C) and phylum (C, D) profiles.
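
The two distances named in this caption have standard definitions, sketched below for taxonomic profiles stored as taxon-to-abundance mappings. Bray–Curtis uses abundances, while Jaccard here uses presence or absence only. This is an illustration under those assumptions, not the paper's analysis code; the function names and the toy profiles are hypothetical.

```python
from typing import Dict


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def jaccard_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Jaccard distance on presence/absence: 1 - |A & B| / |A | B|."""
    present_a = {t for t, v in a.items() if v > 0}
    present_b = {t for t, v in b.items() if v > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


# Toy profiles sharing one of three taxa.
profile_1 = {"taxon_x": 0.5, "taxon_y": 0.5}
profile_2 = {"taxon_y": 0.5, "taxon_z": 0.5}
print(bray_curtis(profile_1, profile_2))               # → 0.5
print(round(jaccard_distance(profile_1, profile_2), 4))  # → 0.6667
```

Because Jaccard ignores abundance, comparing the two distances can show whether differences between profiles come from which taxa are detected or from how abundant they are.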
Figure 9.
Three phyla are present in a majority of the 10 samples: Proteobacteria, Actinobacteria, and Firmicutes.
Figure 10.
There is more overlap between the virus and/or phage profiles of different samples than between their bacterial profiles.
Figure 11.
(A) Metabolism, (B) information and storage, (C) cellular process and signaling, and (D) poorly characterized COGs annotated in each sample. The x-axis refers to the proportion of sequencing reads aligned to genes from a COG.
