Microbiome. 2019 Mar 22;7(1):46.
doi: 10.1186/s40168-019-0658-x.

Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments

Erik L Clarke et al. Microbiome. 2019.

Abstract

Background: Analysis of mixed microbial communities using metagenomic sequencing experiments requires multiple preprocessing and analytical steps to interpret the microbial and genetic composition of samples. Analytical steps include quality control, adapter trimming, host decontamination, metagenomic classification, read assembly, and alignment to reference genomes.

Results: We present a modular and user-extensible pipeline called Sunbeam that performs these steps in a consistent and reproducible fashion. It can be installed in a single step, does not require administrative access to the host computer system, and can work with most cluster computing frameworks. We also introduce Komplexity, a software tool to eliminate potentially problematic, low-complexity nucleotide sequences from metagenomic data. A unique component of the Sunbeam pipeline is an easy-to-use extension framework that enables users to add custom processing or analysis steps directly to the workflow. The pipeline and its extension framework are well documented, in routine use, and regularly updated.

Conclusions: Sunbeam provides a foundation to build more in-depth analyses and to enable comparisons in metagenomic sequencing experiments by removing problematic, low-complexity reads and standardizing post-processing and analytical steps. Sunbeam is written in Python using the Snakemake workflow management software and is freely available at github.com/sunbeam-labs/sunbeam under the GPLv3.

Keywords: Pipeline; Quality control; Shotgun metagenomic sequencing; Software; Sunbeam.
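The abstract describes removing low-complexity reads but does not spell out how a complexity score is computed. The following is a minimal sketch of one common approach, assuming the score is the number of unique k-mers divided by the sequence length. The function names, the k-mer size of 4, and the 0.55 threshold are illustrative assumptions here, not Komplexity's confirmed implementation or interface.

```python
def kmer_complexity(seq: str, k: int = 4) -> float:
    """Score a sequence as (unique k-mers) / (sequence length).

    Assumed scoring scheme for illustration only. Sequences shorter
    than k contain no k-mers and score 0.0.
    """
    seq = seq.upper()
    if len(seq) < k:
        return 0.0
    # Collect every overlapping window of length k into a set.
    kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(kmers) / len(seq)


def filter_low_complexity(reads, threshold: float = 0.55, k: int = 4):
    """Keep (name, sequence) pairs scoring at or above the threshold.

    The threshold value is a hypothetical default chosen for this sketch.
    """
    return [(name, seq) for name, seq in reads
            if kmer_complexity(seq, k) >= threshold]


if __name__ == "__main__":
    reads = [
        ("homopolymer", "A" * 40),       # one unique 4-mer
        ("tandem_repeat", "ACGT" * 10),  # four unique 4-mers
        ("mixed", "ACGTTGCA"),           # five unique 4-mers in 8 bases
    ]
    for name, seq in reads:
        print(name, round(kmer_complexity(seq), 3))
    print([name for name, _ in filter_low_complexity(reads)])
```

Normalizing by length keeps scores comparable across reads of different sizes, so homopolymers and short tandem repeats score near zero regardless of read length.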

Conflict of interest statement

Ethics approval and consent to participate

The study protocol was approved by The Children’s Hospital of Philadelphia Institutional Review Board. All subjects gave written, informed consent prior to participation.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Inputs, processes, and outputs for standard steps in the Sunbeam metagenomics pipeline
Fig. 2
Schematics of example extension inputs and contents. a Files for extension sbx_metaspades_example, which uses MetaSPAdes to assemble reads from quality-controlled fastq.gz files. sbx_metaspades_example.rules lists the procedure necessary to generate assembly results from a pair of decontaminated, quality-controlled FASTQ input files. requirements.txt lists the software requirements for the package to be installed through Conda. b Files contained within the sbx_report extension: requirements.txt lists the software requirements for the package to be installed through Conda; sbx_report.rules contains the code for the rule as above; final_report.Rmd is an R Markdown script that generates and visualizes the report; example.html is an example report; and README.md provides instructions for installing and running the extension. Sunbeam inputs required for each extension are shown as colored shapes above the extensions
Fig. 3
a Nonmetric multidimensional scaling plots generated using the vegan package in R [76], using MetaPhlAn2 classifications of data from Lewis et al. [61]. Each point is colored by the cluster in which it was annotated in the Lewis et al. metadata—cluster 2 (red) is the dysbiotic cluster, while cluster 1 (blue) is the healthy-like cluster. b Inverse Simpson diversity by absolute latitude calculated using the vegan package in R from the Kraken classification output of Sunbeam for Bahram et al. [63]. Points are colored by habitat. The polynomial regression line is shown in black. c Boxplots of unique Anelloviridae taxa in each sample from McCann et al. [64]. Each point corresponds to a single sample. d Heatmap from shallow shotgun analysis colored by proportional abundance. Each row corresponds to a bacterial taxon; each column represents a different reagent combination. Columns are grouped by time point, then by subject (top). All plots were generated using the ggplot2 R package [77]
Fig. 4
a Comparison between Komplexity and similar software (BBMask, DUST, and RepeatMasker). The small bar plot in the lower left shows the total nucleotides masked by each tool. The central bar plot shows the number of unique nucleotides masked by every tool combination; each combination is shown by the connected dots below. Bars displaying nucleotides masked by tool combinations that include Komplexity are colored red. b Example complexity score distributions calculated by Komplexity for reads from ten stool virome samples (high microbial biomass; [15]) and ten bronchoalveolar lavage (BAL) virome samples (low-biomass, high-host; [12]) using the default parameters
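
The Fig. 3b caption refers to inverse Simpson diversity computed with the vegan package in R. As a rough illustration of the quantity itself, the sketch below computes 1 / sum(p_i^2) from per-taxon read counts in Python. The function name and the example counts are hypothetical and are not taken from the study's data.

```python
def inverse_simpson(counts):
    """Return the inverse Simpson index, 1 / sum(p_i ** 2).

    `counts` holds non-negative per-taxon read counts for one sample;
    p_i is the proportion of reads assigned to taxon i.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Taxa with zero reads contribute nothing to the sum.
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


if __name__ == "__main__":
    # Hypothetical samples: one evenly spread, one dominated by a single taxon.
    print(inverse_simpson([10, 10, 10, 10]))
    print(inverse_simpson([97, 1, 1, 1]))
```

The index equals the number of taxa when all are equally abundant and approaches 1 as a single taxon dominates, which is why it works as a per-sample summary to plot against a covariate such as latitude.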

References

    1. Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI. The human microbiome project. Nature. 2007;449:804–810. doi: 10.1038/nature06244.
    2. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 2006;444:1027–1031. doi: 10.1038/nature05414.
    3. Muegge BD, Kuczynski J, Knights D, Clemente JC, Gonzalez A, Fontana L, et al. Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans. Science. 2011;332:970–4.
    4. Morgan XC, Tickle TL, Sokol H, Gevers D, Devaney KL, Ward DV, et al. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biol. 2012;13:R79. doi: 10.1186/gb-2012-13-9-r79.
    5. Lee STM, Kahn SA, Delmont TO, Shaiber A, Esen özcan C, Hubert NA, et al. Tracking microbial colonization in fecal microbiota transplantation experiments via genome-resolved metagenomics. Microbiome. 2017;5:1–10. doi: 10.1186/s40168-017-0270-x.