BMC Bioinformatics. 2018 May 21;19(1):175.
doi: 10.1186/s12859-018-2189-z.

SAMSA2: a standalone metatranscriptome analysis pipeline

Samuel T Westreich et al. BMC Bioinformatics.

Abstract

Background: Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms.

Results: SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster than the original pipeline because it uses the DIAMOND aligner, and it is more flexible and reproducible because it uses local databases. SAMSA2 is distributed with detailed documentation, example input and output files, and example master scripts for full pipeline execution.

Conclusions: SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.

Keywords: Annotation; Bacteria; Bioinformatics; Cluster; Functions; GALAXY; Metagenomics; Metatranscriptome; Metatranscriptomics; Microbiome; Open access; Pipeline; RNA-seq; SAMSA; Software; Tool.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
The SAMSA2 analysis pipeline. Starting sequence reads are merged, cleaned, and filtered to remove ribosomal RNA (rRNA) sequences. At the annotation step, DIAMOND can be used to incorporate any custom database as an annotation reference. Results are condensed and analyzed using custom Python scripting, and saved as standard data tables that can be imported into R to generate figures or for statistical comparison
Fig. 2
(a) PCA and (b) heatmaps generated by SAMSA2 visualization scripts. Comparisons can be made using either organism or functional annotation results, or based on any other incorporated database. Both plots show how similar whole metatranscriptomes are to each other. Greater similarity is indicated by (a) closer dots in the PCA plot or (b) a darker blue color in the heatmap
Fig. 3
Example stacked bar plot. SAMSA2’s default stacked bar graph shows both relative (top) and absolute (bottom) transcript counts per genus with samples grouped according to control or experimental metadata designations
Fig. 4
Example SEED Subsystems annotation pie charts at hierarchy level 1. Pie charts or other figures can be generated for every level of SEED Subsystems hierarchy
Fig. 5
Benchmarking of SAMSA2 for resource use. (a) DIAMOND annotation time increases linearly as more input sequences are added, allowing the estimation of total annotation time. For this test, all files ran with 30 CPUs, each with 2 GB RAM. (b) Annotation speed relative to allocated memory: higher RAM allocation allows DIAMOND to hold more of the reference databases in memory, speeding up pipeline annotation up to the point where the database is fully in memory; all files in this test contained 50,000 sequences each
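
The linear scaling described for Fig. 5a suggests a simple way to budget cluster time: time a few small subsets, fit a straight line, and extrapolate to the full dataset. The following is a minimal sketch of that idea, not part of SAMSA2 itself. The function names (fit_line, estimate_minutes) and the pilot timings are hypothetical and chosen only for illustration; real timings depend on the database, CPU count, and memory allocation.

```python
# Sketch: estimate total annotation time by linear extrapolation.
# All numbers below are HYPOTHETICAL pilot timings, not SAMSA2 benchmarks.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_minutes(n_reads, slope, intercept):
    """Predict annotation time (minutes) for a file with n_reads sequences."""
    return slope * n_reads + intercept


# Hypothetical pilot runs: number of input reads and measured minutes.
pilot_reads = [1_000_000, 2_000_000, 4_000_000]
pilot_minutes = [12.0, 22.0, 42.0]

slope, intercept = fit_line(pilot_reads, pilot_minutes)

if __name__ == "__main__":
    full_run = estimate_minutes(10_000_000, slope, intercept)
    print(f"Estimated time for 10M reads: {full_run:.1f} min")
```

The estimate only holds while the run stays in the linear regime, which assumes the same CPU and memory settings for the pilot runs and the full run.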
