BMC Bioinformatics. 2021 Feb 1;22(1):41.
doi: 10.1186/s12859-021-03967-2.

tidyMicro: a pipeline for microbiome data analysis and visualization using the tidyverse in R

Charlie M Carpenter et al. BMC Bioinformatics. 2021.

Abstract

Background: The drive to understand how microbial communities interact with their environments has inspired innovations across many fields. The data generated from sequence-based analyses of microbial communities typically are of high dimensionality and can involve multiple data tables consisting of taxonomic or functional gene/pathway counts. Merging multiple high dimensional tables with study-related metadata can be challenging. Existing microbiome pipelines available in R have created their own data structures to manage this problem. However, these data structures may be unfamiliar to analysts new to microbiome data or R and do not allow for deviations from internal workflows. Existing analysis tools also focus primarily on community-level analyses and exploratory visualizations, as opposed to analyses of individual taxa.

Results: We developed the R package "tidyMicro" to serve as a more complete microbiome analysis pipeline. This open-source software provides all of the essential tools available in other popular packages (e.g., management of sequence count tables, standard exploratory visualizations, and diversity inference tools), supplemented with multiple options for regression modelling (e.g., negative binomial, beta binomial, and/or rank-based testing) and novel visualizations to improve interpretability (e.g., Rocky Mountain plots, longitudinal ordination plots). This comprehensive pipeline for microbiome analysis also maintains data structures familiar to R users to improve analysts' control over their workflow. A complete vignette is provided to guide new users through the analysis workflow.

Conclusions: tidyMicro provides a reliable alternative to popular microbiome analysis packages in R. We provide standard tools as well as novel extensions of standard analyses to improve the interpretability of results, while maintaining object malleability to encourage open-source collaboration. The simple examples and full workflow from the package are reproducible and applicable to external data sets.

Keywords: Microbiome; Open source; Pipeline; R; Tidyverse; Visualization.


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Flowchart of the tidyMicro pipeline. The supplied OTU table(s) must be in the standard format output by QIIME, with column names that match a column of sequencing library names in the clinical data. The initial step is merging all OTU tables and clinical data using the tidy_micro function (a). From here, the tidyMicro set can be used for exploratory visuals (b), community-level analyses (c), and taxa-level analyses (d, e)
Fig. 2
Structure of a tidyMicro data set. A tidyMicro data set is a data frame with a hierarchical structure in which each OTU table forms a block containing one taxon block for each taxon within that table. Clinical data are repeated within each taxon block. This structure allows users to easily create custom extensions
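
The merge in step (a) of Fig. 1 and the layout in Fig. 2 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not tidyMicro's R implementation: the input dictionaries, the `tidy_merge` function, and the column names `Lib`, `Table`, `Taxa`, and `cts` are hypothetical stand-ins chosen for the example.

```python
# Hypothetical miniature inputs: one genus-level OTU table (taxa x libraries)
# and a clinical table keyed by sequencing library name.
otu_tables = {
    "genus": {
        "Staphylococcus": {"Lib1": 120, "Lib2": 5},
        "Streptococcus": {"Lib1": 30, "Lib2": 95},
    },
}
clinical = [
    {"Lib": "Lib1", "age": 34, "mrsa": "yes"},
    {"Lib": "Lib2", "age": 51, "mrsa": "no"},
]


def tidy_merge(otu_tables, clinical, lib_col="Lib"):
    """Stack every OTU table into one long list of row dicts (hypothetical helper).

    Rows are grouped by table, then by taxon, and each taxon block repeats
    the clinical record of every library (the layout described in Fig. 2).
    """
    by_lib = {record[lib_col]: record for record in clinical}
    rows = []
    for table_name, taxa in otu_tables.items():
        for taxon, counts in taxa.items():
            # Library names in the OTU table must match the clinical library column.
            missing = set(counts) - set(by_lib)
            if missing:
                raise KeyError(f"libraries without clinical data: {sorted(missing)}")
            for lib, count in counts.items():
                rows.append({"Table": table_name, "Taxa": taxon, "cts": count, **by_lib[lib]})
    return rows


for row in tidy_merge(otu_tables, clinical):
    print(row)
```

The long, repeated layout trades memory for simplicity: every row carries its own clinical covariates, so per-taxon models or plots only need a filter on `Taxa`.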
Fig. 3
Example exploratory visualizations. a Principal component plot calculated from centered log-ratio transformed genus-level taxa counts, colored by MRSA infection. b Principal coordinate plot calculated from genus-level Bray–Curtis beta diversity with normal ellipses, colored by MRSA infection. c Stacked bar charts of average genus-level taxa abundances by MRSA infection. d Heatmap of Spearman correlations between centered log-ratio transformed genus-level taxa counts and subjects’ age
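
The inputs to the ordinations in panels a and b can be sketched as follows: a centered log-ratio (CLR) transform for the PCA and a Bray–Curtis dissimilarity for the PCoA. This sketch assumes a pseudocount of 1 before taking logs (the caption does not say how zero counts are handled), and the function names are hypothetical; the PCA and PCoA steps themselves are not shown.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's taxa counts.

    The pseudocount (an assumption here) keeps zero counts finite on the log scale.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum |x_i - y_i| / sum (x_i + y_i)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


# Hypothetical genus-level counts for two samples (same taxa order in both).
sample_a = [120, 30, 0]
sample_b = [5, 95, 10]
print([round(v, 3) for v in clr(sample_a)])
print(round(bray_curtis(sample_a, sample_b), 3))  # 190 / 260, prints 0.731
```

CLR values within a sample always sum to zero, which is why they suit Euclidean methods such as PCA, whereas Bray–Curtis works directly on counts and is paired with PCoA.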
Fig. 4
Rocky Mountain plot. Spearman correlations between centered log-ratio transformed genus-level taxa counts and subjects’ age. Correlations are colored by phylum, and taxa with correlations above 0.3 in magnitude are labeled
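
Each bar in this Rocky Mountain plot is one Spearman correlation, i.e. the Pearson correlation of the ranked values. A minimal sketch with made-up data, assuming average ranks for ties; the taxa values and ages are illustrative only, the helper names are hypothetical, and coloring by phylum is omitted.

```python
import math


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical CLR-transformed genus values and subject ages (5 subjects).
ages = [34, 51, 27, 45, 62]
taxon_clr = {
    "Staphylococcus": [1.2, 0.4, 1.5, 0.9, 0.1],
    "Streptococcus": [0.3, 0.8, 0.1, 0.6, 1.1],
    "Prevotella": [0.2, 0.9, 0.5, 0.7, 0.3],
}
rhos = {taxon: spearman(values, ages) for taxon, values in taxon_clr.items()}
# Label only taxa whose correlation exceeds 0.3 in magnitude, as in the caption.
labeled = [taxon for taxon, rho in rhos.items() if abs(rho) > 0.3]
for taxon, rho in rhos.items():
    print(f"{taxon:15s} rho = {rho:+.2f}{'  <- labeled' if taxon in labeled else ''}")
```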
Fig. 5
Three-mode principal component (a) and three-mode principal coordinate (b) plots. Plots were created from sequences collected on the 7th, 14th, and 21st days of life of 15 infants, collapsing over the time component. Colors represent the three time points. The principal coordinate plot was created from Bray–Curtis beta diversity
Fig. 6
Examples of common taxa abundance distributions. Strong right skew (a), “U”-shaped distributions (b), and sparsity (c) are all common patterns
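
Right skew and sparsity can be screened per taxon with two simple summaries: the fraction of zero counts and the moment-based sample skewness. This is an illustrative sketch with hypothetical counts and helper names; it does not detect the “U” shape in panel (b), and it is not a tidyMicro function.

```python
def zero_fraction(counts):
    """Share of samples in which the taxon was not observed (sparsity)."""
    return sum(1 for c in counts if c == 0) / len(counts)


def sample_skewness(values):
    """Moment-based sample skewness g1; positive values indicate right skew."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical counts of one genus across eight samples.
counts = [0, 0, 0, 0, 1, 2, 5, 60]
print(f"zeros: {zero_fraction(counts):.0%}")  # prints "zeros: 50%"
print(f"skewness: {sample_skewness(counts):.2f}")
```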
Fig. 7
Rocky Mountain plot made from negative binomial models. Relationships between MRSA infection and genus-level taxa abundance, controlling for smoking status, were estimated with negative binomial models using log(sequencing depth) as an offset. All models were fit using the glm.nb function in the MASS package. False discovery rate (FDR) adjusted p-values of the estimated β coefficients are log transformed, and the magnitude is plotted along the y-axis. For positive β estimates, the log(FDR p-value) is multiplied by -1, so the direction along the y-axis corresponds to the direction of the estimated relationship
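
The y-axis of this plot can be reproduced from a list of β estimates and raw p-values. This sketch assumes the Benjamini–Hochberg procedure for the FDR adjustment and a base-10 logarithm, neither of which the caption specifies; the negative binomial fits themselves (done with glm.nb in R) are not reproduced, and the estimates and helper names below are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rocky_mountain_height(beta, fdr_p):
    """log10(FDR p), multiplied by -1 when beta > 0 (sign follows beta)."""
    height = math.log10(fdr_p)  # negative for fdr_p < 1
    return -height if beta > 0 else height


# Hypothetical MRSA coefficients and raw p-values from per-genus models.
taxa = ["Staphylococcus", "Streptococcus", "Prevotella", "Veillonella"]
betas = [1.8, -0.9, 0.2, -0.1]
raw_p = [0.001, 0.01, 0.04, 0.5]
fdr_p = benjamini_hochberg(raw_p)
for taxon, beta, p in zip(taxa, betas, fdr_p):
    print(f"{taxon:15s} FDR p = {p:.4f}  y = {rocky_mountain_height(beta, p):+.2f}")
```

Taller peaks mean stronger evidence, and peaks below zero mark taxa estimated to be less abundant with MRSA infection.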
Fig. 8
Parametric stacked bar charts. Parametric stacked bar charts back-transform β parameter estimates to obtain estimated taxa abundances. (a) Parametric stacked bar charts from estimated relationships between MRSA infection and genus-level taxa abundances after controlling for smoking status. (b) Parametric stacked bar charts from estimated relationships between genus-level taxa abundance and subject age by MRSA infection. All models in both (a) and (b) used log(sequencing depth) as an offset
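
With a log link and log(sequencing depth) as an offset, a model's expected count is depth × exp(Xβ), so exp(Xβ) is an estimated relative abundance. A minimal sketch of that back-transformation, assuming one model per genus with an intercept, an MRSA indicator, and a smoking indicator held at its reference level (0); the coefficients and the helper name are hypothetical.

```python
import math

# Hypothetical log-scale coefficients (intercept, MRSA, smoking) from one
# negative binomial model per genus, each fit with log(sequencing depth) as an offset.
coefficients = {
    "Staphylococcus": (-2.0, 1.5, 0.3),
    "Streptococcus": (-1.2, -0.4, 0.1),
    "Prevotella": (-3.0, 0.1, -0.2),
}


def estimated_relative_abundance(coefs, mrsa, smoker=0):
    """exp(X beta): the model's expected count divided by sequencing depth."""
    b0, b_mrsa, b_smoke = coefs
    return math.exp(b0 + b_mrsa * mrsa + b_smoke * smoker)


# One stacked bar per MRSA group, holding smoking status at its reference level.
for group, mrsa in (("No MRSA", 0), ("MRSA", 1)):
    bar = {taxon: estimated_relative_abundance(c, mrsa) for taxon, c in coefficients.items()}
    print(group, {taxon: round(value, 3) for taxon, value in bar.items()})
```

Because each genus is modelled separately, the estimated abundances in one bar need not sum to one.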
