Protein Cell. 2023 Oct 25;14(10):713-725.
doi: 10.1093/procel/pwad024.

The best practice for microbiome analysis using R

Tao Wen et al. Protein Cell.

Abstract

With the gradual maturation of sequencing technology, many microbiome studies have been published, driving the emergence and advancement of related analysis tools. The R language is a widely used platform for microbiome data analysis because of its powerful functions. However, tens of thousands of R packages and numerous similar analysis tools pose major challenges for researchers exploring microbiome data. Choosing suitable, efficient, convenient, and easy-to-learn tools from among these R packages has become a problem for many microbiome researchers. We have organized 324 common R packages for microbiome analysis and classified them by application category (diversity, difference, biomarker, correlation and network, functional prediction, and others), which can help researchers quickly find relevant R packages. Furthermore, we systematically sorted the integrated R packages (phyloseq, microbiome, MicrobiomeAnalystR, Animalcules, microeco, and amplicon) for microbiome analysis and summarized their advantages and limitations, which will help researchers choose appropriate tools. Finally, we thoroughly reviewed the R packages for microbiome analysis, summarized most of the common analysis content in microbiome research, and formed the most suitable pipeline for microbiome analysis. This paper is accompanied by hundreds of examples with 10,000 lines of code on GitHub, which can help beginners learn and help analysts compare and test different tools. This paper systematically sorts the applications of R in microbiome research, providing an important theoretical basis and practical reference for the development of better microbiome tools in the future. All the code is available on GitHub at github.com/taowenmicro/EasyMicrobiomeR.

Keywords: R package; amplicon; data analysis; metagenome; microbiome; visualization.

Conflict of interest statement

The authors declare no competing interests related to the content of this paper.

Figures

Graphical Abstract
Figure 1.
Microbial community data analysis workflow and related R packages. (A) Overview of the microbial community data analysis workflow. The core files are the feature table (OTU), taxonomy (Taxonomy), sample metadata (Metadata), phylogenetic tree (Tree), and representative sequences (Rep.fa). (B) Details of the microbial community analysis workflow. First, the raw data can be processed using USEARCH/VSEARCH, QIIME 2, or DADA2. The key output files are then saved and used for downstream analysis in the R language and RStudio. Many microbial analysis methods rely on R packages. The font size in the word cloud represents the number of citations of each R package. (C) Commonly used R packages for data cleaning/manipulation and visualization. (D) Classification of R packages into six categories of microbial community analysis.
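As a rough illustration of how the core files named in this caption relate to one another, the following base R sketch builds toy stand-ins for the feature table, taxonomy, and sample metadata, checks that their identifiers line up, and converts counts to relative abundance. All object names and values here are hypothetical and are not taken from the paper or its repository; real analyses would read these tables from the files produced by the upstream pipeline.

```r
# Toy stand-ins for three of the core files (all names and values are made up).
# Feature table: rows are features (OTUs), columns are samples.
otu <- matrix(
  c(10, 0,  5,
    20, 3,  0,
     0, 7, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU_1", "OTU_2", "OTU_3"), c("S1", "S2", "S3"))
)

# Taxonomy: one row per feature.
taxonomy <- data.frame(
  Phylum = c("Proteobacteria", "Firmicutes", "Actinobacteria"),
  row.names = c("OTU_1", "OTU_2", "OTU_3")
)

# Sample metadata: one row per sample.
metadata <- data.frame(
  Group = c("Control", "Control", "Treatment"),
  row.names = c("S1", "S2", "S3")
)

# Feature IDs must agree between the table and the taxonomy,
# and sample IDs must agree between the table and the metadata.
stopifnot(identical(rownames(otu), rownames(taxonomy)))
stopifnot(identical(colnames(otu), rownames(metadata)))

# Convert counts to per-sample relative abundance (each column sums to 1).
rel_abund <- sweep(otu, 2, colSums(otu), "/")
print(round(rel_abund, 3))
```

Checking identifier agreement up front is a design choice in this sketch: most downstream steps assume the three tables describe the same features and samples in the same order.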
Figure 2.
Introduction to the functions of integrated microbial analysis R packages. Microbial community analysis can be divided into diversity analysis, difference analysis, biomarker identification, correlation and network analysis, functional prediction, and other microbial community analysis (community building/construction process, association analysis with other indicators).
Figure 3.
Typical results of integrated microbial community analysis R packages and comparison of similar results. The analysis results of multiple integrated R packages are grouped according to the major categories of microbial community analysis functions. Each main branch in the tree diagram represents a type of microbial community analysis, and there are 10 main branches in total: feature diversity analysis, comprising (i) alpha diversity analysis, (ii) beta diversity analysis, (iii) feature (community taxonomic or functional) composition analysis, and (iv) evolutionary or taxonomic tree analysis; (v) difference analysis; (vi) biomarker identification; (vii) correlation and network analysis; (viii) functional prediction; (ix) community building/construction process analysis; and (x) other analyses, such as association analysis with other indicators. Each leaf (circle) represents a style of result displayed in the analysis, and the circled number outside each leaf indicates which integrated R package the analysis result comes from.
Figure 4.
Examples of best-practice results of microbial community analysis in the R language. The selected results include a rarefaction curve (A), principal coordinate analysis scatter plot (B), Venn network graph (C), evolutionary tree (D), LEfSe cladogram (E), difference analysis extended error bar plot in STAMP style (F), difference analysis Manhattan plot (G), difference analysis multi-group volcano plot (H), biomarker selection ring-column chart (I), network graph (J), correlation connection combination graph (K), and source tracing analysis pie chart (L).
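To make the diversity results listed in this caption more concrete, here is a minimal base R sketch of the calculations behind an alpha diversity value and a principal coordinate analysis: the Shannon index per sample, Bray-Curtis dissimilarity between samples, and classical multidimensional scaling with `cmdscale` from the built-in stats package. The count table is made up, the helper `bray_curtis` is hypothetical and written here for illustration, and the counts are not rarefied or normalized first. This is not the pipeline used by the authors, who rely on dedicated packages.

```r
# Toy count table: rows are samples, columns are features (values are made up).
counts <- matrix(
  c(10, 20,  0,  5,
     0,  3,  7, 10,
     5,  0, 15,  2,
     8, 12,  1,  4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("S", 1:4), paste0("OTU_", 1:4))
)

# Alpha diversity: Shannon index H = -sum(p * log(p)) over features with p > 0.
shannon <- apply(counts, 1, function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
})

# Beta diversity: Bray-Curtis dissimilarity between every pair of samples,
# defined as sum(|x_i - x_j|) / sum(x_i + x_j).
# Hypothetical helper written for this sketch.
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
    }
  }
  as.dist(d)
}
bc <- bray_curtis(counts)

# Principal coordinate analysis is classical MDS applied to the distance matrix.
pcoa <- cmdscale(bc, k = 2, eig = TRUE)

print(round(shannon, 3))
print(round(as.matrix(bc), 3))
print(pcoa$points)
```

The coordinates in `pcoa$points` are what a PCoA scatter plot such as panel (B) would display, typically colored by a grouping variable from the sample metadata.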

