mSystems. 2021 Feb 23;6(1):e01154-20.
doi: 10.1128/mSystems.01154-20.

Data Analysis Strategies for Microbiome Studies in Human Populations-a Systematic Review of Current Practice

Sven Kleine Bardenhorst et al. mSystems.

Abstract

Reproducibility is a major issue in microbiome studies, caused in part by a lack of consensus about data analysis strategies. Microbiome data are high-dimensional, zero-inflated, and compositional, which makes them challenging to analyze because they often violate the assumptions of classic statistical methods. With advances in human microbiome research, research questions and study designs are increasing in complexity, so that more sophisticated data analysis concepts are applied. To improve current practice in the analysis of microbiome studies, it is important to understand what kinds of research questions are asked and which tools are used to answer them. We conducted a systematic literature review considering all publications focusing on the analysis of human microbiome data from June 2018 to June 2019. Of 1,444 studies screened, 419 fulfilled the inclusion criteria. Information about research questions, study designs, and analysis strategies was extracted. The results confirmed the expected shift to more advanced research questions, as one-third of the studies analyzed clustered data. Although heterogeneity in the methods used was found at every stage of the analysis process, it was largest for differential abundance testing. Especially when the underlying data structure was clustered, we identified a lack of use of methods that appropriately addressed that structure while taking into account additional dependencies in the data. Our results confirm considerable heterogeneity in analysis strategies among microbiome studies; increasingly complex research questions require better guidance for analysis strategies.

IMPORTANCE The human microbiome has emerged as an important factor in the development of health and disease. Growing interest in this topic has led to an increasing number of studies investigating the human microbiome using high-throughput sequencing methods. However, the development of suitable analytical methods for analyzing microbiome data has not kept pace with the rapid progression in the field. It is crucial to understand current practice to identify the scope for development. Our results highlight the need for an extensive evaluation of the strengths and shortcomings of existing methods in order to guide the choice of proper analysis strategies. We have identified where new methods could be designed to address more advanced research questions while taking into account the complex structure of the data.
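The abstract describes microbiome data as zero-inflated and compositional. As a rough illustration of what those two properties mean in practice, the sketch below uses made-up counts (not data from the review) and two hypothetical helper functions, `zero_fraction` and `relative_abundance`. It is meant only to show why such data can mislead classic methods, not to reproduce any analysis from the paper.

```python
def zero_fraction(counts):
    """Share of cells in a samples-by-taxa count table that are zero."""
    cells = [c for row in counts for c in row]
    return sum(1 for c in cells if c == 0) / len(cells)


def relative_abundance(sample):
    """Convert one sample's counts to proportions that sum to 1."""
    total = sum(sample)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in sample]


# Hypothetical counts for four taxa in one sample (assumed values).
# Only the first taxon grows; the other three keep the same counts.
before = [10, 20, 0, 70]
after = [110, 20, 0, 70]

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

# Zero inflation: fraction of zero cells across both samples.
print("zero fraction:", zero_fraction([before, after]))

# Compositionality: the second taxon's count is unchanged, yet its
# proportion drops because the proportions must still sum to 1.
print("taxon 2 before:", rel_before[1], "after:", rel_after[1])
```

In this toy case the second taxon appears to decline in relative terms although its count did not change, which is one reason methods that ignore the compositional constraint can give misleading results.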

Keywords: 16S rRNA; analysis strategies; microbiome; shotgun metagenomics sequencing.

Figures

FIG 1
Sample sizes and balances of study groups. (A) Distribution of sample size stratified by study group. n refers to the number of studies with two study groups. (B) Balance of study groups stratified by research objective. The vertical axis indicates the balance in sample size between study groups, with one representing equal group sizes. Points outside the dashed lines indicate studies in which one study group is at least twice as large as the second study group. Note, only studies with two study groups are presented here. (C) Distribution of sample sizes for studies using only one study group.
FIG 2
Upset plot of the most frequently investigated combinations of taxonomic levels.
FIG 3
Upset plot of most frequently applied sets of alpha diversity measures.
FIG 4
Analysis of most frequently used alpha diversity measures. (A) Proportion of studies testing for differences in the respective indices between groups by parametric methods, nonparametric methods, or both. (B) Proportion of studies that used methods designed for clustered data, not designed for clustered data, or both. A total of 100.0% refers to all studies that analyzed clustered data. Richness refers to observed species richness.
FIG 5
Upset plot of most frequently applied sets of beta diversity measures.
FIG 6
Bar chart of methods used for differential abundance testing grouped by category. Methods highlighted in dark gray model the microbiome as the independent variable. Methods highlighted in light gray model the microbiome as the dependent variable. Note that to improve interpretability of the plot, the bar for LEfSe (n = 132) was truncated to fit into the scale.
FIG 7
Flowchart of literature review and data extraction process.
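
The caption of FIG 1(B) describes a balance measure where one means equal group sizes and the dashed lines mark studies in which one group is at least twice as large as the other. A minimal sketch of such a measure follows, assuming the balance is the plain ratio of the two group sizes; the caption does not state the exact formula, and the function names `balance` and `is_unbalanced` are hypothetical.

```python
def balance(n_first, n_second):
    """Ratio of the two group sizes; 1.0 means equal groups (assumed definition)."""
    if n_first <= 0 or n_second <= 0:
        raise ValueError("group sizes must be positive")
    return n_first / n_second


def is_unbalanced(n_first, n_second, factor=2.0):
    """True if either group is at least `factor` times as large as the other."""
    ratio = balance(n_first, n_second)
    return ratio >= factor or ratio <= 1.0 / factor


# Hypothetical study designs as (cases, controls); not taken from the review.
for cases, controls in [(50, 50), (100, 50), (30, 50), (20, 50)]:
    print(cases, controls, balance(cases, controls), is_unbalanced(cases, controls))
```

Under this assumption, a study with 100 cases and 50 controls sits exactly on the upper threshold and counts as unbalanced, while 30 versus 50 stays inside the dashed lines.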

