PLoS One. 2012;7(12):e52078.

doi: 10.1371/journal.pone.0052078. Epub 2012 Dec 20.

Hypothesis testing and power calculations for taxonomic-based human microbiome data

Patricio S La Rosa¹, J Paul Brooks, Elena Deych, Edward L Boone, David J Edwards, Qin Wang, Erica Sodergren, George Weinstock, William D Shannon

PMID: 23284876
PMCID: PMC3527355
DOI: 10.1371/journal.pone.0052078

Patricio S La Rosa et al. PLoS One. 2012.

Affiliation

¹ Division of General Medical Sciences, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.

Abstract

This paper presents new biostatistical methods for the analysis of microbiome data based on a fully parametric approach using all the data. The Dirichlet-multinomial distribution allows the analyst to calculate power and sample sizes for experimental design, perform tests of hypotheses (e.g., compare microbiomes across groups), and to estimate parameters describing microbiome properties. The use of a fully parametric model for these data has the benefit over alternative non-parametric approaches such as bootstrapping and permutation testing, in that this model is able to retain more information contained in the data. This paper details the statistical approaches for several tests of hypothesis and power/sample size calculations, and applies them for illustration to taxonomic abundance distribution and rank abundance distribution data using HMP Jumpstart data on 24 subjects for saliva, subgingival, and supragingival samples. Software for running these analyses is available.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. Description of Dirichlet-multinomial parameters.**
Intuitive description of the meaning of the overdispersion parameter. The four plots show the taxa frequencies for each of five hypothetical samples (dashed lines) with 12 taxa in each sample, and the corresponding weighted average across the five samples, given by the vector of taxa frequencies (solid line). The plots on the left show taxa frequencies of five samples drawn from a multinomial distribution, and the plots on the right show taxa frequencies of five samples drawn from a Dirichlet-multinomial distribution. The top row is for samples with a smaller number of sequence reads, and the bottom row is for samples with a larger number of sequence reads. Under the multinomial distribution, each sample's taxa frequencies converge to the mean as the number of reads increases; under the Dirichlet-multinomial distribution, the variability between individual samples stays the same even as the number of reads increases.
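The behaviour described in this caption can be reproduced with a small simulation. The following is a minimal sketch, not the authors' software: the function names are hypothetical, and it assumes the common parameterization in which the Dirichlet parameters are the mean taxa frequencies scaled by (1 - theta) / theta, where theta stands for the overdispersion parameter.

```python
import numpy as np

def simulate_taxa_frequencies(pi, n_reads, n_samples, theta=None, seed=0):
    """Return an (n_samples, n_taxa) array of observed taxa frequencies.

    theta=None draws every sample from one multinomial with probabilities pi.
    Otherwise each sample first draws its own taxa probabilities from a
    Dirichlet distribution centred on pi (Dirichlet-multinomial sampling).
    """
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi, dtype=float)
    if theta is None:
        probs = np.tile(pi, (n_samples, 1))
    else:
        # Assumed parameterization: alpha = pi * (1 - theta) / theta,
        # so theta close to 0 approaches the plain multinomial.
        alpha = pi * (1.0 - theta) / theta
        probs = rng.dirichlet(alpha, size=n_samples)
    counts = np.array([rng.multinomial(n_reads, p) for p in probs])
    return counts / n_reads

def between_sample_spread(freqs):
    """Between-sample standard deviation, averaged over taxa."""
    return freqs.std(axis=0).mean()

# Five samples, 12 equally frequent taxa, shallow versus deep sequencing.
pi = np.full(12, 1 / 12)
for n_reads in (100, 100_000):
    mn = simulate_taxa_frequencies(pi, n_reads, 5)
    dm = simulate_taxa_frequencies(pi, n_reads, 5, theta=0.05)
    print(n_reads, between_sample_spread(mn), between_sample_spread(dm))
```

With more reads the multinomial spread should shrink toward zero, while the Dirichlet-multinomial spread should stay at roughly the same level, since it is driven by the sample-to-sample variation in the underlying probabilities rather than by sequencing depth.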

**Figure 2. Definition of effect size.**
Illustration of a small and a large effect size when comparing two groups.

**Figure 3. Comparison of two metagenomic groups using a taxa composition data analysis approach.**
Taxa frequency means at class level obtained from subgingival plaque samples (blue curve) and from supragingival plaque samples (red curve): a) the mean of all taxa frequencies found in each group; b) the mean of taxa frequencies whose weighted average across both groups is larger than 1%. The remaining taxa are pooled into an additional taxon labeled ‘Pooled taxa’.
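The pooling step in panel b) can be sketched as follows. This is an illustrative sketch only: the function name, the example counts, and the taxon labels are hypothetical, and it assumes that the weighted average weights each sample by its number of reads, which is equivalent to dividing each taxon's total reads by the total reads over all samples.

```python
import numpy as np

def pool_rare_taxa(group_counts, taxa_names, threshold=0.01):
    """Pool taxa whose read-weighted mean frequency across all groups is
    not larger than `threshold` into one extra 'Pooled taxa' column.

    group_counts: list of (n_samples, n_taxa) count arrays, one per group.
    """
    group_counts = [np.asarray(c) for c in group_counts]
    all_counts = np.vstack(group_counts)
    # Assumed weighting: total reads per taxon over total reads overall.
    weighted_mean = all_counts.sum(axis=0) / all_counts.sum()
    keep = weighted_mean > threshold
    pooled_groups = []
    for counts in group_counts:
        pooled = counts[:, ~keep].sum(axis=1, keepdims=True)
        pooled_groups.append(np.hstack([counts[:, keep], pooled]))
    names = [n for n, k in zip(taxa_names, keep) if k] + ["Pooled taxa"]
    return pooled_groups, names

# Made-up counts for two groups of two samples each (not HMP data).
subgingival = np.array([[500, 490, 6, 4], [600, 395, 3, 2]])
supragingival = np.array([[700, 290, 5, 5], [650, 345, 4, 1]])
taxa = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]

pooled, names = pool_rare_taxa([subgingival, supragingival], taxa)
print(names)
print(pooled[0])
```

Pooling on the combined weighted average, rather than within each group separately, keeps the same set of taxa in every group, so the groups remain directly comparable.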

**Figure 4. Comparison of three metagenomic groups using a taxa composition data analysis approach.**
Taxa frequency means at class level obtained from saliva samples (black line), subgingival plaque samples (blue line), and supragingival plaque samples (red line): a) the mean of all taxa frequencies found in each group; b) the mean of taxa frequencies whose weighted average across all groups is larger than 1%. The remaining taxa are pooled into an additional taxon labeled ‘Pooled taxa’.

**Figure 5. Comparison of two metagenomic groups using rank abundance distribution data.**
Means of ranked taxa frequencies at class level obtained from subgingival plaque samples (blue curve) and from supragingival plaque samples (red curve): a) the mean of all ranked taxa frequencies found in each group; b) the mean of ranked taxa frequencies whose weighted average across both groups is larger than 1%. The remaining taxa are pooled into an additional taxon labeled ‘Pooled taxa’.
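Rank abundance distribution data differ from taxa composition data in that taxa are ordered by abundance instead of being matched by name. A minimal sketch of that transformation, assuming the ranking is done within each sample (the function name and the counts are hypothetical):

```python
import numpy as np

def rank_abundance(counts):
    """Sort each sample's taxa counts in descending order, so column 0
    holds the most abundant taxon of each sample, whatever its identity."""
    counts = np.asarray(counts)
    return -np.sort(-counts, axis=1)

# Two made-up samples with three taxa each.
counts = np.array([[5, 80, 15],
                   [70, 10, 20]])
ranked = rank_abundance(counts)
ranked_freqs = ranked / ranked.sum(axis=1, keepdims=True)
print(ranked)
print(ranked_freqs)
```

After ranking, two samples dominated by different taxa can still have identical rows, so comparisons on these data address the shape of the abundance distribution and not which taxa are present.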

**Figure 6. Comparison of three metagenomic groups using rank abundance distribution data.**
Means of ranked taxa frequencies at class level obtained from subgingival plaque samples (blue curve) and from supragingival plaque samples (red curve): a) the mean of all ranked taxa frequencies found in each group; b) the mean of ranked taxa frequencies whose weighted average across both groups is larger than 1%. The remaining taxa are pooled into an additional taxon labeled ‘Pooled taxa’.
