PLoS Comput Biol. 2014 Apr 3;10(4):e1003531.

doi: 10.1371/journal.pcbi.1003531. eCollection 2014 Apr.

Waste not, want not: why rarefying microbiome data is inadmissible

Paul J McMurdie¹, Susan Holmes¹

PMID: 24699258
PMCID: PMC3974642
DOI: 10.1371/journal.pcbi.1003531

Paul J McMurdie et al. PLoS Comput Biol. 2014.

Affiliation

¹ Statistics Department, Stanford University, Stanford, California, United States of America.

Abstract

Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
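The mixture model the abstract refers to can be written compactly. The following is a sketch of the standard Gamma-Poisson (Negative Binomial) formulation, not a transcription of the paper's equations; the symbols $K_{ij}$ (count of OTU $i$ in sample $j$), $s_j$ (library size factor), $\mu_i$ (common-scale mean) and $\phi_i$ (dispersion) are notation assumed here for illustration.

```latex
\begin{align}
K_{ij} \mid \lambda_{ij} &\sim \mathrm{Poisson}(s_j \lambda_{ij}), \\
\lambda_{ij} &\sim \mathrm{Gamma}\!\left(\text{mean } \mu_i,\ \text{squared CV } \phi_i\right), \\
\mathbb{E}[K_{ij}] &= s_j \mu_i, \\
\mathrm{Var}[K_{ij}] &= s_j \mu_i + \phi_i \, s_j^{2} \mu_i^{2}.
\end{align}
```

Under this assumed notation, the library size enters through $s_j$ rather than by discarding reads, and setting $\phi_i = 0$ recovers the Poisson case in which the variance equals the mean.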

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. A minimal example of the effect of rarefying on statistical power.**
Hypothetical abundance data in its original (Top-Left) and rarefied (Top-Right) form, with corresponding formal test results for differentiation (Bottom).
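The loss of power described in this caption can be illustrated with a small calculation. The sketch below uses made-up counts (two OTUs, two samples with library sizes 100 and 1000) that are not taken from the figure, sets the rarefied column to its expected value rather than drawing a random subsample, and applies a Pearson chi-squared test without continuity correction. The function name `chi2_2x2_pvalue` is hypothetical.

```python
import math

def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction) for the table
    [[a, b], [c, d]]; returns the p-value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-squared variable with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts: rows are OTUs, columns are samples A and B.
# Original data: sample A has 100 reads, sample B has 1000 reads.
p_original = chi2_2x2_pvalue(62, 500,
                             38, 500)

# Rarefied data: sample B reduced to 100 reads (expected value of the subsample).
p_rarefied = chi2_2x2_pvalue(62, 50,
                             38, 50)

print(f"original: p = {p_original:.4f}")
print(f"rarefied: p = {p_rarefied:.4f}")
```

With these illustrative numbers the same proportions are distinguishable at the 0.05 level before rarefying but not after, because 900 of the reads in sample B have been thrown away.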

**Figure 2. Graphical summary of the two simulation frameworks.**
Both *Simulation A* (clustering) and *Simulation B* (differential abundance) are represented. All simulations begin with real microbiome count data from a survey experiment referred to here as “the Global Patterns dataset”. Tables of integers with multiple columns represent an abundance count matrix (“OTU table”), while a single column of integers represents a multinomial of OTU counts/proportions. In both simulation illustrations an *effect size* is explained and given an example value of 10 for easy mental computation, but its meaning is different for each simulation. Note that *effect size* is altogether different from *library size*, the latter being equivalent to both the column sums and the number of reads per sample. A grey highlight indicates count values for which an effect has been applied in *Simulation B*. Protocol S1 includes the complete source code used to compute the example values shown here, as well as the full simulations discussed below.

**Figure 3. Examples of overdispersion in microbiome data.**
Common-Scale Variance versus Mean for Microbiome Data. Each point in each panel represents a different OTU's mean/variance estimate for a biological replicate and study. The data in this figure come from the *Global Patterns* survey and the *Long-Term Dietary Patterns* study, with results from additional studies included in Protocol S1. (Right) Variance versus mean abundance for rarefied counts. (Left) Common-scale variances and common-scale means, estimated according to Equations 6 and 7 from Anders and Huber, implemented in the DESeq package (Text S1). The dashed gray line denotes the σ² = μ case (Poisson; φ = 0). The cyan curve denotes the fitted variance estimate using DESeq, with method = ‘pooled’, sharingMode = ‘fit-only’, fitType = ‘local’.
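To make the contrast with the σ² = μ line concrete, here is a minimal simulation sketch in Python, assuming a Gamma-Poisson mixture with mean μ and dispersion φ so that the variance is μ + φμ². The parameter values, the seed, and the helper names `poisson` and `gamma_poisson` are illustrative assumptions, not values or code from the paper or from DESeq.

```python
import math
import random
import statistics

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method
    (adequate for the modest rates used here)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def gamma_poisson(rng, mu, phi):
    """Draw a count whose Poisson rate is Gamma distributed with mean mu
    and variance phi * mu**2 (shape 1/phi, scale mu*phi)."""
    rate = rng.gammavariate(1.0 / phi, mu * phi)
    return poisson(rng, rate)

rng = random.Random(1)   # fixed seed so the sketch is reproducible
mu, phi = 20.0, 0.5      # assumed mean and dispersion

draws = [gamma_poisson(rng, mu, phi) for _ in range(5000)]
sample_mean = statistics.mean(draws)
sample_var = statistics.variance(draws)

print(f"mean = {sample_mean:.1f}, variance = {sample_var:.1f}")
print(f"Poisson would predict variance = {mu:.1f}")
print(f"Gamma-Poisson predicts variance = {mu + phi * mu ** 2:.1f}")
```

The sample variance lands far above the sample mean, which is the overdispersion pattern the figure reports for real OTU counts.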

**Figure 4. Clustering accuracy in simulated two-class mixing.**
*Partitioning around medoids* clustering accuracy (vertical axis) that results following different normalization and distance methods. Points denote the mean values of replicates, with a vertical bar representing one standard deviation above and below. Normalization method is indicated by both shade and shape, while panel columns and panel rows indicate the distance metric and median library size, respectively. The horizontal axis is the effect size, which in this context is the ratio of target to non-target values in the multinomials that were used to simulate microbiome counts. Each multinomial is derived from two microbiomes that have negligible overlapping OTUs (Fecal and Ocean microbiomes in the Global Patterns dataset [48]). Higher values of effect size indicate an easier clustering task. For simulation details and precise definitions of abbreviations see *Simulation A* of the Methods section.

**Figure 5. Normalization by rarefying only, dependency on library size threshold.**
Unlike the analytical methods represented in Figure 4, here rarefying is the only normalization method used, but at varying values of the minimum library size threshold, shown as library-size quantile (horizontal axis). Panel columns, panel rows, and point/line shading indicate effect size (ES), median library size, and distance method applied after rarefying, respectively. Because discarded samples cannot be accurately clustered, the line is the maximum achievable accuracy.
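The rarefying step itself is simple to sketch. The Python below is a minimal illustration, assuming the usual procedure: discard samples whose library size falls below the chosen threshold, then subsample the remaining samples without replacement down to that depth. The function name `rarefy`, the data layout, and the counts are hypothetical and are not taken from phyloseq or the paper.

```python
import random

def rarefy(counts_by_sample, depth, seed=0):
    """Subsample every sample to `depth` reads without replacement.
    Samples with fewer than `depth` reads are dropped entirely."""
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in counts_by_sample.items():
        if sum(counts.values()) < depth:
            continue  # below the threshold: the whole sample is discarded
        # Expand counts into one entry per read, then draw `depth` of them.
        reads = [otu for otu, n in counts.items() for _ in range(n)]
        drawn = rng.sample(reads, depth)
        rarefied[sample] = {otu: drawn.count(otu) for otu in counts}
    return rarefied

# Hypothetical OTU counts for three samples with very different library sizes.
samples = {
    "A": {"otu1": 62, "otu2": 38},      # 100 reads
    "B": {"otu1": 500, "otu2": 500},    # 1000 reads
    "C": {"otu1": 10, "otu2": 5},       # 15 reads
}

result = rarefy(samples, depth=100)
for name, counts in result.items():
    print(name, counts, "total =", sum(counts.values()))
```

In this example sample C is lost outright and sample B keeps only a tenth of its reads, which is why a higher threshold caps the achievable clustering accuracy in the figure.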

**Figure 6. Performance of differential abundance detection with and without rarefying.**
Performance is summarized here by the “Area Under the Curve” (AUC) of a Receiver Operating Characteristic (ROC) curve (vertical axis). Briefly, the AUC value varies from 0.5 (random) to 1.0 (perfect), incorporating both sensitivity and specificity. The horizontal axis indicates the effect size, shown as the actual multiplication factor applied to OTU counts in the test class to simulate a differential abundance. Each curve traces the respective normalization method's mean performance of that panel, with a vertical bar indicating a standard deviation in performance across all replicates and microbiome templates. The right-hand side of the panel rows indicates the median library size, while the darkness of line shading indicates the number of samples per simulated experiment. Color shade and shape indicate the normalization method. See Methods section for the definitions of each normalization and testing method. For all methods, detection among multiple tests was defined using a False Discovery Rate (Benjamini-Hochberg [52]) significance threshold of 0.05.
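Two quantities in this caption are easy to state in code. The sketch below gives a plain-Python version of the Benjamini-Hochberg adjustment and of the AUC computed as a pairwise ranking probability. Both are textbook formulations written for illustration; the function names and the example values are assumptions and do not come from the paper's Protocol S1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def auc(positive_scores, negative_scores):
    """Probability that a truly differential OTU scores higher than a
    null OTU (ties count one half); equals the area under the ROC curve."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positive_scores
        for n in negative_scores
    )
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical raw p-values for four OTUs.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
detected = [q <= 0.05 for q in adjusted]

# Hypothetical detection scores (higher means stronger evidence).
example_auc = auc([0.9, 0.8], [0.1, 0.85])

print(adjusted, detected, example_auc)
```

An OTU is called detected when its adjusted value is at or below the 0.05 threshold, while the AUC summarizes ranking quality across all possible thresholds.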

Grants and funding

R01 GM086884/GM/NIGMS NIH HHS/United States
