Robust estimation of microbial diversity in theory and in practice
- PMID: 23407313
- PMCID: PMC3660670
- DOI: 10.1038/ismej.2013.10
Abstract
Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao's estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics ('Hill diversities'), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao's estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
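As a rough illustration of the richness estimator mentioned above, the following Python sketch computes the classic Chao1 estimate from per-species sample counts. It assumes the standard form S_obs + f1^2 / (2 f2), where f1 and f2 are the numbers of species observed exactly once and exactly twice, and it falls back to the bias-corrected form when there are no doubletons. The abstract does not say which variant the authors use, and the function name `chao1` is hypothetical.

```python
def chao1(counts):
    """Classic Chao1 lower-bound estimate of species richness.

    counts: iterable of per-species abundances observed in the sample.
    Assumption: standard Chao1 form; not necessarily the exact variant
    used in the paper.
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # observed species
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    # Bias-corrected form, f1 * (f1 - 1) / (2 * (f2 + 1)) with f2 = 0,
    # avoids division by zero when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / 2.0


if __name__ == "__main__":
    sample = [5, 3, 2, 2, 1, 1, 1]
    print(chao1(sample))
```

The estimate depends only on the singleton and doubleton counts, which is consistent with the abstract's point that a sample carries little information about species rarer than those it happens to contain.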
Figures
Lower and upper estimates of diversity are based on reconstructions of the rarefaction curve S_m from sample data. For a sample of size M, the rarefaction curve S_m for m ⩽ M can be estimated by subsampling (red full line). If the sample size M is large, the estimator has small uncertainty. The rarefaction curve S_m for m > M can be estimated by extrapolating the sample data beyond the sample size M. Different extrapolation scenarios are compatible with the sample data. We consider two extreme scenarios (dashed lines). A lower estimate is obtained by assuming that unobserved species are approximately as rare as the rarest observed species. An upper estimate is obtained by assuming that each unobserved species is represented in the community by a single individual. The difference between the two extremes quantifies the uncertainty of the extrapolation, shown as the shaded region. The uncertainty increases rapidly for m ≫ M.
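The interpolation part of this caption (m ⩽ M) can be sketched in Python. Rather than drawing random subsamples, the sketch below uses the standard closed-form expectation for sampling without replacement, S_m = sum over species of 1 - C(M - n_i, m) / C(M, m), where n_i is the sample count of species i. This is an assumption about how the subsampling might be carried out, not the authors' stated procedure, and the function name `rarefaction` is hypothetical. Extrapolation beyond M is not covered.

```python
from math import comb


def rarefaction(counts, m):
    """Expected number of species in a random subsample of m individuals
    drawn without replacement from a sample with the given species counts.

    Assumption: hypergeometric rarefaction formula; valid only for
    0 <= m <= M, where M is the total sample size.
    """
    counts = [c for c in counts if c > 0]
    M = sum(counts)
    if not 0 <= m <= M:
        raise ValueError("m must satisfy 0 <= m <= sample size M")
    total = comb(M, m)
    # comb(M - n, m) is the number of subsamples that miss a species
    # with n individuals; it is 0 when M - n < m.
    return sum(1.0 - comb(M - n, m) / total for n in counts)


if __name__ == "__main__":
    sample = [2, 1, 1]
    for m in range(sum(sample) + 1):
        print(m, rarefaction(sample, m))
```

At m = M the curve returns the observed number of species, so the open question addressed in the caption is how the curve continues beyond that point.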
Lower and upper estimates for the Hill diversity D_α. We consider three sample sizes M (in columns: M = 10^2, 10^4, 10^6) and three community sizes N (in rows: N = 10^10, 10^15, 10^20). The shaded range between the lower and upper estimates indicates the estimation uncertainty. The true Hill diversity D_α of the community is plotted in black. The Hill diversities between α = 1 (Shannon) and α = 2 (Simpson) are correctly estimated even for small sample size M. The estimates of Hill diversities for α < 1, including α = 0 (species richness), are characterized by large uncertainty.
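For readers unfamiliar with the Hill family, the sketch below computes a naive plug-in Hill diversity from sample counts, assuming the standard definition D_α = (sum of p_i^α)^(1/(1-α)) for α ≠ 1 and D_1 = exp(-sum of p_i ln p_i). It is not the lower or upper estimator constructed in the paper; it only shows how α = 0, 1 and 2 relate to species richness, Shannon and Simpson diversity. The function name `hill_diversity` is hypothetical.

```python
import math


def hill_diversity(counts, alpha):
    """Plug-in Hill diversity of order alpha from per-species counts.

    Assumption: standard Hill-number definition; alpha = 0 gives observed
    richness, alpha = 1 the exponential of Shannon entropy, alpha = 2 the
    inverse Simpson index.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one positive count is required")
    p = [c / total for c in counts]
    if abs(alpha - 1.0) < 1e-12:
        # Limit alpha -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(q * math.log(q) for q in p))
    return sum(q ** alpha for q in p) ** (1.0 / (1.0 - alpha))


if __name__ == "__main__":
    sample = [90, 5, 3, 1, 1]
    for a in (0, 1, 2):
        print(a, hill_diversity(sample, a))
```

Because p_i^α shrinks quickly for rare species when α ⩾ 1, those orders are dominated by abundant species, which is consistent with the caption's observation that they are estimated well even from small samples.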