Applications of species accumulation curves in large-scale biological data analysis
- PMID: 27252899
- PMCID: PMC4885658
- DOI: 10.1007/s40484-015-0049-7
Abstract
The species accumulation curve, or collector's curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations, but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45-63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species, where saturation cannot practically be attained. We also introduce a flexible suite of tools, implemented as an R package, that makes these methods broadly accessible.
Keywords: accumulation region; immune repertoire; microbiome diversity; rational function approximation; species accumulation curve; species richness.
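To make the estimator named in the abstract concrete, the following is a minimal Python sketch of the classical Good and Toulmin power series, not the authors' R package and not their rational function approximation. It assumes the standard form of the estimator: if n_j is the number of species observed exactly j times, the expected number of new species after increasing sampling effort by a relative amount t is the alternating sum of (-1)^(j+1) t^j n_j. The function names (`count_frequencies`, `good_toulmin_new_species`, `accumulation_curve`) are hypothetical, chosen for illustration only.

```python
from collections import Counter


def count_frequencies(observations):
    """Return {j: n_j}, where n_j is the number of species seen exactly j times."""
    species_counts = Counter(observations)
    return Counter(species_counts.values())


def good_toulmin_new_species(freqs, t):
    """Expected number of new species if sampling effort grows from 1 to 1 + t.

    Plain alternating power series: sum over j of (-1)^(j+1) * t^j * n_j.
    This sketch only accepts 0 <= t <= 1, because the raw series can
    diverge when extrapolating beyond a doubling of the sample.
    """
    if not 0 <= t <= 1:
        raise ValueError("the raw series is only used here for 0 <= t <= 1")
    return sum((-1) ** (j + 1) * t ** j * n_j for j, n_j in freqs.items())


def accumulation_curve(observations, ts):
    """Return (relative effort, expected distinct species) pairs for each t."""
    freqs = count_frequencies(observations)
    observed = sum(freqs.values())  # distinct species in the initial sample
    return [(1 + t, observed + good_toulmin_new_species(freqs, t)) for t in ts]


if __name__ == "__main__":
    # Toy sample: species "b" and "d" are singletons, "a" a doubleton, "c" a tripleton.
    sample = ["a", "a", "b", "c", "c", "c", "d"]
    for effort, expected in accumulation_curve(sample, [0, 0.5, 1]):
        print(effort, expected)
```

The restriction to t between 0 and 1 is the reason the abstract speaks of rational function approximations: the raw series becomes unstable for longer extrapolations, and a more stable approximation is needed there. That step is deliberately left out of this sketch.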
Conflict of interest statement
The authors Chao Deng, Timothy Daley and Andrew D Smith declare they have no conflict of interest.