J Appl Stat. 2021;48(6):1053-1070.
doi: 10.1080/02664763.2020.1754359. Epub 2020 Apr 19.

Tuning parameter selection for a penalized estimator of species richness

Alex Paynter et al. J Appl Stat. 2021.

Abstract

Our goal is to estimate the true number of classes in a population, called the species richness. We consider the case where multiple frequency count tables have been collected from a homogeneous population, and investigate a penalized maximum likelihood estimator under a negative binomial model. Because high probabilities of unobserved classes increase the variance of species richness estimates, our method penalizes the probability of a class being unobserved. Tuning the penalization parameter is challenging because the true species richness is never known, and so we propose and validate four novel methods for tuning the penalization parameter. We illustrate and contrast the performance of the proposed methods by estimating the strain-level microbial diversity of Lake Champlain over 3 consecutive years, and global human host-associated species-level microbial richness.

Keywords: diversity; ecology; maximum likelihood; microbiome; regularization.
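
The penalized likelihood described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the penalty, so everything below is an assumption made for illustration: the names `nb_logpmf` and `penalized_loglik` are hypothetical, each frequency count table is treated as a multinomial draw with an unobserved zero cell, and the penalty is taken to be λ times the probability that a class is unobserved, applied once across all tables. It is not the authors' implementation.

```python
import math


def nb_logpmf(k, size, prob):
    """Log P(X = k) for a negative binomial with shape `size` > 0
    and success probability `prob` in (0, 1)."""
    return (math.lgamma(k + size) - math.lgamma(k + 1) - math.lgamma(size)
            + size * math.log(prob) + k * math.log1p(-prob))


def penalized_loglik(C, size, prob, freq_tables, lam):
    """Penalized log-likelihood for richness C (illustrative only).

    freq_tables: list of dicts mapping k >= 1 to f_k, the number of
                 classes observed exactly k times in that table.
    lam:         tuning parameter; the penalty lam * P(X = 0) is an
                 assumed form, not taken from the paper.
    """
    log_p0 = nb_logpmf(0, size, prob)  # log-probability a class is unobserved
    total = 0.0
    for freq in freq_tables:
        n_obs = sum(freq.values())
        if C < n_obs:
            return -math.inf  # C cannot be below the observed richness
        # Multinomial log-likelihood with unobserved cell f_0 = C - n_obs;
        # lgamma lets C vary continuously during optimization.
        total += math.lgamma(C + 1) - math.lgamma(C - n_obs + 1)
        total -= sum(math.lgamma(f + 1) for f in freq.values())
        total += (C - n_obs) * log_p0
        total += sum(f * nb_logpmf(k, size, prob) for k, f in freq.items())
    return total - lam * math.exp(log_p0)
```

With λ = 0 this reduces to the ordinary likelihood. Larger λ lowers the objective most where the zero-count probability is high, which is the region where the abstract says richness estimates become unstable. In practice the objective would be maximized over C and the negative binomial parameters for each candidate λ.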

Conflict of interest statement

Conflicts of Interest: None to declare.

Figures

Figure 1. Estimates of C and their root-MSE over λ when η=(101,101) and C = 1000. Results are based on 100 simulations per λ.
Figure 2. Estimates of C and their root-MSE over λ when η=(102,105) and C = 1000. Results are based on 100 simulations per λ.
Figure 3. Simulation results for all proposed methods when η=(101,101), and when η=(102,105).
Figure 4. Simulation results for Methods 0 and 3 when η=(101,103), η=(101,105) and C ∈ {500, 1000, 2000}. The distribution of Ĉ is shown over 100 draws. The true value of C is indicated with a solid horizontal line.
