PLoS One. 2007 Feb 14;2(2):e180.

doi: 10.1371/journal.pone.0000180.

Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases

James O Lloyd-Smith¹

Affiliation

¹ Center for Infectious Disease Dynamics, Mueller Lab, Pennsylvania State University, University Park, Pennsylvania, United States of America. jlloydsmith@psu.edu

PMID: 17299582
PMCID: PMC1791715
DOI: 10.1371/journal.pone.0000180

James O Lloyd-Smith. PLoS One. 2007.

Abstract

Background: The negative binomial distribution is used commonly throughout biology as a model for overdispersed count data, with attention focused on the negative binomial dispersion parameter, k. A substantial literature exists on the estimation of k, but most attention has focused on datasets that are not highly overdispersed (i.e., those with k ≥ 1), and the accuracy of confidence intervals estimated for k is typically not explored.
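
For reference, the negative binomial is commonly written in a mean-dispersion form. The abstract does not spell out a parameterization, so the notation below is an assumption chosen to match the symbols m and k used in the figure captions:

```latex
\[
\Pr(X = x) = \frac{\Gamma(k + x)}{\Gamma(k)\, x!}
\left(\frac{k}{k + m}\right)^{k}
\left(\frac{m}{k + m}\right)^{x},
\qquad x = 0, 1, 2, \dots
\]
\[
\operatorname{E}[X] = m, \qquad
\operatorname{Var}(X) = m + \frac{m^{2}}{k}.
\]
```

Under this form the variance exceeds the mean by m²/k, so smaller k means stronger overdispersion, and the Poisson distribution is recovered as k grows without bound.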

Methodology: This article presents a simulation study exploring the bias, precision, and confidence interval coverage of maximum-likelihood estimates of k from highly overdispersed distributions. In addition to exploring small-sample bias on negative binomial estimates, the study addresses estimation from datasets influenced by two types of event under-counting, and from disease transmission data subject to selection bias for successful outbreaks.
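
A minimal sketch of one replicate of such a simulation study is given below, using only the Python standard library. It is not the author's code: the function names (`nb_sample`, `nb_loglik`, `fit_nb`), the inverse-CDF sampler, the search bounds, and the golden-section optimizer are all illustrative assumptions. It relies on the standard result that the ML estimate of the negative binomial mean is the sample mean, so only k needs a numerical search.

```python
import math
import random


def nb_sample(m, k, rng):
    """Draw one count from NB(mean m, dispersion k) by inverting the CDF."""
    u = rng.random()
    p = (1.0 + m / k) ** (-k)            # P(X = 0)
    cdf, x = p, 0
    ratio = m / (m + k)
    while u > cdf and x < 100_000:       # cap guards against rounding in the far tail
        p *= (k + x) / (x + 1) * ratio   # P(X = x + 1) obtained from P(X = x)
        x += 1
        cdf += p
    return x


def nb_loglik(k, counts, m):
    """Negative binomial log-likelihood of the counts for mean m and dispersion k."""
    ll = len(counts) * (k * math.log(k / (k + m)) - math.lgamma(k))
    for x in counts:
        ll += math.lgamma(k + x) - math.lgamma(x + 1) + x * math.log(m / (k + m))
    return ll


def fit_nb(counts, lo=1e-3, hi=1e3, iters=100):
    """Return (m_hat, k_hat); k_hat maximizes the likelihood over [lo, hi] (assumed bounds)."""
    m_hat = sum(counts) / len(counts)
    if m_hat == 0:
        raise ValueError("all counts are zero; k cannot be estimated")
    # Golden-section search on log(k), assuming a single interior maximum.
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc = nb_loglik(math.exp(c), counts, m_hat)
    fd = nb_loglik(math.exp(d), counts, m_hat)
    for _ in range(iters):
        if fc < fd:                      # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nb_loglik(math.exp(d), counts, m_hat)
        else:                            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nb_loglik(math.exp(c), counts, m_hat)
    return m_hat, math.exp((a + b) / 2)


# One replicate with illustrative parameter values (not taken from the paper).
rng = random.Random(1)
counts = [nb_sample(2.0, 0.5, rng) for _ in range(50)]
m_hat, k_hat = fit_nb(counts)
print(f"m_hat = {m_hat:.3f}, k_hat = {k_hat:.3f}")
```

Repeating the last four lines many times with fresh random draws, and summarizing the resulting k_hat values, gives the kind of bias and precision summary the study reports. When a sample shows no overdispersion the likelihood keeps rising with k, and this sketch then returns a value near the upper search bound.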

Conclusions: Results show that maximum likelihood estimates of k can be biased upward by small sample size or under-reporting of zero-class events, but are not biased downward by any of the factors considered. Confidence intervals estimated from the asymptotic sampling variance tend to exhibit coverage below the nominal level, with overestimates of k comprising the great majority of coverage errors. Estimation from outbreak datasets does not increase the bias of k estimates, but can add significant upward bias to estimates of the mean. Because k varies inversely with the degree of overdispersion, these findings show that overestimation of the degree of overdispersion is very rare for these datasets.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1**
Estimated values of k̂ and confidence interval coverage for NB datasets. 10,000 datasets were simulated as described in Section 2.1.1 of the text, using mean m, dispersion parameter k, and sample size n as shown. Boxes show the median and interquartile range (IQR) of the 10,000 resulting ML estimates k̂, and whiskers show the 5th and 95th percentile values. Numbers to the right of each subplot, written y/z, give the percentage of simulations in which the true value of k lay below (y, CI overestimate) or above (z, CI underestimate) the 90% confidence interval estimated for k. The vertical line in each subplot shows the true value of k. To facilitate comparison among parameter sets, the horizontal axis of all subplots is scaled from 0 to 10 times the true value of k.

**Figure 2**
Estimated values of k̂ and confidence interval coverage for NB datasets with uniform under-counting of secondary cases. The probability with which any secondary case was missed by surveillance was (a) *p_u* = 0.2 and (b) *p_u* = 0.5. 10,000 datasets were simulated as described in Section 2.1.2 of the text, for parameters m, k, and n as shown. Plotting details are described in Figure 1.
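
Uniform under-counting of this kind can be sketched as independent thinning of each secondary case. The function name `thin_counts` and the example counts are hypothetical, and the paper's own procedure (Section 2.1.2) is not reproduced in this abstract:

```python
import random


def thin_counts(counts, p_u, rng):
    """Miss each individual secondary case independently with probability p_u."""
    return [sum(1 for _ in range(x) if rng.random() >= p_u) for x in counts]


# Hypothetical secondary-case counts, one entry per infected individual.
rng = random.Random(0)
true_counts = [0, 0, 3, 1, 12, 0, 5]
observed = thin_counts(true_counts, 0.2, rng)
print(observed)
```

For a negative binomial viewed as a gamma-Poisson mixture, independent thinning rescales the mean to m(1 - p_u) and leaves k unchanged, so any effect on estimates of k comes through the lower observed mean rather than through a change in the underlying dispersion.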

**Figure 3**
Estimated values of k̂ and confidence interval coverage for NB datasets with under-reporting of zeroes. Individuals that caused no secondary infections were missed by surveillance with probability (a) *p_z* = 0.2 and (b) *p_z* = 0.5. 10,000 datasets were simulated as described in Section 2.1.3 of the text. Plotting details are described in Figure 1.
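
Under-reporting of zeroes differs from uniform under-counting in that whole individuals disappear from the dataset. A minimal sketch follows, with the hypothetical name `drop_zeros` and made-up counts. It lets the sample shrink as zeroes are removed; whether the paper instead kept sampling until n individuals were observed is not stated in this abstract.

```python
import random


def drop_zeros(counts, p_z, rng):
    """Remove each zero-count individual with probability p_z; keep all others."""
    return [x for x in counts if x > 0 or rng.random() >= p_z]


# Hypothetical secondary-case counts, one entry per infected individual.
rng = random.Random(0)
true_counts = [0, 0, 3, 1, 12, 0, 5]
observed = drop_zeros(true_counts, 0.5, rng)
print(observed)
```

Because only the zero class is depleted, the observed data look less aggregated than the true distribution, which is consistent with the upward bias in k reported in the Conclusions.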

**Figure 4**
Estimated values of (a) k̂ and (b) m̂ for outbreak datasets generated by branching process simulations with NB offspring distributions. 10,000 datasets were simulated as described in Section 2.1.4 of the text. Circles indicate parameter sets for which fewer than 1 in 10⁵ simulated outbreaks had n cases or more. Other plotting details are described in Figure 1.
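
The selection effect in outbreak datasets can be illustrated with a small branching-process sketch. The details of Section 2.1.4 are not given in this abstract, so the following are assumptions: a single index case, cases processed in order of appearance, a dataset made of the offspring counts of the first n cases, and rejection of outbreaks that die out earlier. All function names are hypothetical.

```python
import random


def nb_sample(m, k, rng):
    """Draw one count from NB(mean m, dispersion k) by inverting the CDF."""
    u = rng.random()
    p = (1.0 + m / k) ** (-k)            # P(X = 0)
    cdf, x = p, 0
    ratio = m / (m + k)
    while u > cdf and x < 100_000:       # cap guards against rounding in the far tail
        p *= (k + x) / (x + 1) * ratio   # P(X = x + 1) obtained from P(X = x)
        x += 1
        cdf += p
    return x


def simulate_outbreak(m, k, n, rng, initial_cases=1):
    """Offspring counts of the first n cases, or None if the outbreak ends sooner."""
    offspring = []
    waiting = initial_cases              # cases that exist but have not yet transmitted
    while waiting > 0 and len(offspring) < n:
        z = nb_sample(m, k, rng)
        offspring.append(z)
        waiting += z - 1                 # one case resolved, z new cases created
    return offspring if len(offspring) == n else None


def sample_outbreak_dataset(m, k, n, rng, max_tries=100_000):
    """Repeat simulations until an outbreak reaches n cases (or give up)."""
    for _ in range(max_tries):
        data = simulate_outbreak(m, k, n, rng)
        if data is not None:
            return data
    return None


# Illustrative parameter values (not taken from the paper).
rng = random.Random(7)
data = sample_outbreak_dataset(2.0, 0.5, 50, rng)
print(sum(data) / len(data))
```

Under these assumptions, an accepted dataset must contain enough transmission among its early cases to sustain the chain, which is one way to see why conditioning on successful outbreaks can push estimates of the mean upward.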

Grants and funding

R01-DA10135/DA/NIDA NIH HHS/United States