PLoS One. 2007 Feb 14;2(2):e180.
doi: 10.1371/journal.pone.0000180.

Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases

James O Lloyd-Smith. PLoS One. 2007.

Abstract

Background: The negative binomial distribution is used commonly throughout biology as a model for overdispersed count data, with attention focused on the negative binomial dispersion parameter, k. A substantial literature exists on the estimation of k, but most attention has focused on datasets that are not highly overdispersed (i.e., those with k ≥ 1), and the accuracy of confidence intervals estimated for k is typically not explored.
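As a concrete reference for the parameterization discussed here, the following is a minimal sketch of the negative binomial probability mass function written in terms of the mean m and the dispersion parameter k. The function name `nb_pmf` and the example values m = 1, k = 0.1 are illustrative assumptions, not taken from the article. The check confirms the standard identity variance = m + m²/k, which is why small k corresponds to strong overdispersion.

```python
import math

def nb_pmf(x, m, k):
    """P(X = x) for a negative binomial with mean m and dispersion k."""
    log_p = (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
             + k * math.log(k / (k + m))
             + x * math.log(m / (k + m)))
    return math.exp(log_p)

# Illustrative (assumed) values: a highly overdispersed case with k < 1.
m, k = 1.0, 0.1

# Truncate the support at 2000; the tail beyond that is negligible here.
probs = [nb_pmf(x, m, k) for x in range(2000)]

total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"sum of pmf = {total:.6f}")
print(f"mean       = {mean:.6f}  (target m = {m})")
print(f"variance   = {variance:.6f}  (target m + m^2/k = {m + m * m / k})")
print(f"P(X = 0)   = {probs[0]:.4f}")
```

With k well below 1, most of the probability mass sits at zero while the variance is many times the mean.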

Methodology: This article presents a simulation study exploring the bias, precision, and confidence interval coverage of maximum-likelihood estimates of k from highly overdispersed distributions. In addition to exploring small-sample bias on negative binomial estimates, the study addresses estimation from datasets influenced by two types of event under-counting, and from disease transmission data subject to selection bias for successful outbreaks.
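To make the estimation step concrete, here is a minimal sketch of maximum-likelihood fitting of k on one simulated dataset. It is not the article's code. It assumes the mean/dispersion parameterization, uses the fact that the ML estimate of the mean is the sample mean, and then maximizes the profile likelihood over log k with a golden-section search (which assumes a single peak). The helper names, search bounds, seed, and parameter values are all hypothetical.

```python
import math
import random
from collections import Counter

def sample_nb(m, k, rng):
    """Draw one NB(mean m, dispersion k) count by inverting the CDF."""
    u = rng.random()
    x = 0
    p = (1.0 + m / k) ** (-k)                  # P(X = 0)
    cdf = p
    while u > cdf and p > 0.0:
        p *= (k + x) / (x + 1) * m / (m + k)   # P(X = x + 1) from P(X = x)
        x += 1
        cdf += p
    return x

def nb_loglik(k, m, counts):
    """NB log-likelihood; counts maps each observed value to its frequency."""
    return sum(f * (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(k / (k + m)) + x * math.log(m / (k + m)))
               for x, f in counts.items())

def fit_nb(data, k_lo=1e-4, k_hi=1e4, iters=80):
    """Return (m_hat, k_hat): sample mean and the k maximizing the likelihood."""
    m_hat = sum(data) / len(data)
    if m_hat == 0:
        raise ValueError("all counts are zero; k is not identifiable")
    counts = Counter(data)
    f = lambda t: nb_loglik(math.exp(t), m_hat, counts)   # t = log k
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log(k_lo), math.log(k_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                    # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
        else:                          # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
    return m_hat, math.exp((a + b) / 2)

# Assumed example: a large sample from a highly overdispersed distribution.
rng = random.Random(1)
data = [sample_nb(2.0, 0.3, rng) for _ in range(2000)]
m_hat, k_hat = fit_nb(data)
print(f"m_hat = {m_hat:.3f}, k_hat = {k_hat:.3f}")
```

When the sample variance does not exceed the sample mean, the likelihood keeps rising with k and this sketch returns a value at the upper search bound, which should be read as "no detectable overdispersion".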

Conclusions: Results show that maximum likelihood estimates of k can be biased upward by small sample size or under-reporting of zero-class events, but are not biased downward by any of the factors considered. Confidence intervals estimated from the asymptotic sampling variance tend to exhibit coverage below the nominal level, with overestimates of k comprising the great majority of coverage errors. Estimation from outbreak datasets does not increase the bias of k estimates, but can add significant upward bias to estimates of the mean. Because k varies inversely with the degree of overdispersion, these findings show that overestimation of the degree of overdispersion is very rare for these datasets.
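The kind of interval whose coverage is being assessed can be sketched as follows. This is a generic Wald interval built from the curvature of the log-likelihood at the ML estimate, not necessarily the formula or the scale used in the article. It relies on the fact that, with the mean fixed at the sample mean, the mixed derivative of the log-likelihood with respect to m and k vanishes, so the curvature in k alone gives the approximate sampling variance. All names and parameter values are assumptions.

```python
import math
import random
from collections import Counter
from statistics import NormalDist

def sample_nb(m, k, rng):
    """Draw one NB(mean m, dispersion k) count by inverting the CDF."""
    u, x = rng.random(), 0
    p = (1.0 + m / k) ** (-k)
    cdf = p
    while u > cdf and p > 0.0:
        p *= (k + x) / (x + 1) * m / (m + k)
        x += 1
        cdf += p
    return x

def nb_loglik(k, m, counts):
    return sum(f * (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(k / (k + m)) + x * math.log(m / (k + m)))
               for x, f in counts.items())

def fit_nb(data, k_lo=1e-4, k_hi=1e4, iters=80):
    """Sample mean plus golden-section maximization of the likelihood in log k."""
    m_hat = sum(data) / len(data)
    counts = Counter(data)
    f = lambda t: nb_loglik(math.exp(t), m_hat, counts)
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log(k_lo), math.log(k_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
        else:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
    return m_hat, math.exp((a + b) / 2)

def wald_interval_k(data, level=0.90):
    """Return (k_hat, se, (lower, upper)) from a finite-difference curvature."""
    m_hat, k_hat = fit_nb(data)
    counts = Counter(data)
    h = 1e-2 * k_hat
    curvature = (nb_loglik(k_hat + h, m_hat, counts)
                 - 2 * nb_loglik(k_hat, m_hat, counts)
                 + nb_loglik(k_hat - h, m_hat, counts)) / h ** 2
    if curvature >= 0:
        raise ValueError("likelihood is not concave at k_hat; no Wald interval")
    se = math.sqrt(-1.0 / curvature)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return k_hat, se, (k_hat - z * se, k_hat + z * se)

rng = random.Random(2)
data = [sample_nb(2.0, 0.3, rng) for _ in range(2000)]   # assumed m, k, n
k_hat, se, (lower, upper) = wald_interval_k(data)
print(f"k_hat = {k_hat:.3f}, se = {se:.3f}, 90% CI = ({lower:.3f}, {upper:.3f})")
```

A symmetric interval like this can extend below zero in small samples, which is one reason intervals for k are sometimes computed on a transformed scale instead.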

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1
Estimated values of k̂ and confidence interval coverage for NB datasets. 10,000 datasets were simulated as described in Section 2.1.1 of the text, using mean m, dispersion parameter k, and sample size n as shown. Boxes show the median and interquartile range (IQR) of the 10,000 resulting ML estimates k̂, and whiskers show the 5th and 95th percentile values. Numbers to the right of each subplot, given as y/z, show the percentage of simulations for which the true value of k lay below (y, CI overestimate) or above (z, CI underestimate) the 90% confidence interval estimated for k̂. The vertical line in each subplot shows the true value of k. To facilitate comparison among parameter sets, the horizontal axis of all subplots is scaled from 0 to 10 times the true value of k.
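The summary statistics in this caption can be reproduced in miniature with a sketch like the one below. It uses 200 replicates rather than 10,000 to keep the run short, and the values m = 1, k = 0.5, n = 50, the seed, and all helper names are assumptions rather than settings confirmed by the article.

```python
import math
import random
from collections import Counter
from statistics import quantiles

def sample_nb(m, k, rng):
    """Draw one NB(mean m, dispersion k) count by inverting the CDF."""
    u, x = rng.random(), 0
    p = (1.0 + m / k) ** (-k)
    cdf = p
    while u > cdf and p > 0.0:
        p *= (k + x) / (x + 1) * m / (m + k)
        x += 1
        cdf += p
    return x

def nb_loglik(k, m, counts):
    return sum(f * (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(k / (k + m)) + x * math.log(m / (k + m)))
               for x, f in counts.items())

def fit_k(data, k_lo=1e-4, k_hi=1e4, iters=80):
    """ML estimate of k by golden-section search over log k (mean = sample mean)."""
    m_hat = sum(data) / len(data)
    counts = Counter(data)
    f = lambda t: nb_loglik(math.exp(t), m_hat, counts)
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log(k_lo), math.log(k_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
        else:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
    return math.exp((a + b) / 2)

m_true, k_true, n, replicates = 1.0, 0.5, 50, 200   # assumed settings
rng = random.Random(3)
estimates = []
for _ in range(replicates):
    data = [sample_nb(m_true, k_true, rng) for _ in range(n)]
    if sum(data) == 0:
        continue                      # k cannot be estimated from all zeros
    estimates.append(fit_k(data))

# quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
q = quantiles(estimates, n=20)
p5, q1, median, q3, p95 = q[0], q[4], q[9], q[14], q[18]
print(f"true k = {k_true}")
print(f"median = {median:.3f}, IQR = ({q1:.3f}, {q3:.3f}), 5th-95th = ({p5:.3f}, {p95:.3f})")
```

Estimates that land at the upper search bound correspond to samples with no detectable overdispersion, so percentiles are a more robust summary here than the mean of the estimates.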
Figure 2
Estimated values of k̂ and confidence interval coverage for NB datasets with uniform under-counting of secondary cases. The probability with which any secondary case was missed by surveillance was (a) p_u = 0.2 and (b) p_u = 0.5. 10,000 datasets were simulated as described in Section 2.1.2 of the text, for parameters m, k, and n as shown. Plotting details are described in Figure 1.
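Uniform under-counting of this kind can be sketched as binomial thinning: each secondary case is dropped independently with probability p_u. This is an illustration under that assumption, not the procedure from Section 2.1.2 itself, and the function names and parameter values are hypothetical. Thinning a negative binomial count in this way leaves k unchanged and scales the mean by 1 - p_u, which the sketch checks for the mean.

```python
import random

def sample_nb(m, k, rng):
    """Draw one NB(mean m, dispersion k) count by inverting the CDF."""
    u, x = rng.random(), 0
    p = (1.0 + m / k) ** (-k)
    cdf = p
    while u > cdf and p > 0.0:
        p *= (k + x) / (x + 1) * m / (m + k)
        x += 1
        cdf += p
    return x

def undercount(x, p_u, rng):
    """Keep each of x secondary cases independently with probability 1 - p_u."""
    return sum(1 for _ in range(x) if rng.random() >= p_u)

m_true, k_true, p_u, n = 2.0, 0.3, 0.5, 20000   # assumed settings
rng = random.Random(4)
true_counts = [sample_nb(m_true, k_true, rng) for _ in range(n)]
observed = [undercount(x, p_u, rng) for x in true_counts]

mean_true = sum(true_counts) / n
mean_obs = sum(observed) / n
print(f"mean of true counts     = {mean_true:.3f}")
print(f"mean of observed counts = {mean_obs:.3f}")
print(f"(1 - p_u) * true mean   = {(1 - p_u) * mean_true:.3f}")
```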
Figure 3
Estimated values of k̂ and confidence interval coverage for NB datasets with under-reporting of zeroes. Individuals that caused no secondary infections were missed by surveillance with probability (a) p_z = 0.2 and (b) p_z = 0.5. 10,000 datasets were simulated as described in Section 2.1.3 of the text. Plotting details are described in Figure 1.
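A minimal sketch of zero under-reporting: individuals with zero secondary infections are removed with probability p_z, and k is then estimated from what remains as if it were a complete negative binomial sample. The filtering rule follows the caption; the fitting routine, helper names, seed, and parameter values are assumptions. With a large sample the inflation of the estimated k is easy to see.

```python
import math
import random
from collections import Counter

def sample_nb(m, k, rng):
    """Draw one NB(mean m, dispersion k) count by inverting the CDF."""
    u, x = rng.random(), 0
    p = (1.0 + m / k) ** (-k)
    cdf = p
    while u > cdf and p > 0.0:
        p *= (k + x) / (x + 1) * m / (m + k)
        x += 1
        cdf += p
    return x

def nb_loglik(k, m, counts):
    return sum(f * (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
                    + k * math.log(k / (k + m)) + x * math.log(m / (k + m)))
               for x, f in counts.items())

def fit_k(data, k_lo=1e-4, k_hi=1e4, iters=80):
    """ML estimate of k by golden-section search over log k (mean = sample mean)."""
    m_hat = sum(data) / len(data)
    counts = Counter(data)
    f = lambda t: nb_loglik(math.exp(t), m_hat, counts)
    g = (math.sqrt(5) - 1) / 2
    a, b = math.log(k_lo), math.log(k_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
        else:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
    return math.exp((a + b) / 2)

def drop_zeros(data, p_z, rng):
    """Remove each zero-count individual independently with probability p_z."""
    return [x for x in data if x > 0 or rng.random() >= p_z]

m_true, k_true, p_z, n = 1.0, 0.5, 0.5, 5000   # assumed settings
rng = random.Random(5)
full = [sample_nb(m_true, k_true, rng) for _ in range(n)]
reported = drop_zeros(full, p_z, rng)

k_full, k_reported = fit_k(full), fit_k(reported)
print(f"zero fraction: full = {full.count(0) / len(full):.3f}, "
      f"reported = {reported.count(0) / len(reported):.3f}")
print(f"k_hat: full = {k_full:.3f}, reported = {k_reported:.3f}")
```

Removing zeros makes the data look less aggregated, so the naive fit returns a larger k, in line with the upward bias described in the abstract.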
Figure 4
Estimated values of (a) k̂ and (b) m̂ for outbreak datasets generated by branching process simulations with NB offspring distributions. 10,000 datasets were simulated as described in Section 2.1.4 of the text. Circles indicate parameter sets for which fewer than 1 in 10^5 simulated outbreaks had n cases or more. Other plotting details are described in Figure 1.
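The selection effect behind outbreak datasets can be sketched with a simple branching process. The details here are assumptions, since Section 2.1.4 is not reproduced in this text: each outbreak starts from one index case, cases transmit one at a time with NB offspring counts, and only outbreaks that reach n cases are kept, recording the offspring counts of the first n cases. Under this rule every retained dataset must contain at least n - 1 secondary infections, so the pooled mean is at least (n - 1)/n regardless of the true m.

```python
import random

def sample_nb(m, k, rng):
    """Draw one NB(mean m, dispersion k) count by inverting the CDF."""
    u, x = rng.random(), 0
    p = (1.0 + m / k) ** (-k)
    cdf = p
    while u > cdf and p > 0.0:
        p *= (k + x) / (x + 1) * m / (m + k)
        x += 1
        cdf += p
    return x

def simulate_outbreak(m, k, n, rng):
    """Offspring counts of the first n cases, or None if the outbreak dies out first."""
    offspring = []
    pending = 1                      # cases that exist but have not yet transmitted
    while pending > 0 and len(offspring) < n:
        z = sample_nb(m, k, rng)
        offspring.append(z)
        pending += z - 1             # one case resolved, z new cases created
    return offspring if len(offspring) == n else None

m_true, k_true, n, attempts = 0.8, 0.5, 20, 20000   # assumed settings
rng = random.Random(6)
kept = []
for _ in range(attempts):
    outbreak = simulate_outbreak(m_true, k_true, n, rng)
    if outbreak is not None:
        kept.append(outbreak)

pooled = [z for outbreak in kept for z in outbreak]
pooled_mean = sum(pooled) / len(pooled)
print(f"outbreaks reaching {n} cases: {len(kept)} of {attempts}")
print(f"true m = {m_true}, mean offspring in retained outbreaks = {pooled_mean:.3f}")
```

Conditioning on outbreak size discards the chains that died out early, which is the mechanism by which estimates of the mean from outbreak data are pushed upward.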
