Review
J Stat Distrib Appl. 2021;8(1):8.
doi: 10.1186/s40488-021-00121-4. Epub 2021 Jun 24.

A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

Cindy Xin Feng. J Stat Distrib Appl. 2021.

Abstract

Count data with excess zeros are frequently encountered in practice. For example, the number of health service visits often includes many zeros, representing patients with no utilization during the follow-up period. A common feature of such data is that the count measure has more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, the fundamental differences between these two types of models have received little investigation. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data-generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.

Keywords: Hurdle model; Model diagnosis; Zero deflation; Zero inflation.
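
The difference in data-generating processes described in the abstract can be illustrated with a minimal sketch. It uses a Poisson count distribution for simplicity, and the function names and parameter values (`pi`, `mu`) are illustrative assumptions rather than the article's notation. In the zero-inflated model a zero can arise either as an excess zero or as an ordinary sampling zero, so the zero probability can never fall below that of the base distribution. In the hurdle model every zero comes from the binary part, so the zero probability can also fall below it (zero deflation).

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability P(Y = k) for mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zip_pmf(k, pi, mu):
    """Zero-inflated Poisson: with probability pi the outcome is an excess zero,
    otherwise it is an ordinary Poisson draw (which may itself be zero)."""
    base = (1.0 - pi) * poisson_pmf(k, mu)
    return pi + base if k == 0 else base


def hurdle_pmf(k, pi, mu):
    """Hurdle Poisson: all zeros come from the binary part (probability pi);
    positive counts follow a zero-truncated Poisson."""
    if k == 0:
        return pi
    return (1.0 - pi) * poisson_pmf(k, mu) / (1.0 - poisson_pmf(0, mu))


if __name__ == "__main__":
    pi, mu = 0.05, 1.0  # hypothetical values chosen only for illustration
    print("Poisson P(0):", poisson_pmf(0, mu))
    print("ZIP     P(0):", zip_pmf(0, pi, mu))     # never below the Poisson zero probability
    print("Hurdle  P(0):", hurdle_pmf(0, pi, mu))  # equals pi, so it can be below it
```

With a small `pi`, the hurdle zero probability drops below the Poisson zero probability, which a zero-inflated model with a mixing probability between 0 and 1 cannot reproduce.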

Conflict of interest statement

Competing interests: The author declares that they have no competing interests.

Figures

Fig. 1
Percentage of zero deflation over all data points when the data are simulated from a hurdle negative binomial (HNB) model with sample size n=300. In the left panel, the covariate x is a binary variable simulated from a Bernoulli distribution with probability parameter 0.5. In the right panel, the covariate is a continuous variable simulated from a standard normal distribution
Fig. 2
Probabilities of observing a zero (green), a sampling zero (blue) and their differences (black) against the covariate when the data are simulated from an HNB model with a binary covariate and sample size n=300. The intercepts for both the zero and truncated counts components are set to 1. The regression coefficients of the covariate for the zero component (β1) and the truncated counts component (α1) are set to -2, -1.5, -1, -0.1, 0.1, 1, 1.5 and 2
Fig. 3
Probabilities of observing a zero (green), a sampling zero (blue) and their differences (black) against the covariate when the data are simulated from an HNB model with a continuous covariate and sample size n=300. The intercepts for both the zero and truncated counts components are set to 1. The regression coefficients of the covariate for the zero component (β1) and the truncated counts component (α1) are set to -2, -1.5, -1, -0.1, 0.1, 1, 1.5 and 2
Fig. 4
Mean of the standardized differences between the probability of being an excessive zero and the probability of being a sampling zero when the data are simulated from a zero-inflated negative binomial (ZINB) model with sample size n=300. In the left panel, the covariate is a binary variable simulated from a Bernoulli distribution with probability parameter 0.5. In the right panel, the covariate is a continuous variable simulated from a standard normal distribution
Fig. 5
Simulation results for simulation setting #1 (true model: HNB model with a single binary covariate generated from a Bernoulli distribution with probability parameter 0.5). Evaluation criteria include the mean ΔAIC (mean difference in AICs of the ZINB and HNB models); %ΔAIC>4 (percentage of the differences in AICs between the ZINB and HNB models that are above 4); the percentage of Vuong's test p-values <5%; and the percentage of Shapiro-Wilk (SW) normality test p-values <5% for the randomized quantile residuals (RQRs) of the ZINB model
Fig. 6
Simulation results for simulation setting #1 (true model: HNB model with a single continuous covariate generated from a standard normal distribution). Evaluation criteria include the mean ΔAIC (mean difference in AICs of the ZINB and HNB models); %ΔAIC>4 (percentage of the differences in AICs between the ZINB and HNB models that are above 4); the percentage of Vuong's test p-values <5%; and the percentage of SW normality test p-values <5% for the RQRs of the ZINB model
Fig. 7
Simulation results for simulation setting #2 (true model: ZINB model with a single binary covariate generated from a Bernoulli distribution with probability parameter 0.5). Evaluation criteria include the mean ΔAIC (mean difference in AICs of the HNB and ZINB models); %ΔAIC>4 (percentage of the differences in AICs between the HNB and ZINB models that are above 4); the percentage of Vuong's test p-values <5%; and the percentage of SW normality test p-values <5% for the RQRs of the HNB model
Fig. 8
Simulation results for simulation setting #2 (true model: ZINB model with a single continuous covariate generated from a standard normal distribution). Evaluation criteria include the mean ΔAIC (mean difference in AICs of the HNB and ZINB models); %ΔAIC>4 (percentage of the differences in AICs between the HNB and ZINB models that are above 4); the percentage of Vuong's test p-values <5%; and the percentage of SW normality test p-values <5% for the RQRs of the HNB model
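
The comparison in the captions of Figs. 1 to 3, between the probability of observing a zero and the probability of a sampling zero under an HNB model, can be sketched as follows. This is only an illustration under stated assumptions, not the article's code. It assumes a logit link on the probability of a zero and a log link on the negative binomial mean, and it uses a hypothetical dispersion parameter `size=1.0`, since the captions do not give these details. The intercepts of 1 and the coefficient values are taken from the captions. All function names are hypothetical.

```python
import math


def prob_zero_hurdle(x, beta0, beta1):
    """P(Y = 0 | x) from the zero component.
    Assumption: a logit link on the probability of a zero."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def prob_sampling_zero(x, alpha0, alpha1, size):
    """Zero probability of an untruncated negative binomial with
    mean exp(alpha0 + alpha1 * x) (assumed log link) and dispersion `size`."""
    mu = math.exp(alpha0 + alpha1 * x)
    return (size / (size + mu)) ** size


def zero_deflation_share(xs, beta1, alpha1, beta0=1.0, alpha0=1.0, size=1.0):
    """Share of covariate values at which the hurdle zero probability
    falls below the sampling-zero probability (zero deflation)."""
    flags = [
        prob_zero_hurdle(x, beta0, beta1) < prob_sampling_zero(x, alpha0, alpha1, size)
        for x in xs
    ]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    coefficients = [-2, -1.5, -1, -0.1, 0.1, 1, 1.5, 2]  # values from the captions
    binary_x = [0, 1]  # the two levels of a binary covariate
    for b1 in coefficients:
        for a1 in coefficients:
            share = zero_deflation_share(binary_x, b1, a1)
            print(f"beta1={b1:5}, alpha1={a1:5}: deflation share = {share:.2f}")
```

Under these assumptions, zero deflation appears at a covariate value whenever the zero component assigns a lower zero probability than the untruncated count distribution would. The shares printed here are taken over the two covariate levels, not over simulated data points as in Fig. 1.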
