Review

J Stat Distrib Appl. 2021;8(1):8.

doi: 10.1186/s40488-021-00121-4. Epub 2021 Jun 24.

A comparison of zero-inflated and hurdle models for modeling zero-inflated count data

Cindy Xin Feng¹

PMID: 34760432
PMCID: PMC8570364
DOI: 10.1186/s40488-021-00121-4

Cindy Xin Feng. J Stat Distrib Appl. 2021.

Affiliation

¹ Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University, 5790 University Avenue, Halifax, B3H 4R2 Nova Scotia Canada.

Abstract

Count data with excessive zeros are frequently encountered in practice. For example, the number of health services visits often includes many zeros, representing patients with no utilization during a follow-up period. A common feature of this type of data is that the count measure has more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated (ZI) or hurdle models are often used to fit such data. Despite the increasing popularity of ZI and hurdle models, the fundamental differences between these two types of models remain under-investigated. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.

Keywords: Hurdle model; Model diagnosis; Zero deflation; Zero inflation.
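The difference in data generating processes mentioned in the abstract can be illustrated with a minimal simulation sketch. It assumes a Poisson count distribution for simplicity (the article's simulations use negative binomial models), and the function names, the zero probability `pi`, and the rate `lam` are hypothetical choices made only for illustration. In the zero-inflated process a zero can be either a structural zero or a sampling zero from the count distribution; in the hurdle process every zero comes from the binary part and positive counts come from a zero-truncated distribution.

```python
import numpy as np


def sample_zero_truncated_poisson(rng, lam, size):
    """Draw from Poisson(lam) conditioned on being >= 1, via rejection sampling."""
    out = np.zeros(size, dtype=np.int64)
    pending = np.ones(size, dtype=bool)
    while pending.any():
        idx = np.flatnonzero(pending)
        draws = rng.poisson(lam, size=idx.size)
        accepted = draws > 0
        out[idx[accepted]] = draws[accepted]
        pending[idx[accepted]] = False
    return out


def sample_zero_inflated(rng, pi_structural, lam, size):
    """Zero-inflated process: structural zeros with probability pi_structural,
    otherwise an untruncated Poisson(lam) draw (which may itself be zero)."""
    structural = rng.random(size) < pi_structural
    counts = rng.poisson(lam, size=size)
    return np.where(structural, 0, counts)


def sample_hurdle(rng, pi_zero, lam, size):
    """Hurdle process: zeros only from the binary part (probability pi_zero),
    positive counts from a zero-truncated Poisson(lam)."""
    is_zero = rng.random(size) < pi_zero
    positives = sample_zero_truncated_poisson(rng, lam, size)
    return np.where(is_zero, 0, positives)


rng = np.random.default_rng(0)
n, pi, lam = 100_000, 0.3, 2.0  # illustrative values, not taken from the article
y_zi = sample_zero_inflated(rng, pi, lam, n)
y_hurdle = sample_hurdle(rng, pi, lam, n)

# Theoretical zero fractions under each process
expected_zi_zero = pi + (1 - pi) * np.exp(-lam)  # structural + sampling zeros
expected_hurdle_zero = pi                        # binary part only
print(np.mean(y_zi == 0), expected_zi_zero)
print(np.mean(y_hurdle == 0), expected_hurdle_zero)
```

With the same `pi` and `lam`, the zero-inflated sample contains more zeros than the hurdle sample, because the count part contributes additional sampling zeros. A hurdle model, by contrast, places no such lower bound on the zero probability, which is why it can also represent zero deflation.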

Conflict of interest statement

Competing interests: The author declares that they have no competing interests.

Figures

**Fig. 1**
Percentage of zero deflation over all data points when the data are simulated from a HNB model of sample size n=300. In the left panel, the covariate x is a binary variable simulated from a Bernoulli distribution with probability parameter 0.5. In the right panel, the covariate is a continuous variable simulated from a standard normal distribution

**Fig. 2**
Probabilities of observing a zero (green), a sampling zero (blue) and their differences (black) against the covariate when the data are simulated from a HNB model with a binary covariate of sample size n=300. The intercepts for both the zero and truncated counts components are set as 1. The regression coefficients of the covariate for the zero (β₁) and the truncated counts component (α₁) are set as -2, -1.5, -1, -0.1, 0.1, 1, 1.5 and 2

**Fig. 3**
Probabilities of observing a zero (green), a sampling zero (blue) and their differences (black) against the covariate, when the data are simulated from a HNB model with a continuous covariate of sample size n=300. The intercepts for both the zero and truncated counts components are set as 1. The regression coefficients of the covariate for the zero (β₁) and the truncated counts component (α₁) are set as -2, -1.5, -1, -0.1, 0.1, 1, 1.5 and 2

**Fig. 4**
Mean of the standardized differences of the probability of being an excessive zero and the probability of being a sampling zero when data are simulated from a ZINB model of sample size n=300. In the left panel, the covariate is a binary variable from a Bernoulli random variable with probability parameter 0.5. In the right panel, the covariate is a continuous variable from a standard normal distribution

**Fig. 5**
Simulation results for the simulation setting #1 (true model: HNB model with a single binary covariate generated from a Bernoulli distribution with probability parameter 0.5). Evaluation criteria include $\bar{\Delta}$AIC (mean difference in AICs of the ZINB and HNB models); %Δ*AIC*>4 (percentage of the differences in AICs between the ZINB and HNB models that are above 4); the percentage of Vuong’s test p-values <5%; and the percentage of SW normality test p-values of the RQRs for the ZINB model <5%

**Fig. 6**
Simulation results for the simulation setting #1 (true model: HNB model with a single continuous covariate generated from a standard normal distribution). Evaluation criteria include $\bar{\Delta}$AIC (mean difference in AICs of the ZINB and HNB models); %Δ*AIC*>4 (percentage of the differences in AICs between the ZINB and HNB models that are above 4); the percentage of Vuong’s test p-values <5%; and the percentage of SW normality test p-values of the RQRs for the ZINB model <5%

**Fig. 7**
Simulation results for the simulation setting #2 (true model: ZINB model with a single binary covariate generated from a Bernoulli distribution with probability parameter 0.5). Evaluation criteria include $\bar{\Delta}$AIC (mean difference in AICs of the HNB and ZINB models); %Δ*AIC*>4 (percentage of the differences in AICs between the HNB and ZINB models that are above 4); the percentage of Vuong’s test p-values <5%; and the percentage of SW normality test p-values of the RQRs for the HNB model <5%

**Fig. 8**
Simulation results for the simulation setting #2 (true model: ZINB model with a single continuous covariate generated from a standard normal distribution). Evaluation criteria include $\bar{\Delta}$AIC (mean difference in AICs of the HNB and ZINB models); %Δ*AIC*>4 (percentage of the differences in AICs between the HNB and ZINB models that are above 4); the percentage of Vuong’s test p-values <5%; and the percentage of SW normality test p-values of the RQRs for the HNB model <5%
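The comparison described in the captions of Figs. 2 and 3, between the probability of observing a zero and the probability of a sampling zero under a HNB model, can be sketched as follows. This is only an illustration under stated assumptions, none of which is confirmed by the captions: the zero component is taken to be a logistic regression for P(Y = 0), the count component uses a log link for the mean of the untruncated negative binomial, `theta` is a hypothetical NB size (dispersion) parameter, and a data point is flagged as zero-deflated when its zero probability falls below the NB mass at zero. The function name `hnb_zero_probabilities` is likewise hypothetical.

```python
import numpy as np


def hnb_zero_probabilities(x, beta0, beta1, alpha0, alpha1, theta):
    """Return (P(observed zero), P(sampling zero)) at covariate value(s) x.

    Assumed parameterization (hypothetical, for illustration only):
      logit P(Y = 0)  = beta0 + beta1 * x      (zero component)
      log mu          = alpha0 + alpha1 * x    (untruncated NB mean)
      theta           = NB size parameter
    """
    x = np.asarray(x, dtype=float)
    p_zero = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))  # binary (hurdle) part
    mu = np.exp(alpha0 + alpha1 * x)                      # untruncated NB mean
    p_sampling_zero = (theta / (theta + mu)) ** theta     # NB probability of 0
    return p_zero, p_sampling_zero


# Intercepts of 1 as in the captions; slopes of -2 are one of the listed values.
# theta = 1.0 is an assumed value, not given in the captions.
x = np.array([0.0, 1.0])
p_zero, p_sampling = hnb_zero_probabilities(x, 1.0, -2.0, 1.0, -2.0, theta=1.0)
deflated = p_zero < p_sampling  # assumed definition of zero deflation
print(p_zero, p_sampling, deflated)
```

Under these assumptions, the point with x = 0 has a zero probability above the NB mass at zero, while the point with x = 1 has a zero probability below it and would count toward the percentage of zero deflation.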

Cited by

Assessment of Neonatal Mortality and Associated Hospital-Related Factors in Healthcare Facilities Within Sunyani and Sunyani West Municipal Assemblies in Bono Region, Ghana.
Tawiah K, Asosega KA, Iddi S, Opoku AA, Abdul IW, Ansah RK, Bukari FK, Okyere E, Adebanji AO. Health Serv Insights. 2024 Jun 11;17:11786329241258836. doi: 10.1177/11786329241258836. eCollection 2024. PMID: 38873401. Free PMC article.
Zero-inflated models for the evaluation of colorectal polyps in colon cancer screening studies-a value-based biostatistics practice.
Dwivedi AK, Elhanafi SE, Othman MO, Zuckerman MJ. PeerJ. 2025 May 26;13:e19504. doi: 10.7717/peerj.19504. eCollection 2025. PMID: 40444286. Free PMC article.
Western corn rootworm adult activity and immigrant resistance to Bt traits in first-year maize.
Meinke LJ, Reinders JD, Clothier J, Krumm JT, Pilcher CD, Carroll MW, Head GP. PLoS One. 2025 Jun 13;20(6):e0325388. doi: 10.1371/journal.pone.0325388. eCollection 2025. PMID: 40512719. Free PMC article.
Hybrid Machine Learning Approach to Zero-Inflated Data Improves Accuracy of Dengue Prediction.
Francisco ME, Carvajal TM, Watanabe K. PLoS Negl Trop Dis. 2024 Oct 21;18(10):e0012599. doi: 10.1371/journal.pntd.0012599. eCollection 2024 Oct. PMID: 39432557. Free PMC article.
Using zero-inflated and hurdle regression models to analyze schistosomiasis data of school children in the southern areas of Ghana.
Nketia K, de Souza DK. PLoS One. 2024 Jul 12;19(7):e0304681. doi: 10.1371/journal.pone.0304681. eCollection 2024. PMID: 38995915. Free PMC article.
