Nicotine Tob Res. 2018 Apr 18;22(8):1390-1398.

doi: 10.1093/ntr/nty072. Online ahead of print.

Models for analyzing zero-inflated and overdispersed count data: an application to cigarette and marijuana use

Brian Pittman¹, Eugenia Buta², Suchitra Krishnan-Sarin¹, Stephanie S O'Malley¹, Thomas Liss¹, Ralitza Gueorguieva¹,²

PMID: 29912423
PMCID: PMC7364829
DOI: 10.1093/ntr/nty072

Brian Pittman et al. Nicotine Tob Res. 2018.

Affiliations

¹ Department of Psychiatry, Yale School of Medicine.
² Department of Biostatistics, Yale School of Public Health.

Abstract

Introduction: This paper describes different methods for analyzing counts and illustrates their use on cigarette and marijuana smoking data.

Methods: The Poisson, zero-inflated Poisson (ZIP), hurdle Poisson (HUP), negative binomial (NB), zero-inflated negative binomial (ZINB), and hurdle negative binomial (HUNB) regression models are considered. The approaches are evaluated on their ability to account for zero-inflation (extra zeros) and overdispersion (variance larger than expected) in count outcomes, with emphasis on model fit, interpretation, and choosing an appropriate model given the nature of the data. The illustrative example uses cigarette and marijuana smoking reports from a study of smoking habits among youth e-cigarette users, with gender, age, and e-cigarette use included as predictors.
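The difference between the zero-inflated and hurdle approaches named above can be made concrete with their probability mass functions. The following is a minimal sketch in plain Python, not the authors' code: the function names and the parameters `mu`, `pi`, and `p_zero` are illustrative assumptions, and the regression part (linking these parameters to predictors) is omitted.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability of observing count k when the mean is mu."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def zip_pmf(k: int, mu: float, pi: float) -> float:
    """Zero-inflated Poisson.

    With probability pi the observation is a structural zero; otherwise it
    is drawn from Poisson(mu), which can itself produce a sampling zero.
    """
    base = (1.0 - pi) * poisson_pmf(k, mu)
    return pi + base if k == 0 else base


def hurdle_poisson_pmf(k: int, mu: float, p_zero: float) -> float:
    """Hurdle Poisson.

    All zeros come from a single binary process with probability p_zero;
    positive counts follow a zero-truncated Poisson(mu).
    """
    if k == 0:
        return p_zero
    truncated = poisson_pmf(k, mu) / (1.0 - poisson_pmf(0, mu))
    return (1.0 - p_zero) * truncated


# Illustrative parameter values only (not estimates from the study).
mu, pi = 3.0, 0.3
print("Poisson P(Y=0):", poisson_pmf(0, mu))
print("ZIP     P(Y=0):", zip_pmf(0, mu, pi))
print("Hurdle  P(Y=0):", hurdle_poisson_pmf(0, mu, pi))
```

In the zero-inflated form the zero probability mixes two sources of zeros, whereas in the hurdle form it is a single parameter; replacing `poisson_pmf` with a negative binomial mass function would give the ZINB and HUNB analogues.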

Results: Of the 69 subjects available for analysis, 36% and 64% reported smoking no cigarettes and no marijuana, respectively, suggesting both outcomes might be zero-inflated. Both outcomes were also overdispersed with large positive skew. The ZINB and HUNB models fit the cigarette counts best. According to goodness-of-fit statistics, the NB, HUNB, and ZINB models fit the marijuana data well, but the ZINB provided better interpretation.
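A quick descriptive check of the kind implied by these results compares the observed share of zeros with what a Poisson model with the same mean would predict, and the sample variance with the mean. The sketch below is an assumption about how such a check might be done, not the authors' procedure; `count_diagnostics` is a hypothetical helper and `toy_counts` is made-up data, not the study data.

```python
import math
from statistics import mean, variance


def count_diagnostics(counts):
    """Return simple indicators of zero-inflation and overdispersion."""
    n = len(counts)
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    observed_zero = sum(1 for c in counts if c == 0) / n
    return {
        "mean": m,
        "variance": v,
        # A ratio well above 1 suggests overdispersion relative to Poisson.
        "dispersion_ratio": v / m if m > 0 else float("nan"),
        "observed_zero_fraction": observed_zero,
        # Zero probability of a Poisson distribution with the same mean.
        "poisson_expected_zero_fraction": math.exp(-m),
    }


# Made-up counts for illustration only.
toy_counts = [0, 0, 0, 0, 1, 2, 2, 3, 5, 20]
for name, value in count_diagnostics(toy_counts).items():
    print(f"{name}: {value:.3f}")
```

An observed zero fraction far above the Poisson expectation, together with a dispersion ratio well above 1, is only a hint; formal model comparison is still needed to choose among the NB, zero-inflated, and hurdle models.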

Conclusion: Smoking data are typically overdispersed, and in the absence of zero-inflation the NB model fits them well. In the presence of zero-inflation, the ZINB or HUNB model is recommended to account for additional heterogeneity. Beyond model fit and interpretability, the choice between a zero-inflated and a hurdle model should ultimately depend on the assumptions regarding the zeros, the study design, and the research question being asked.
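To illustrate why the NB model can absorb overdispersion, the sketch below evaluates a negative binomial mass function numerically and checks that its variance exceeds its mean. It assumes the common NB2 parameterization, in which the variance is mu + alpha * mu^2; the abstract does not state which parameterization the authors used, and the names `nb_pmf`, `mu`, and `alpha` are illustrative.

```python
import math


def nb_pmf(k: int, mu: float, alpha: float) -> float:
    """Negative binomial (NB2) probability of count k.

    mu is the mean and alpha > 0 the dispersion parameter, so that the
    variance equals mu + alpha * mu**2 (an assumed parameterization).
    """
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    log_prob = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_prob)


def moments(pmf, upper: int = 2000):
    """Numerically sum a mass function to get total mass, mean, variance."""
    probs = [pmf(k) for k in range(upper)]
    total = sum(probs)
    m = sum(k * q for k, q in enumerate(probs))
    v = sum((k - m) ** 2 * q for k, q in enumerate(probs))
    return total, m, v


# Illustrative values only (not estimates from the study).
mu, alpha = 3.0, 0.5
total, m, v = moments(lambda k: nb_pmf(k, mu, alpha))
print("total mass:", total)
print("mean:", m, "variance:", v, "mu + alpha*mu^2:", mu + alpha * mu**2)
print("NB P(Y=0):", nb_pmf(0, mu, alpha), "Poisson P(Y=0):", math.exp(-mu))
```

Because the NB distribution also places more mass at zero than a Poisson distribution with the same mean, it can sometimes fit data with many zeros without an explicit zero-inflation or hurdle component, which is consistent with the NB model fitting the marijuana data well here.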

Implications: Count outcomes are frequent in tobacco research and often have many zeros, large variance, and skew. Analyzing such data with methods that require a normally distributed outcome is inappropriate and will likely produce spurious results. This study compares and contrasts appropriate methods for analyzing count data, specifically those with an overabundance of zeros, and illustrates their use on cigarette and marijuana smoking data. Recommendations are provided.

Figures

**Figure 1.**
Observed versus predicted cigarette use reported on the day just before study intake. NB = negative binomial; ZIP = zero-inflated Poisson; HUP = hurdle Poisson; ZINB = zero-inflated negative binomial; HUNB = hurdle negative binomial; cigarette use was truncated at 20 for clarity. Not shown: a single endorsement of 40 cigarettes. *Note:* ZIP and ZINB are modeling structural and sampling zeros.

**Figure 2.**
Observed versus predicted marijuana use reported on the day just before study intake. NB = negative binomial; ZIP = zero-inflated Poisson; HUP = hurdle Poisson; HUNB = hurdle negative binomial; marijuana use was truncated at 12 for clarity. Not shown: a single endorsement of 21 marijuana joints. *Note:* ZIP and ZINB are modeling structural and sampling zeros.

Grants and funding

P50 DA036151/DA/NIDA NIH HHS/United States
