Are the Relevant Risk Factors Being Adequately Captured in Empirical Studies of Smoking Initiation? A Machine Learning Analysis Based on the Population Assessment of Tobacco and Health Study
- PMID: 37099744
- PMCID: PMC10347975
- DOI: 10.1093/ntr/ntad066
Abstract
Introduction: Cigarette smoking continues to pose a threat to public health. Identifying individual risk factors for smoking initiation is essential to further mitigate this epidemic. To the best of our knowledge, no study to date has used machine learning (ML) techniques to automatically uncover informative predictors of smoking onset among adults in the Population Assessment of Tobacco and Health (PATH) study.
Aims and methods: In this work, we employed a random forest paired with Recursive Feature Elimination to identify PATH variables that predict smoking initiation between two consecutive PATH waves among adults who had never smoked at baseline. We included all potentially informative variables from the baseline wave, wave 1 (or wave 4), to predict past 30-day smoking status at the follow-up wave, wave 2 (or wave 5). Using the first and the most recent pairs of PATH waves proved sufficient to identify the key risk factors for smoking initiation and to test their robustness over time. The eXtreme Gradient Boosting (XGBoost) method was employed to test the quality of the selected variables.
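Recursive Feature Elimination (RFE) is a simple loop: fit a model, rank the features by importance, drop the weakest, and refit on what remains until the target number of features is left. The sketch below is illustrative only and does not reproduce the authors' pipeline. To stay within the Python standard library, it swaps in a plain logistic-regression model ranked by absolute standardized coefficient in place of the random forest's importance scores. The function names, toy data, and parameters (`n_keep`, `step`, `epochs`, `lr`) are hypothetical. With scikit-learn, one would typically pass a `RandomForestClassifier` to `sklearn.feature_selection.RFE` instead.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Scale each column to zero mean and unit variance so coefficients are comparable."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # A constant column would have std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Fit an unregularized logistic regression by batch gradient descent.

    Stand-in model (an assumption): the study used a random forest, not this.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, n_keep, step=1):
    """Recursive feature elimination: refit, drop the `step` weakest features, repeat.

    Returns the original column indices of the surviving features.
    """
    Xs = standardize(X)
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in Xs]
        w, _ = fit_logistic(sub, y)
        # Positions within `remaining`, weakest (smallest |coefficient|) first.
        order = sorted(range(len(remaining)), key=lambda k: abs(w[k]))
        drop = set(order[:min(step, len(remaining) - n_keep)])
        remaining = [f for k, f in enumerate(remaining) if k not in drop]
    return remaining


def make_demo_data(n=300, d=5, seed=0):
    """Hypothetical toy data: only columns 0 and 1 drive the binary outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        X.append(row)
        y.append(1 if row[0] + row[1] + rng.gauss(0.0, 0.5) > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    print(rfe(X, y, n_keep=2))
```

Refitting after every elimination is what distinguishes RFE from ranking features once: each variable's importance is re-estimated after its weaker competitors have been removed.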
Results: The classification models selected about 60 informative PATH variables from the many candidate variables in each baseline wave. With these selected predictors, the resulting models achieved high discriminatory power, with an area under the sensitivity-specificity (ROC) curve of around 80%. Examining the chosen variables revealed important features: across the waves considered, two factors, (1) BMI and (2) dental and oral health status, robustly emerged as important predictors of smoking initiation, alongside other well-established predictors.
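The reported discrimination refers to the receiver operating characteristic (ROC) curve, which plots sensitivity against 1 − specificity. A minimal standard-library sketch of how such a figure can be computed from held-out predicted risk scores uses the rank-sum (Mann-Whitney) identity: the AUC equals the probability that a randomly chosen initiator receives a higher score than a randomly chosen non-initiator, with ties counted as one half. The function name and example values are illustrative and not taken from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank-sum identity (ties get average ranks).

    `labels` must contain only 0 (negative) and 1 (positive).
    """
    n = len(scores)
    if len(labels) != n:
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    # Mann-Whitney U for positives, normalized by the number of pos/neg pairs.
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75: three of the four positive-negative pairs are ranked correctly. An AUC of about 0.80 therefore means the model ranks a random initiator above a random non-initiator roughly 80% of the time.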
Conclusions: Our work demonstrates that ML methods are useful for predicting smoking initiation with high accuracy, identifying novel predictors of smoking initiation, and enhancing our understanding of tobacco use behaviors.
Implications: Understanding individual risk factors for smoking initiation is essential to preventing it. Using this methodology, we identified a set of the most informative predictors of smoking onset in the PATH data. Besides reconfirming well-known risk factors, the findings suggest additional predictors of smoking initiation that have been overlooked in previous work. More studies focusing on the newly identified factors (BMI and dental and oral health status) are needed to confirm their power to predict the onset of smoking and to determine the underlying mechanisms.
© The Author(s) 2023. Published by Oxford University Press on behalf of the Society for Research on Nicotine and Tobacco. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Conflict of interest statement
The authors do not report any conflicts of interest.
Similar articles
- Association of Noncigarette Tobacco Product Use With Future Cigarette Smoking Among Youth in the Population Assessment of Tobacco and Health (PATH) Study, 2013-2015. JAMA Pediatr. 2018 Feb 1;172(2):181-187. doi: 10.1001/jamapediatrics.2017.4173. PMID: 29297010. Free PMC article.
- Association of Electronic Nicotine Delivery System Use With Cigarette Smoking Relapse Among Former Smokers in the United States. JAMA Netw Open. 2020 Jun 1;3(6):e204813. doi: 10.1001/jamanetworkopen.2020.4813. PMID: 32501492. Free PMC article.
- Prospective predictors of flavored e-cigarette use: A one-year longitudinal study of young adults in the U.S. Drug Alcohol Depend. 2018 Oct 1;191:279-285. doi: 10.1016/j.drugalcdep.2018.07.020. Epub 2018 Aug 25. PMID: 30165328. Free PMC article.
- Tobacco and nicotine delivery product use in a U.S. national sample of women of reproductive age. Prev Med. 2018 Dec;117:61-68. doi: 10.1016/j.ypmed.2018.03.001. Epub 2018 Mar 17. PMID: 29559222. Free PMC article.
- Trajectories of ENDS and cigarette use among dual users: analysis of waves 1 to 5 of the PATH Study. Tob Control. 2024 Mar 19;33(e1):e62-e68. doi: 10.1136/tc-2022-057405. PMID: 36601793.
Cited by
- Key Risk Factors Associated With Electronic Nicotine Delivery Systems Use Among Adolescents. JAMA Netw Open. 2023 Oct 2;6(10):e2337101. doi: 10.1001/jamanetworkopen.2023.37101. PMID: 37862018. Free PMC article.
- Harnessing machine learning in contemporary tobacco research. Toxicol Rep. 2024 Dec 19;14:101877. doi: 10.1016/j.toxrep.2024.101877. eCollection 2025 Jun. PMID: 39844883. Free PMC article. Review.
- Identifying Key Predictors of Smoking Cessation Success: Text-Based Feature Selection Using a Large Language Model. medRxiv [Preprint]. 2025 Jun 20:2025.06.18.25329854. doi: 10.1101/2025.06.18.25329854. PMID: 40585098. Free PMC article. Preprint.