Nicotine Tob Res. 2023 Jul 14;25(8):1481-1488.
doi: 10.1093/ntr/ntad066.

Are the Relevant Risk Factors Being Adequately Captured in Empirical Studies of Smoking Initiation? A Machine Learning Analysis Based on the Population Assessment of Tobacco and Health Study

Thuy T T Le et al. Nicotine Tob Res. 2023.

Abstract

Introduction: Cigarette smoking continues to pose a threat to public health. Identifying individual risk factors for smoking initiation is essential to further mitigate this epidemic. To the best of our knowledge, no study to date has used machine learning (ML) techniques to automatically uncover informative predictors of smoking onset among adults using the Population Assessment of Tobacco and Health (PATH) study.

Aims and methods: We employed random forest paired with Recursive Feature Elimination to identify PATH variables that predict smoking initiation, between two consecutive PATH waves, among adults who had never smoked at baseline. We included all potentially informative baseline variables in wave 1 (respectively, wave 4) to predict past 30-day smoking status in wave 2 (respectively, wave 5). Using the first and the most recent pairs of PATH waves was found sufficient to identify the key risk factors of smoking initiation and to test their robustness over time. The eXtreme Gradient Boosting (XGBoost) method was employed to test the quality of the selected variables.
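The elimination loop described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: a nearest-centroid classifier with permutation importance stands in for the random forest, and the data are synthetic (two informative columns, four noise columns), not PATH variables. All function names here are hypothetical.

```python
import random

def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier using only `features`."""
    centroids = {}
    for cls in sorted(set(y)):
        members = [row for row, label in zip(X, y) if label == cls]
        centroids[cls] = [sum(row[f] for row in members) / len(members) for f in features]
    correct = 0
    for row, label in zip(X, y):
        def sq_dist(cls):
            return sum((row[f] - c) ** 2 for f, c in zip(features, centroids[cls]))
        correct += min(centroids, key=sq_dist) == label
    return correct / len(X)

def permutation_importance(X, y, features, feature, rng):
    """Accuracy lost when one column is shuffled across rows (stand-in importance)."""
    baseline = centroid_accuracy(X, y, features)
    column = [row[feature] for row in X]
    rng.shuffle(column)
    shuffled = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, column)]
    return baseline - centroid_accuracy(shuffled, y, features)

def recursive_feature_elimination(X, y, n_keep, seed=0):
    """Refit, rank, and drop the least important feature until n_keep remain."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        scores = {f: permutation_importance(X, y, features, f, rng) for f in features}
        features.remove(min(features, key=scores.get))
    return features

def make_toy_data(n=300, seed=1):
    """Synthetic data: columns 0-1 depend on the label, columns 2-5 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

X, y = make_toy_data()
selected = recursive_feature_elimination(X, y, n_keep=2)
print("kept feature indices:", selected)
```

The key property of the technique is that importances are recomputed after every removal, so a variable's rank can change as correlated variables drop out. With a real random forest, the model's own importance scores would replace the permutation score used here.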

Results: The classification models identified about 60 informative PATH variables among the many candidate variables in each baseline wave. With these selected predictors, the resulting models had high discriminatory power, with areas under the receiver operating characteristic (specificity-sensitivity) curves of around 80%. Among the selected variables, two factors, (1) BMI and (2) dental and oral health status, robustly appeared across the considered waves as important predictors of smoking initiation, alongside other well-established predictors.
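As a reminder of what an area under the curve of around 80% measures, the sketch below computes the AUC as the probability that a randomly chosen initiator receives a higher predicted score than a randomly chosen non-initiator. It also picks an operating threshold by maximizing Youden's J (sensitivity + specificity - 1). The threshold rule is an assumption for illustration; the abstract does not state how the suggested thresholds were chosen. The labels and scores are made up.

```python
def auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def roc_points(y_true, scores):
    """(threshold, sensitivity, specificity) for each distinct score used as a cut."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for t in sorted(set(scores)):
        # Predict "initiates" when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points

def youden_threshold(y_true, scores):
    """Cut that maximizes sensitivity + specificity - 1 (assumed selection rule)."""
    return max(roc_points(y_true, scores), key=lambda p: p[1] + p[2] - 1)

# Made-up labels (1 = initiated smoking) and predicted scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.6, 0.7, 0.8]

area = auc(y_true, scores)
threshold, sensitivity, specificity = youden_threshold(y_true, scores)
print(f"AUC = {area:.3f}")
print(f"threshold = {threshold}, specificity = {specificity:.2f}, sensitivity = {sensitivity:.2f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, so a value near 0.8 means the model ranks an initiator above a non-initiator in roughly four out of five such pairs.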

Conclusions: Our work demonstrates that ML methods are useful for predicting smoking initiation with high accuracy, identifying novel predictors of smoking initiation, and enhancing our understanding of tobacco use behaviors.

Implications: Understanding individual risk factors for smoking initiation is essential to preventing it. With this methodology, a set of the most informative predictors of smoking onset in the PATH data was identified. Besides reconfirming well-known risk factors, the findings suggested additional predictors of smoking initiation that have been overlooked in previous work. More studies focusing on the newly discovered factors (BMI and dental and oral health status) are needed to confirm their power to predict the onset of smoking and to determine the underlying mechanisms.

Conflict of interest statement

The authors do not report any conflicts of interest.

Figures

Figure 1. Diagram of the data preparation process, where N is the number of participants and P is the number of variables, including the outcome variable.
Figure 2. Receiver operating characteristic (ROC) curves of all the XGBoost classifiers. The diagonal line in each plot represents random guessing. Each plot shows the area under the ROC curve (AUC) with its 95% confidence interval (CI), together with a suggested threshold (specificity, sensitivity).
