Front Physiol. 2025 Mar 24:16:1528910.
doi: 10.3389/fphys.2025.1528910. eCollection 2025.

Cross-sectional study on smoking types and stroke risk: development of a predictive model for identifying stroke risk


Chao Ding et al. Front Physiol.

Abstract

Background: Stroke, a major global health concern, causes high mortality and long-term disability. With population aging and the increasing prevalence of risk factors, its incidence is rising. Existing risk assessment tools have limitations, and more accurate, personalized stroke risk prediction models are urgently needed. Smoking is a significant modifiable risk factor, yet current models have not comprehensively examined how different smoking types relate to stroke risk.

Methods: Data were drawn from the 2015-2018 National Health and Nutrition Examination Survey (NHANES) and the 2020-2021 Behavioral Risk Factor Surveillance System (BRFSS). Tobacco use (combustible cigarettes and e-cigarettes) and stroke history were ascertained by questionnaire. Participants were divided into four subgroups: non-smokers, exclusive combustible cigarette users, exclusive e-cigarette users, and dual users. Covariates including age, sex, race, education, and health conditions were also collected. Multivariate logistic regression was used to analyze the association between smoking and stroke. Four machine-learning models (XGBoost, logistic regression, Random Forest, and Gaussian Naive Bayes) were compared using the area under the receiver operating characteristic curve (AUC), and SHapley Additive exPlanations (SHAP) were applied to rank feature importance and interpret the model.
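The four exposure subgroups follow directly from two yes/no tobacco-use answers. A minimal sketch of that mapping is shown below, assuming each participant's questionnaire responses have already been reduced to two boolean flags; the abstract does not give the NHANES/BRFSS variable names or the exact definition of "use," so the names `classify_smoking` and `SmokingGroup` are hypothetical.

```python
from enum import Enum


class SmokingGroup(Enum):
    """The four mutually exclusive exposure subgroups described in Methods."""
    NON_SMOKER = "non-smoker"
    EXCLUSIVE_COMBUSTIBLE = "exclusive combustible cigarette user"
    EXCLUSIVE_E_CIGARETTE = "exclusive e-cigarette user"
    DUAL_USER = "dual user"


def classify_smoking(uses_combustible: bool, uses_e_cigarette: bool) -> SmokingGroup:
    """Map two (hypothetical) yes/no tobacco-use flags to one subgroup."""
    if uses_combustible and uses_e_cigarette:
        return SmokingGroup.DUAL_USER
    if uses_combustible:
        return SmokingGroup.EXCLUSIVE_COMBUSTIBLE
    if uses_e_cigarette:
        return SmokingGroup.EXCLUSIVE_E_CIGARETTE
    return SmokingGroup.NON_SMOKER


# Example: a participant who smokes combustible cigarettes but does not vape.
print(classify_smoking(True, False).value)  # exclusive combustible cigarette user
```

Checking the dual-use case first keeps the four groups mutually exclusive, so every participant lands in exactly one category.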

Results: A total of 273,028 individuals were included in the study. Exclusive combustible cigarette users had an elevated stroke risk (β: 1.36, 95% CI: 1.26-1.47, P < 0.0001). Among the four machine-learning models, the XGBoost model showed the best discriminative ability with an AUC of 0.794 (95% CI = 0.787-0.802).
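The AUC used to compare the models has a simple probabilistic reading. It is the probability that a randomly chosen participant with a stroke history receives a higher predicted risk than a randomly chosen participant without one, with ties counted as one half. A minimal pure-Python sketch of this rank-based (Mann-Whitney) estimate follows. The function name and toy data are illustrative, and the 95% CI reported above would require an additional step (for example, bootstrapping or DeLong's method) that the abstract does not specify.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based (Mann-Whitney) AUC.

    labels: 1 for a positive case (e.g. stroke history), 0 otherwise.
    scores: predicted risk for each case, in the same order.
    Returns the fraction of positive/negative pairs in which the positive
    case scores higher, counting tied pairs as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The double loop costs O(n_pos × n_neg), which is fine for illustration; for a cohort of 273,028 participants, a sort-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.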

Conclusion: This study reveals a significant association between smoking type and stroke risk. An XGBoost-based stroke prediction model was developed; it has the potential to improve the accuracy of stroke risk assessment and to support personalized interventions for stroke prevention, thereby helping to reduce the healthcare burden of stroke.

Keywords: SHAP; XGBoost; machine learning; prediction model; stroke.


Conflict of interest statement

Author MY was employed by Spring Airlines Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Flowchart of study population inclusion according to the study aims.
FIGURE 2
Forest plot of the relationship between cigarette use and stroke.
FIGURE 3
(A) Features with non-zero coefficients selected by LASSO regression, with their coefficients. (B) Coefficient paths showing how the penalty coefficient λ affects the weight coefficient of each independent variable: the horizontal axis shows λ, the vertical axis shows the weight coefficients, and each color corresponds to one independent variable.
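The role of λ in Figure 3 can be made concrete with the penalized objective that LASSO minimizes. The caption does not state the exact formulation, so the block below assumes a standard L1-penalized logistic regression for a binary stroke outcome $y_i \in \{0, 1\}$, with $n$ participants, $p$ candidate features $\mathbf{x}_i$, and an unpenalized intercept $\beta_0$:

```latex
\hat{\boldsymbol\beta}(\lambda)
  = \arg\min_{\beta_0,\,\boldsymbol\beta}
    \left\{
      -\frac{1}{n}\sum_{i=1}^{n}
        \Big[\, y_i\,(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol\beta)
          - \log\!\big(1 + e^{\beta_0 + \mathbf{x}_i^{\top}\boldsymbol\beta}\big) \Big]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
```

Under this assumption, increasing λ shrinks every $\hat\beta_j$ toward zero and sets more of them exactly to zero, which produces the coefficient paths in panel B. The features whose $\hat\beta_j$ remain non-zero at the chosen λ are those listed in panel A.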
FIGURE 4
(A) The horizontal axis shows the SHAP value: positive values indicate that the variable pushes the prediction toward a positive stroke outcome, and negative values the opposite; color runs from blue (low) to red (high) values of the variable. (B) SHAP evaluation of the XGBoost algorithm for forecasting adverse outcomes in stroke patients. (C) Mean AUC of the four machine-learning models evaluated using five-fold external cross-validation. (D) ROC curve analysis of the XGBoost algorithm for predicting stroke risk in the external test set.
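The SHAP values plotted in panel A decompose each prediction additively: the model output for one participant equals a base value plus one contribution per feature. A minimal sketch of that idea follows, computing exact Shapley values by enumerating every feature subset. The assumptions are stated plainly: the value of a coalition is taken as the model output when features outside the coalition are replaced by a single reference (baseline) point, and the two-feature `toy_model` is hypothetical. This brute-force approach is exponential in the number of features and is not the TreeSHAP algorithm that the shap library applies to XGBoost models.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values for one prediction via subset enumeration.

    A coalition S is scored by evaluating the model on a point whose
    features in S come from x and whose other features come from baseline.
    """
    n = len(x)

    def value(subset: frozenset) -> float:
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                subset = frozenset(s)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(subset | {i}) - value(subset))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical two-feature risk score with an interaction term."""
    return 0.5 * z[0] + 2.0 * z[1] + z[0] * z[1]


phi = shapley_values(toy_model, x=[1.0, 1.0], baseline=[0.0, 0.0])
print(phi)  # [1.0, 2.5]
```

The contributions sum to `toy_model(x) - toy_model(baseline)` (here 3.5), which is the additivity property that lets a SHAP summary plot attribute each prediction to individual features. The interaction term's share is split evenly between the two features.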
FIGURE 5
Web-based calculator for predicting stroke risk.
