Sci Rep. 2025 Jul 5;15(1):24069.
doi: 10.1038/s41598-025-09409-w.

Explainable artificial intelligence driven insights into smoking prediction using machine learning and clinical parameters

S Aishwarya et al. Sci Rep.

Abstract

Smoking is a leading cause of various health conditions, including cancer and respiratory diseases. Smokers often face medical restrictions such as limitations on blood and organ donation, reduced effectiveness of medications, and increased surgical complications. These impacts underscore the need for early detection of smoking status to enable timely intervention. This study explores the use of Artificial Intelligence (AI) and Machine Learning (ML) techniques to predict smoking status from health parameters, including biosignals and clinical biomarkers. A balanced subset of 2,000 instances was sampled from a publicly available Kaggle dataset comprising clinical and biometric features. Multiple ML models were implemented, including Random Forest Classifier, Logistic Regression, Decision Tree Classifier, K-Nearest Neighbors, CatBoost Classifier, and an Artificial Neural Network. The Random Forest Classifier achieved the best performance, with an accuracy of 0.80, precision of 0.80, recall of 0.80, and F1-score of 0.79. To enhance model interpretability, four Explainable Artificial Intelligence (XAI) techniques were applied: Shapley Additive Explanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), QLattice, and Anchor. SHAP identified hemoglobin as the most influential predictor, while LIME, QLattice, and Anchor highlighted the role of gamma-glutamyl transferase (GTP). Interactions between hemoglobin, GTP, and height were associated with more accurate predictions. The integration of ensemble modeling and multiple XAI approaches offers deeper interpretability than prior studies, providing healthcare providers and policymakers with a robust, transparent decision-support tool for targeted intervention strategies.
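
The abstract mentions two steps that are easy to illustrate: drawing a class-balanced subset and summarizing a classifier with accuracy, precision, recall, and F1. The following is a minimal sketch using only the Python standard library, not the authors' pipeline. The function names `balanced_sample` and `macro_metrics`, the `smoking` label key, and all toy data are hypothetical. Macro averaging is also an assumption, since the abstract does not state which averaging scheme was used.

```python
import random
from collections import defaultdict


def balanced_sample(rows, label_key, n_total, seed=0):
    """Draw n_total rows with the same number of rows from every class."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    per_class, remainder = divmod(n_total, len(by_class))
    if remainder:
        raise ValueError("n_total must be divisible by the number of classes")
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    sample = []
    for label in sorted(by_class):
        if len(by_class[label]) < per_class:
            raise ValueError(f"class {label!r} has too few rows")
        sample.extend(rng.sample(by_class[label], per_class))
    rng.shuffle(sample)  # avoid leaving the rows grouped by class
    return sample


def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        both = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(2 * precision * recall / both if both else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy, imbalanced data: 1,500 "smokers" (1) and 3,000 "non-smokers" (0).
toy_rows = [{"id": i, "smoking": int(i % 3 == 0)} for i in range(4500)]
subset = balanced_sample(toy_rows, "smoking", 2000)
print(len(subset), sum(r["smoking"] for r in subset))  # 2000 rows, 1000 per class

# Hypothetical predictions on 400 held-out rows (not results from the study).
y_true = [1] * 200 + [0] * 200
y_pred = [1] * 150 + [0] * 50 + [1] * 30 + [0] * 170
print(macro_metrics(y_true, y_pred))
```

Because macro-averaged F1 is the mean of the per-class F1 scores rather than the harmonic mean of the averaged precision and recall, it can differ slightly from both. That would be consistent with reporting precision and recall of 0.80 alongside an F1-score of 0.79.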

Keywords: Artificial intelligence; Health parameters; Machine learning; Smokers detection; XAI.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Health effects attributed to smoking.
Fig. 2
Violin Plots Showing Distribution of (a) Age, (b) Systolic Blood Pressure, (c) Diastolic Blood Pressure, and (d) Hemoglobin Levels.
Fig. 3
Multiple Bar Charts Depicting (a) Left Ear Hearing Status, (b) Right Ear Hearing Status, (c) Urine Protein Presence, and (d) Incidence of Dental Caries.
Fig. 4
Feature Importance Ranked Using Mutual Information Scores.
Fig. 5
Architecture of the Proposed Stacked Machine Learning Model.
Fig. 6
Workflow Diagram of the Machine Learning Process Implemented.
Fig. 7
Confusion Matrices for Random Forest Classifier Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.
Fig. 8
AUC Curves for Random Forest Classifier Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.
Fig. 9
Precision-Recall Curves for Random Forest Classifier Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.
Fig. 10
Accuracy Trend Over Training Epochs for the Artificial Neural Network Model.
Fig. 11
Loss Trend Over Training Epochs for the Artificial Neural Network Model.
Fig. 12
SHAP Mean Bar Plots Illustrating Model Interpretation for (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.
Fig. 13
SHAP Beeswarm Plots for Model Interpretation Across (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.
Fig. 14
LIME-Based Feature Importance Visualizations for Models Trained Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.
Fig. 15
QGraphs Depicting Important Predictive Markers Identified by Models Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.
