Explainable artificial intelligence driven insights into smoking prediction using machine learning and clinical parameters

Sci Rep. 2025 Jul 5;15(1):24069.

doi: 10.1038/s41598-025-09409-w.

S Aishwarya¹, P C Siddalingaswamy², Krishnaraj Chadaga³

Affiliations

¹ Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India.
² Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. pcs.swamy@manipal.edu.
³ Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. krishnaraj.chadaga@manipal.edu.

PMID: 40617930
PMCID: PMC12228788
DOI: 10.1038/s41598-025-09409-w

S Aishwarya et al. Sci Rep. 2025.

Abstract

Smoking is a leading cause of various health conditions, including cancer and respiratory diseases. Smokers often face medical restrictions such as limitations in blood and organ donation, reduced effectiveness of medications, and increased surgical complications. These impacts underscore the need for early detection of smoking status to enable timely intervention. This study explores the use of Artificial Intelligence (AI) and Machine Learning (ML) techniques to predict smoking status based on health parameters, including biosignals and clinical biomarkers. A balanced subset of 2,000 instances was sampled from a publicly available Kaggle dataset comprising clinical and biometric features. Multiple ML models were implemented, including Random Forest Classifier, Logistic Regression, Decision Tree Classifier, K-Nearest Neighbors, CatBoost Classifier, and an Artificial Neural Network. The Random Forest Classifier achieved the best performance, with an accuracy of 0.80, precision of 0.80, recall of 0.80, and F1-score of 0.79. To enhance model interpretability, four Explainable Artificial Intelligence (XAI) techniques were applied: Shapley Additive Explanations (SHAP), Local Interpretable Model-Agnostic Explanations (LIME), QLattice, and Anchor. SHAP identified hemoglobin as the most influential predictor, while LIME, QLattice, and Anchor highlighted the role of gamma-glutamyl transferase (GTP). Interactions between hemoglobin, GTP, and height were associated with more accurate predictions. The integration of ensemble modeling and multiple XAI approaches offers deeper interpretability than prior studies, providing healthcare providers and policymakers with a robust, transparent decision-support tool for targeted intervention strategies.

Keywords: Artificial intelligence; Health parameters; Machine learning; Smokers detection; XAI.
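
The abstract mentions two steps that are easy to make concrete: drawing a balanced subset from a larger dataset, and summarizing a classifier with accuracy, precision, recall, and F1-score. The sketch below illustrates both in plain Python. It is not the authors' code: the helper names `balanced_sample` and `macro_metrics`, the `smoking` label key, the toy rows, and the choice of macro averaging are all assumptions made for illustration, since the abstract does not state how the metrics were averaged.

```python
import random

def balanced_sample(rows, label_key, per_class, seed=0):
    """Draw `per_class` rows at random from each class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    sample = []
    for label in sorted(by_class):
        # rng.sample raises ValueError if a class has fewer than per_class rows
        sample.extend(rng.sample(by_class[label], per_class))
    rng.shuffle(sample)
    return sample

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy, made-up data: 30 non-smokers (0) and 10 smokers (1).
rows = [{"id": i, "smoking": 0} for i in range(30)]
rows += [{"id": 30 + i, "smoking": 1} for i in range(10)]
subset = balanced_sample(rows, "smoking", per_class=5)

# Toy predictions, unrelated to the results reported in the abstract.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = macro_metrics(y_true, y_pred)
```

Because the F1-score is averaged per class rather than derived from the averaged precision and recall, it can differ slightly from both, which is one way a reported F1 can sit just below equal precision and recall values.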

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Health effects attributed to smoking.

**Fig. 2**
Violin Plots Showing Distribution of (a) Age, (b) Systolic Blood Pressure, (c) Diastolic Blood Pressure, and (d) Hemoglobin Levels.

**Fig. 3**
Multiple Bar Charts Depicting (a) Left Ear Hearing Status, (b) Right Ear Hearing Status, (c) Urine Protein Presence, and (d) Incidence of Dental Caries.

**Fig. 4**
Feature Importance Ranked Using Mutual Information Scores.

**Fig. 5**
Architecture of the Proposed Stacked Machine Learning Model.

**Fig. 6**
Workflow Diagram of the Machine Learning Process Implemented.

**Fig. 7**
Confusion Matrices for Random Forest Classifier Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.

**Fig. 8**
AUC Curves for Random Forest Classifier Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.

**Fig. 9**
Precision-Recall Curves for Random Forest Classifier Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.

**Fig. 10**
Accuracy Trend Over Training Epochs for the Artificial Neural Network Model.

**Fig. 11**
Loss Trend Over Training Epochs for the Artificial Neural Network Model.

**Fig. 12**
SHAP Mean Bar Plots Illustrating Model Interpretation for (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.

**Fig. 13**
SHAP Beeswarm Plots for Model Interpretation Across (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.

**Fig. 14**
LIME-Based Feature Importance Visualizations for Models Trained Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.

**Fig. 15**
QGraphs Depicting Important Predictive Markers Identified by Models Using (a) Grid Search, (b) Randomized Search, and (c) Bayesian Optimization.
