PLoS One. 2025 Jan 9;20(1):e0312914.
doi: 10.1371/journal.pone.0312914. eCollection 2025.

Enhancing stroke disease classification through machine learning models via a novel voting system by feature selection techniques


Mahade Hasan et al. PLoS One.

Retraction in

Abstract

Heart disease remains a leading cause of mortality and morbidity worldwide, necessitating accurate and reliable predictive models to facilitate early detection and intervention. Although state-of-the-art work has applied various machine learning approaches to heart disease prediction, it has not achieved remarkable accuracy. In response to this need, we applied nine machine learning algorithms (XGBoost, logistic regression, decision tree, random forest, k-nearest neighbors (KNN), support vector machine (SVM), Gaussian naïve Bayes (NB-Gaussian), adaptive boosting, and linear regression) to predict heart disease from a range of physiological indicators. Our approach used feature selection techniques to identify the most relevant predictors, with the aim of refining the models to enhance both performance and interpretability. The models were trained with grid search hyperparameter tuning and cross-validation to minimize overfitting. Additionally, we developed a novel voting system with feature selection techniques to advance heart disease classification. We evaluated the models using key performance metrics: accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC). Among the models, XGBoost demonstrated exceptional performance, achieving 99% accuracy, precision, and F1-score, 98% recall, and 100% ROC AUC. This study offers a promising approach to early heart disease diagnosis and preventive healthcare.
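
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (confusion_counts, classification_metrics, roc_auc), the toy labels and scores, and the 0.5 decision threshold are all assumptions made only for illustration. It shows how accuracy, precision, recall, and F1-score follow from confusion-matrix counts, and how ROC AUC can be read as the fraction of positive/negative pairs that a model ranks correctly.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

On this toy data, one positive falls below the threshold and one negative rises above it, so the four threshold-based metrics all equal 0.75, while ROC AUC is 15/16 because only one of the sixteen positive/negative pairs is ranked in the wrong order. The contrast shows why the abstract reports ROC AUC separately from the threshold-dependent metrics.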


Conflict of interest statement

No authors have competing interests.

Figures

Fig 1. System architecture of this study.
Fig 2. Histogram of attributes.
Fig 3. Feature selection procedure.
Fig 4. Correlation among all features.
Fig 5. Confusion matrices of all models.
Fig 6. The ROC curve for the experiment.
Fig 7. Performance of each model under three settings: cross-validation with grid search, cross-validation without grid search, and without cross-validation or grid search. (a) XGBoost; (b) RF; (c) KNN; (d) SVM; (e) ABR; (f) NB-Gaussian; (g) LR; (h) Linear Regression; (i) DT.
Fig 8. Illustration of the proposed clinical application.


