Sci Rep. 2025 Feb 20;15(1):6132. doi: 10.1038/s41598-025-90530-1.

An extensive experimental analysis for heart disease prediction using artificial intelligence techniques

D Rohan et al.
Abstract

The heart is a vital organ that plays a crucial role in sustaining life. Unfortunately, heart disease is one of the leading causes of mortality worldwide. Early and accurate detection can significantly improve outcomes by enabling preventive measures and personalized healthcare recommendations. Artificial intelligence is emerging as a powerful tool for healthcare applications, particularly for predicting heart disease. Researchers are actively working in this area, but accurate heart disease prediction remains challenging, so it is important to experiment with a wide range of models to identify the most effective one. To address this need, this paper conducts an extensive investigation of various models. The proposed research considered 11 feature selection techniques and 21 classifiers. The feature selection techniques considered are Information Gain, the Chi-Square Test, Fisher Discriminant Analysis (FDA), Variance Threshold, Mean Absolute Difference (MAD), Dispersion Ratio, Relief, LASSO, Random Forest Importance, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA). The classifiers considered are Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Gaussian Naïve Bayes (GNB), XGBoost, AdaBoost, Stochastic Gradient Descent (SGD), Gradient Boosting Classifier, Extra Tree Classifier, CatBoost, LightGBM, Multilayer Perceptron (MLP), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Bidirectional LSTM (BiLSTM), Bidirectional GRU (BiGRU), Convolutional Neural Network (CNN), and a Hybrid Model (CNN, RNN, LSTM, GRU, BiLSTM, BiGRU). Across all experiments, XGBoost outperformed every other model, achieving an accuracy of 0.97, precision of 0.97, sensitivity of 0.98, specificity of 0.98, F1 score of 0.98, and AUC of 0.98.
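To make the experimental setup concrete, the following is a minimal, illustrative sketch in Python that pairs one of the 11 feature selection techniques (the chi-square test, via scikit-learn's SelectKBest) with the best-performing classifier (XGBoost) and computes the reported metrics. The dataset file, column names, and hyperparameters here are assumptions for illustration only, not the authors' exact configuration.

# Minimal sketch of a feature-selection + classification pipeline of the kind
# the paper evaluates: chi-square feature selection feeding an XGBoost classifier.
# Dataset path, column names, and hyperparameters are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)
from xgboost import XGBClassifier

df = pd.read_csv("heart.csv")                     # hypothetical heart-disease dataset
X, y = df.drop(columns=["target"]), df["target"]  # "target": 1 = disease, 0 = no disease

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Chi-square feature selection (requires non-negative feature values).
selector = SelectKBest(score_func=chi2, k=8).fit(X_train, y_train)
X_train_sel, X_test_sel = selector.transform(X_train), selector.transform(X_test)

model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_train_sel, y_train)

y_pred = model.predict(X_test_sel)
y_prob = model.predict_proba(X_test_sel)[:, 1]

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("accuracy   :", accuracy_score(y_test, y_pred))
print("precision  :", precision_score(y_test, y_pred))
print("sensitivity:", recall_score(y_test, y_pred))  # recall of the positive class
print("specificity:", tn / (tn + fp))                # recall of the negative class
print("F1 score   :", f1_score(y_test, y_pred))
print("AUC        :", roc_auc_score(y_test, y_prob))

Swapping in any of the other feature selection techniques or classifiers named in the abstract amounts to replacing the selector or the model object in this sketch.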

Keywords: Artificial intelligence; Deep learning; Feature selection; Heart disease prediction; Machine learning; Performance metrics; XGBoost.


Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Execution flow of the experiment.
Algorithm 1. Pseudocode for calculating information gain.
Algorithm 2. Pseudocode for the Chi-square test.
Algorithm 3. Pseudocode for FDA.
Algorithm 4. Pseudocode for MAD.
Algorithm 5. Pseudocode for DR.
Algorithm 6. Pseudocode for Relief.
Algorithm 7. Pseudocode for Random Forest Importance.
Algorithm 8. Pseudocode for RF.
Algorithm 9. Pseudocode for XGBoost.
Algorithm 10. Pseudocode for AdaBoost.
Algorithm 11. Pseudocode for SGD.
Algorithm 12. Pseudocode for GB.
Algorithm 13. Pseudocode for ETC.
Algorithm 14. Pseudocode for CatBoost.
Algorithm 15. Pseudocode for LightGBM.
Fig. 2. Model architecture of MLP.
Algorithm 16. Pseudocode for MLP.
Fig. 3. Structure of RNN.
Fig. 4. Model architecture of RNN.
Fig. 5. Structure of LSTM.
Fig. 6. Model architecture of LSTM.
Fig. 7. Structure of GRU.
Fig. 8. Model architecture of GRU.
Fig. 9. Structure of Bi-LSTM.
Fig. 10. Model architecture of Bi-LSTM.
Fig. 11. Structure of Bi-GRU.
Fig. 12. Model architecture of Bi-GRU.
Fig. 13. Model architecture of CNN.
Fig. 14. Model architecture of hybrid model.
Fig. 15. Feature importance of information gain.
Fig. 16. Chi-square test results of the features.
Fig. 17. Fisher’s scores.
Fig. 18. Features selected by variance threshold.
Fig. 19. Mean absolute difference of the features.
Fig. 20. Dispersion ratio of the features.
Fig. 21. Relief feature scores.
Fig. 22. Feature scores of Lasso regularization.
Fig. 23. Random forest importance feature scores.
Fig. 24. Feature importance.
Fig. 25. Explained variance ratio.
Fig. 26. LR without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 27. DT without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 28. RF without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 29. KNN without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 30. SVM without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 31. GNB without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 32. XGBoost without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 33. AdaBoost without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 34. SGD without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 35. GB without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 36. ETC without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 37. CatBoost without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 38. LightGBM without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 39. MLP without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 40. RNN without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 41. LSTM without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 42. GRU without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 43. Bi-LSTM without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 44. Bi-GRU without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 45. CNN without feature selection (a) Confusion matrix (b) ROC Curve.
Fig. 46. Hybrid model without feature selection (a) Confusion matrix (b) ROC Curve.
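
Figures 26 through 46 each pair a confusion matrix with an ROC curve for one classifier evaluated without feature selection. A minimal sketch of how such a pair of plots can be produced with scikit-learn for any fitted classifier is shown below; the variable names (model, X_test_sel, y_test) continue from the pipeline sketch above and are illustrative assumptions, not the authors' code.

# Sketch: confusion-matrix and ROC-curve plots of the kind shown in Figs. 26-46.
# `model`, `X_test_sel`, and `y_test` continue from the earlier pipeline sketch.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay

fig, (ax_cm, ax_roc) = plt.subplots(1, 2, figsize=(10, 4))

# (a) Confusion matrix
ConfusionMatrixDisplay.from_estimator(model, X_test_sel, y_test, ax=ax_cm)
ax_cm.set_title("(a) Confusion matrix")

# (b) ROC curve
RocCurveDisplay.from_estimator(model, X_test_sel, y_test, ax=ax_roc)
ax_roc.set_title("(b) ROC curve")

plt.tight_layout()
plt.show()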


