Heliyon. 2024 Oct 1;10(19):e38731.
doi: 10.1016/j.heliyon.2024.e38731. eCollection 2024 Oct 15.

Early heart disease prediction using feature engineering and machine learning algorithms

Mohammed Amine Bouqentar et al. Heliyon.

Abstract

Heart disease is one of the most widespread global health issues: according to the World Health Organization (WHO), Cardiovascular Diseases (CVDs) account for about 32 % of all deaths worldwide each year. Early prediction and diagnosis of heart disease are critical for effective treatment and disease management. Despite the efforts of healthcare professionals, including cardiovascular surgeons and cardiologists, misdiagnosis and misinterpretation of test results can still occur in daily practice. This study addresses the growing global health challenge posed by CVDs. With the progress of Machine Learning (ML) and Deep Learning (DL) techniques as part of Artificial Intelligence (AI), these technologies have become crucial for predicting and diagnosing CVDs. This research aims to develop an ML system for the early prediction of cardiovascular diseases by selecting one of the most powerful existing ML algorithms after an in-depth comparative analysis of several candidates. To this end, the Cleveland and Statlog heart datasets, obtained from international platforms, are used to evaluate and validate the system's performance. The Cleveland dataset is categorized and used to train six ML algorithms: decision tree, random forest, support vector machine, logistic regression, adaptive boosting, and K-nearest neighbors. The performance of each algorithm is assessed using accuracy, precision, recall, F1 score, and the Area Under the Curve (AUC). Hyperparameter tuning is employed to find the settings that give each algorithm its best performance, and the results are assessed with several evaluation approaches, including 10-fold cross-validation with a 95 % confidence interval. The study's findings highlight the potential of ML to improve the early prediction and diagnosis of cardiovascular diseases. By comparing and analyzing the performance of the applied algorithms on both the Cleveland and Statlog heart datasets, this research contributes to the advancement of ML techniques in the medical field. The developed ML system offers a valuable tool for healthcare professionals in the early prediction and diagnosis of cardiovascular diseases, with implications for the prediction and diagnosis of other diseases as well.
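
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `classification_metrics` and `mean_ci95`, the toy labels, and the per-fold accuracies are hypothetical. The t-based interval with a critical value of 2.262 (9 degrees of freedom, matching 10 folds) is one common way to form a 95 % confidence interval; the abstract does not say which method the study used.

```python
import math
import statistics


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Two-sided 95% t critical value for 9 degrees of freedom.
# Assumption: exactly 10 folds, so df = 10 - 1 = 9.
T_CRIT_95_DF9 = 2.262


def mean_ci95(fold_scores):
    """Mean and t-based 95% confidence interval for 10 per-fold scores."""
    if len(fold_scores) != 10:
        raise ValueError("this sketch hard-codes the critical value for 10 folds")
    mean = statistics.mean(fold_scores)
    std_err = statistics.stdev(fold_scores) / math.sqrt(len(fold_scores))
    half_width = T_CRIT_95_DF9 * std_err
    return mean, mean - half_width, mean + half_width


# Hypothetical labels and fold accuracies, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

fold_accuracies = [0.80, 0.85, 0.90, 0.85, 0.80, 0.90, 0.85, 0.80, 0.90, 0.85]
mean_acc, ci_low, ci_high = mean_ci95(fold_accuracies)

print(metrics)
print(f"mean accuracy {mean_acc:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

In practice the per-fold scores would come from running each of the six classifiers through 10-fold cross-validation on the Cleveland data; the interval then indicates how much the mean score could vary across resamplings.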

Keywords: Artificial intelligence; Cardiovascular diseases; Classification; Deep learning; Machine learning; Prediction.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1. Layout of the proposed system.
Fig. 2. ML Pipeline process.
Fig. 3. Flowchart of the medical predictive diagnosis using six ML algorithms.
Fig. 4. Box plots for Cleveland database.
Fig. 5. Features histogram matrix. (A) Age. (B) Sex. (C) Chest pain. (D) Resting blood pressure. (E) Cholesterol. (F) Diabetes. (G) Electrocardiographic results. (H) Heart rate. (I) Angina. (J) ST depression. (K) Slope. (L) Number of major vessels. (M) Thallium heart scan.
Fig. 6. Distribution of instances according to features using count plot. (A) Sex attribute. (B) Chest pain attribute. (C) Diabetes attribute. (D) Electrocardiographic results attribute. (E) The Angina attribute. (F) Slope attribute. (G) The number of major vessels attribute. (H) The Thallium heart scan attribute.
Fig. 7. Continuous features with target variable using histogram and KDE. (A) Age attribute. (B) Resting blood pressure attribute. (C) Cholesterol attribute. (D) Heart rate attribute. (E) ST depression attribute.
Fig. 8. Comparison between Grid and Random search with nine trials.
Fig. 9. Graph illustrating K and MSE values.
Fig. 10. The classification provided by the studied models using six ML algorithms. (A) LR algorithm. (B) Decision Tree algorithm. (C) KNN algorithm. (D) RF algorithm. (E) AdaBoost algorithm. (F) SVM algorithm.
Fig. 11. ROC curves results given by the studied models based on the six algorithms. (A) LR algorithm. (B) Decision Tree algorithm. (C) KNN algorithm. (D) RF algorithm. (E) AdaBoost algorithm. (F) SVM algorithm.
Fig. 12. The key indicators of the confusion matrix of the proposed model using the Statlog heart dataset.
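
For readers unfamiliar with how ROC curves such as those in Fig. 11 and the Area Under the Curve metric are obtained, the following is a minimal sketch using only the Python standard library. The helper names `roc_points` and `auc`, and the example labels and scores, are hypothetical and are not taken from the paper; the study's own plots were presumably produced with an ML library.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points from sweeping a threshold over the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Highest score first: lowering the threshold admits samples in this order.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities of heart disease (1 = disease).
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
auc_value = auc(curve)
print(curve)
print(f"AUC = {auc_value:.3f}")
```

An AUC of 1.0 means every diseased case is scored above every healthy case, while 0.5 corresponds to random ranking.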
