2023 Aug 1;6(1):16.
doi: 10.1186/s42492-023-00143-6.

Hyperparameter optimization for cardiovascular disease data-driven prognostic system

Jayson Saputra et al. Vis Comput Ind Biomed Art.

Abstract

Predicting and diagnosing cardiovascular diseases (CVDs) from medical examinations, patient symptoms, and other evidence is among the biggest challenges in medicine. About 17.9 million people die from CVDs annually, accounting for 31% of all deaths worldwide. With a timely prognosis and thorough consideration of the patient's medical history and lifestyle, it is possible to predict CVDs and take preventive measures to eliminate or control these life-threatening diseases. In this study, we used various patient datasets from a major hospital in the United States as prognostic factors for CVD. The data were obtained by monitoring a total of 918 adult patients aged 28-77 years. We present a data mining modeling approach to analyze the performance, classification accuracy, and number of clusters on Cardiovascular Disease Prognostic datasets in unsupervised machine learning (ML) using the Orange data mining software. Several techniques were used to classify the datasets: k-nearest neighbors, support vector machine, random forest, artificial neural network (ANN), naïve Bayes, logistic regression, stochastic gradient descent (SGD), and AdaBoost. To determine the number of clusters, several unsupervised ML clustering methods were used: k-means, hierarchical clustering, and density-based spatial clustering of applications with noise (DBSCAN). The best models in terms of performance analysis and classification accuracy were SGD and ANN, each of which scored 0.900 on the Cardiovascular Disease Prognostic datasets. According to most of the clustering methods, including k-means and hierarchical clustering, the Cardiovascular Disease Prognostic datasets can be divided into two clusters. The accuracy of CVD prognosis depends on the accuracy of the proposed model: the more accurate the model, the better it can predict which patients are at risk for CVD.

Keywords: Cardiovascular disease; Data mining; Data-driven analytics; Hyperparameter optimization; Orange data mining software; Prognostic system; Unsupervised machine learning.
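
The cluster-count step described in the abstract (running k-means for several candidate numbers of clusters and comparing silhouette scores) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' Orange workflow. The synthetic two-group data, the function names (`make_demo_points`, `kmeans`, `silhouette`, `choose_k`), and all parameter values are assumptions made for this example and do not come from the study.

```python
import math
import random


def make_demo_points(seed=1, n_per_group=30):
    """Synthetic 2-D points in two well-separated groups (hypothetical data)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 10.0)]
    return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
            for cx, cy in centers for _ in range(n_per_group)]


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm. Returns (labels, inertia)."""
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0 here
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values=range(2, 6), restarts=5, seed=0):
    """Return (best_k, scores), where scores maps each k to its silhouette."""
    rng = random.Random(seed)
    scores = {}
    for k in k_values:
        # Keep the restart with the lowest within-cluster sum of squares.
        runs = [kmeans(points, k, rng) for _ in range(restarts)]
        labels, _ = min(runs, key=lambda run: run[1])
        scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores


points = make_demo_points()
best_k, scores = choose_k(points)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("selected k:", best_k)
```

On real patient attributes the features would normally be standardized first, since cholesterol, heart rate, and blood pressure are measured on different scales and Euclidean distance is sensitive to that.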

Conflict of interest statement

The authors declare that they have no competing financial interests or personal relationships that may have influenced the work reported in this study.

Figures

Fig. 1. CVD mortality rates are expected to increase significantly by 2020 [3]
Fig. 2. Trends in CVD mortality for men and women in United States from 1980 to 2019 [3]
Fig. 3. Research flowchart
Fig. 4. Classification techniques used to classify Cardiovascular Disease Prognostic datasets
Fig. 5. Test and score results
Fig. 6. Classification accuracy matrix
Fig. 7. Calibration plot based on classification accuracy CVD prognostic. (a) target = 0; (b) target = 1
Fig. 8. F1 score matrix
Fig. 9. Precision matrix
Fig. 10. Recall matrix
Fig. 11. Calibration plot based on F1 score CVD prognostic. (a) target = 0; (b) target = 1
Fig. 12. Calibration plot based on precision and recall CVD prognostic. (a) target = 0; (b) target = 1
Fig. 13. Performance curve analysis CVD prognostic. (a) target = 0; (b) target = 1
Fig. 14. Clustering techniques are used to analyze Cardiovascular Disease Prognostic datasets
Fig. 15. K-means clustering
Fig. 16. K-means clustering silhouette scores
Fig. 17. Scatter plot of k-means cluster analysis. (a) K-means clustering scatter plot correlations between cholesterol level and maximum HR attributes; (b) K-means clustering scatter plot correlations between cholesterol level and resting blood pressure attributes; (c) K-means clustering scatter plot correlations between maximum HR and resting blood pressure attributes
Fig. 18. Distances
Fig. 19. Hierarchical clustering. (a) Hierarchical clustering based on the cholesterol level attribute; (b) Hierarchical clustering based on the maximum HR attribute; (c) Hierarchical clustering based on the resting blood pressure attribute
Fig. 20. Hierarchical clustering silhouette scores
Fig. 21. Hierarchical cluster scatter diagram that illustrates the correlation between cholesterol level, maximum HR, and resting blood pressure. (a) Hierarchical clustering scatter plot correlations between cholesterol level and maximum HR attributes; (b) Hierarchical clustering scatter plot correlations between cholesterol level and resting blood pressure attributes; (c) Hierarchical clustering scatter plot correlations between maximum HR and resting blood pressure attributes
Fig. 22. DBSCAN clustering
Fig. 23. DBSCAN clustering silhouette scores
Fig. 24. The scatter plot of DBSCAN cluster that demonstrates the correlation among cholesterol level, maximum HR, and resting blood pressure. (a) DBSCAN clustering scatter plot correlations between cholesterol level and maximum HR attributes; (b) DBSCAN clustering scatter plot correlations between cholesterol level and resting blood pressure attributes; (c) DBSCAN clustering scatter plot correlations between maximum HR and resting blood pressure attributes

References

    1. Nanehkaran YA, Licai Z, Chen JD, Jamel AAM, Shengnan Z, Navaei YD, et al. Anomaly detection in heart disease using a density-based unsupervised approach. Wireless Commun Mobile Comput. 2022;2022:6913043. doi: 10.1155/2022/6913043.
    2. Shorewala V. Early detection of coronary heart disease using ensemble techniques. Inf Med Unlocked. 2021;26:100655. doi: 10.1016/j.imu.2021.100655.
    3. Tsao CW, Aday AW, Almarzooq ZI, Alonso A, Beaton AZ, Bittencourt MS, et al. Heart disease and stroke statistics - 2022 update: a report from the American Heart Association. Circulation. 2022;145(8):e153–e639. doi: 10.1161/CIR.0000000000001052.
    4. Zhao Y, Wood EP, Mirin N, Cook SH, Chunara R. Social determinants in machine learning cardiovascular disease prediction models: a systematic review. Am J Prev Med. 2021;61(4):596–605. doi: 10.1016/j.amepre.2021.04.016.
    5. Şahin B, İlgün G. Risk factors of deaths related to cardiovascular diseases in World Health Organization (WHO) member countries. Health Soc Care Community. 2022;30(1):73–80. doi: 10.1111/hsc.13156.