Cardiovasc Diabetol. 2024 Sep 28;23(1):351.
doi: 10.1186/s12933-024-02439-0.

Construction of machine learning diagnostic models for cardiovascular pan-disease based on blood routine and biochemical detection data

Zhicheng Wang et al. Cardiovasc Diabetol. 2024.

Abstract

Background: Cardiovascular disease, also known as circulation system disease, remains the leading cause of morbidity and mortality worldwide. Traditional methods for diagnosing cardiovascular disease are often expensive and time-consuming. This study therefore aimed to construct machine learning models for diagnosing cardiovascular diseases from easily accessible blood routine and biochemical detection data, and to explore the unique hematologic features of cardiovascular diseases, including some metabolic indicators.

Methods: After data preprocessing, the study included 25,794 healthy people and 32,822 patients with circulation system disease, all with blood routine and biochemical detection data. Models were constructed with logistic regression, random forest, support vector machine, eXtreme Gradient Boosting (XGBoost), and a deep neural network. Finally, the SHAP algorithm was used to interpret the models.

Results: The circulation system disease prediction model constructed with XGBoost performed best (AUC: 0.9921 (0.9911-0.9930); Acc: 0.9618 (0.9588-0.9645); Sn: 0.9690 (0.9655-0.9723); Sp: 0.9526 (0.9477-0.9572); PPV: 0.9631 (0.9592-0.9668); NPV: 0.9600 (0.9556-0.9644); MCC: 0.9224 (0.9165-0.9279); F1 score: 0.9661 (0.9634-0.9686)). Most models for distinguishing among the various circulation system diseases also performed well; the model distinguishing dilated cardiomyopathy from other circulation system diseases performed best (AUC: 0.9267 (0.8663-0.9752)). Model interpretation with the SHAP algorithm indicated that biochemical detection features made the major contributions to predicting circulation system disease, such as potassium (K), total protein (TP), albumin (ALB), and indirect bilirubin (NBIL). For the models distinguishing among the various circulation system diseases, however, red blood cell count (RBC), K, direct bilirubin (DBIL), and glucose (GLU) were the top four features.
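The abbreviations reported in the Results (Acc, Sn, Sp, PPV, NPV, MCC, F1 score) follow the standard confusion-matrix definitions. The following is a minimal sketch of those definitions, not the authors' code: the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study. AUC is left out because it needs ranked prediction scores rather than counts, and the sketch assumes every row and column of the confusion matrix is non-empty.

```python
import math


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute binary-classification metrics from confusion-matrix counts.

    tp/fn: diseased subjects predicted diseased / healthy.
    tn/fp: healthy subjects predicted healthy / diseased.
    Assumes no row or column of the confusion matrix is empty.
    """
    total = tp + fp + tn + fn
    # Denominator of the Matthews correlation coefficient (MCC).
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Acc": (tp + tn) / total,            # accuracy
        "Sn": tp / (tp + fn),                # sensitivity (recall)
        "Sp": tn / (tn + fp),                # specificity
        "PPV": tp / (tp + fp),               # positive predictive value
        "NPV": tn / (tn + fn),               # negative predictive value
        "F1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of PPV and Sn
        "MCC": (tp * tn - fp * fn) / mcc_denominator,
    }


# Hypothetical counts for illustration only (not study data).
example = diagnostic_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside Acc when the two classes differ in size.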

Conclusions: The present study constructed multiple models using 50 features from blood routine and biochemical detection data for the diagnosis of various circulation system diseases. It also explored the unique hematologic features of these diseases, including some metabolic-related indicators. This cost-effective approach could benefit more people and help diagnose and prevent circulation system diseases.

Keywords: Biochemical detection; Blood routine; Cardiovascular disease; Circulation system disease; Machine learning; Metabolic indicator.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
The flow chart of this study
Fig. 2
Construction of the circulation system disease prediction model using clinical blood samples. (A) The AUC of 69 circulation system disease prediction models. (B-D) ROC curves of five machine learning methods using different data: (B) blood routine combined with biochemical detection; (C) blood routine; (D) biochemical detection
Fig. 3
The AUC of 69 models, each distinguishing one kind of circulation system disease from the others
Fig. 4
The top 20 features for the circulation system disease prediction model using different data. (A) Blood routine. (B) Biochemical detection. (C) Blood routine combined with biochemical detection. Red represents a high feature value and blue a low feature value. A positive SHAP value represents a positive effect of the feature on the model, and a negative SHAP value a negative effect. All features are listed in order of importance from top to bottom. (D) The joyplot of numerical distributions of K, TP, ALB, and NBIL among various circulation system diseases and healthy people
Fig. 5
Analysis of specific indicators for differentiating between circulation system diseases. (A) The heatmap displays SHAP values of 50 features for each disease differentiation model. The positive SHAP value is added to the absolute value of the negative SHAP value to form the final SHAP value displayed. (B) The network shows the top 10 features shared among different disease differentiation models. The red circles represent various circulation system diseases, and the blue circles represent various features. The larger the blue circle, the more disease models share that feature. (C) The joyplot of numerical distributions of RBC, K, DBIL, and GLU among various circulation system diseases and healthy people
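The aggregation described for Fig. 5(A), adding the positive SHAP value to the absolute value of the negative SHAP value, can be read in more than one way. The sketch below shows one possible reading, assuming each feature has a list of per-sample SHAP values: the positive and negative contributions are summed separately and their magnitudes combined. The function name `combined_shap_magnitude` and all numbers are hypothetical illustrations, not the authors' implementation or study data.

```python
def combined_shap_magnitude(shap_values):
    """Combine positive and negative SHAP contributions into one magnitude.

    Assumed reading of the Fig. 5(A) description: sum the positive values,
    sum the negative values, then add the positive sum to the absolute
    value of the negative sum.
    """
    positive_sum = sum(v for v in shap_values if v > 0)
    negative_sum = sum(v for v in shap_values if v < 0)
    return positive_sum + abs(negative_sum)


# Hypothetical per-sample SHAP values for three features (illustration only).
per_feature_shap = {
    "RBC": [0.5, -0.25, 0.25],
    "K": [-0.5, 0.125],
    "GLU": [0.125, -0.125],
}

# Score each feature, then rank from most to least influential.
scores = {name: combined_shap_magnitude(vals) for name, vals in per_feature_shap.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Under this reading the combined value equals the sum of absolute SHAP values, so a feature that pushes predictions strongly in both directions still ranks as important instead of cancelling out.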
