Sci Rep. 2025 May 12;15(1):16393.
doi: 10.1038/s41598-025-01161-5.

Application of interpretable machine learning algorithms to predict macroangiopathy risk in Chinese patients with type 2 diabetes mellitus

Ningjie Zhang et al. Sci Rep. 2025.

Abstract

Macrovascular complications are leading causes of morbidity and mortality in patients with type 2 diabetes mellitus (T2DM), yet early diagnosis of cardiovascular disease (CVD) in this population remains clinically challenging. This study aimed to develop a machine learning model that accurately predicts diabetic macroangiopathy in Chinese patients. A retrospective cross-sectional study was conducted on 1566 hospitalized patients with T2DM. Feature selection was performed using recursive feature elimination (RFE) within the mlr3 framework. Twenty-nine machine learning (ML) models were benchmarked, and the ranger (random forest) model was selected for its superior performance. Hyperparameters were optimized through grid search with 5-fold cross-validation. Model interpretability was enhanced using SHAP values and partial dependence plots (PDPs). An external validation set of 106 patients was used to test the model. The key predictive variables identified were duration of T2DM, age, fibrinogen, and serum urea nitrogen. The resulting model showed good discrimination, with an accuracy of 0.716 and an AUC of 0.777 in the training set. External validation supported its robustness, with an AUC of 0.745. This study establishes a machine learning-based approach to feature selection and to the development of prediction tools for diabetic macroangiopathy.
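The tuning step described above (grid search scored by 5-fold cross-validation, with AUC as the metric) can be sketched in plain Python. This is an illustration only, not the authors' mlr3/ranger pipeline: the k-nearest-neighbour scorer is a toy stand-in for the real learner, the data are synthetic, and every function name here is hypothetical.

```python
def auc(scores, labels):
    """AUC as the chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k):
    """Deal each class round-robin into k folds (needs at least k members per class)."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds


def knn_score(train_x, train_y, x, k):
    """Toy stand-in model: share of positives among the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def grid_search_cv(xs, ys, grid, n_folds=5):
    """Return the grid value with the highest mean held-out AUC, plus all mean AUCs."""
    folds = stratified_folds(ys, n_folds)
    mean_auc = {}
    for k in grid:
        fold_aucs = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(xs)) if i not in held]
            train_x = [xs[i] for i in train]
            train_y = [ys[i] for i in train]
            scores = [knn_score(train_x, train_y, xs[i], k) for i in held_out]
            fold_aucs.append(auc(scores, [ys[i] for i in held_out]))
        mean_auc[k] = sum(fold_aucs) / n_folds
    best = max(mean_auc, key=mean_auc.get)
    return best, mean_auc


# Synthetic, deterministic data (not from the study): one feature, noisy binary label.
xs = list(range(30))
ys = [1 if (x >= 15) != (x % 5 == 0) else 0 for x in xs]
best_k, mean_auc = grid_search_cv(xs, ys, grid=[1, 3, 5, 9])
print(best_k, mean_auc)
```

Each candidate hyperparameter value is scored only on patients held out of the fit, which is why a cross-validated AUC is a fairer basis for model selection than the AUC on the data used for fitting.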

Keywords: Machine learning methods; Macroangiopathy; Prediction model; Risk factor; T2DM.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests. Ethics approval and consent to participate: This study was approved by the Institutional Review Board of the First Affiliated Hospital of Zhengzhou University. Written informed consent to participate was obtained from all participants.

Figures

Fig. 1
Workflow of the ML model development and validation process.
Fig. 2
Screening key variables using different RFE machine learning methods. (A) The number of features screened by gbm-RFE. (B) The number of features screened by ranger-RFE. (C) The number of features screened by SVM-RFE. (D) Venn diagram showing the overlap among the variables selected by the gbm-RFE, ranger-RFE and SVM-RFE. The intersection highlights four key variables consistently identified across all three models: duration of T2DM, age, fibrinogen, and serum urea nitrogen. These common variables were used to construct the final predictive model.
Fig. 3
Pairwise correlation analysis of the key variables identified for macroangiopathy prediction. Purple and the value ‘0’ indicate absence of macroangiopathy; green and the value ‘1’ indicate its presence. * p < 0.05, ** p < 0.01, *** p < 0.001.
Fig. 4
Performance evaluation of the optimal machine learning model for predicting macroangiopathy in T2DM patients. (A) Screening of the optimal benchmark ML model using the final four selected features. Box plots depict the distribution of the Area Under the Curve (AUC) scores across 29 machine learning models, with the optimal model identified based on median performance and variability. (B) ROC curve for the training set of the optimal model after hyperparameter tuning. (C) PRC for the training set of the optimal model after hyperparameter tuning. (D) Confusion matrix for the training set of the optimal model. The matrix presents counts of true positives, true negatives, false positives, and false negatives, along with detailed performance metrics including sensitivity, specificity, precision, recall, F1 score, accuracy, and kappa coefficient. (E) ROC curve for the external validation set of the optimal model. (F) PRC for the external validation set of the optimal model. (G) Confusion matrix for the external validation set of the optimal model.
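The metrics listed for the confusion matrices in panels (D) and (G) all derive from the four cell counts. A minimal sketch follows; the counts in the usage line are made up for illustration and are not the study's results, and the function name is hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Derive the usual binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / n
    # Cohen's kappa: agreement beyond what the marginal totals would give by chance.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Illustrative counts only (not taken from the paper).
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

Sensitivity and recall are the same quantity, so a table that reports both carries one number twice.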
Fig. 5
Interpretation and influence of key features on the optimal model’s predictions. (A) Accumulated Local Effects (ALE) plots for the final four selected features. These plots demonstrate the influence of each feature on the optimal model’s predictions. The y-axis represents the effect size, while the x-axis shows the normalized value of each feature. (B) Permutation Feature Importance (PFI) plot showing the importance of the final four selected features in the optimal model. The x-axis represents the feature importance measured by the change in cross-entropy loss (ce), while the y-axis lists the features. (C) SHAP force plot for a single randomly selected sample, illustrating the contribution of each feature to the model’s prediction of macroangiopathy for that specific patient. The plot shows how higher feature values (red) and lower feature values (blue) impact the prediction, with the horizontal axis reflecting the SHAP value. (D) SHAP summary plot of the proposed model on the entire cohort. Each dot represents a single patient, with colors indicating the feature value (blue for lower values, red for higher values). The horizontal axis represents the SHAP value, indicating the direction and magnitude of the feature’s effect on the prediction. Positive SHAP values suggest a protective effect, while negative values indicate an increased risk of severe complications. “0” means without macroangiopathy, “1” means with macroangiopathy.
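The idea behind the permutation feature importance plot in panel (B) can be sketched as follows: shuffle one feature column, re-score the model, and record how much the loss rises. This is a generic illustration under stated assumptions, not the authors' code. The loss used here is plain classification error, chosen for simplicity; the threshold "model", the feature values, and all names are hypothetical.

```python
import random


def classification_error(model, rows, labels):
    """Fraction of rows whose predicted class differs from the label."""
    wrong = sum(model(r) != y for r, y in zip(rows, labels))
    return wrong / len(rows)


def permutation_importance(model, rows, labels, n_features, n_repeats=10, seed=0):
    """Mean increase in error after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = classification_error(model, rows, labels)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by a shuffled value.
            permuted = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(classification_error(model, permuted, labels) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy setup: feature 0 drives the prediction, feature 1 is ignored.
def toy_model(row):
    return 1 if row[0] > 60 else 0


rows = [(float(age), float(i % 7)) for i, age in enumerate(range(41, 81, 2))]
labels = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, labels, n_features=2)
print(importances)
```

A feature the model never uses gets an importance of exactly zero, because shuffling it cannot change any prediction.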
Fig. 6
Partial dependence plots (PDPs) illustrating the synergistic effects between various risk factors and macroangiopathy. (A) Interaction between age and duration of T2DM on the risk of macroangiopathy. (B) Interaction between fibrinogen levels and duration of T2DM. (C) Interaction between BUN and duration of T2DM. (D) Interaction between fibrinogen levels and age. (E) Interaction between BUN levels and age. (F) Interaction between BUN and fibrinogen levels. The color bar on the right indicates the predicted values of macroangiopathy; the x-axis and y-axis show the log- and Z-score-transformed values of each risk factor. The contour plots show the combined influence of these factors on the predicted values of macroangiopathy, highlighting areas of higher and lower risk.
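A two-feature partial dependence surface like those in panels (A) to (F) can be computed by fixing the two features at each grid point for every patient and averaging the model's predictions. The sketch below assumes Z-scored inputs; the logistic "model", its coefficients, and the three example rows are hypothetical stand-ins rather than the fitted model from the study.

```python
import math


def partial_dependence_2d(predict, rows, j, k, grid_j, grid_k):
    """Average prediction with features j and k forced to each grid combination."""
    surface = []
    for a in grid_j:
        surface_row = []
        for b in grid_k:
            total = 0.0
            for r in rows:
                modified = list(r)
                modified[j] = a   # overwrite feature j for every row
                modified[k] = b   # overwrite feature k for every row
                total += predict(modified)
            surface_row.append(total / len(rows))
        surface.append(surface_row)
    return surface


# Hypothetical risk model on Z-scored features, with an interaction term.
def toy_risk(row):
    z = 0.8 * row[0] + 0.5 * row[1] + 0.3 * row[0] * row[1] + 0.2 * row[2]
    return 1.0 / (1.0 + math.exp(-z))


rows = [(-1.0, 0.5, 0.2), (0.0, -0.3, 1.1), (1.2, 0.8, -0.7)]
grid = [-1.0, 0.0, 1.0]
surface = partial_dependence_2d(toy_risk, rows, 0, 1, grid, grid)
for surface_row in surface:
    print([round(v, 3) for v in surface_row])
```

Plotting `surface` as a contour map over the two grids gives the kind of figure described in the caption; the remaining features are averaged out over the observed rows rather than held at a single value.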
