J Med Internet Res. 2023 Feb 27:25:e42181.
doi: 10.2196/42181.

Machine Learning for Predicting Micro- and Macrovascular Complications in Individuals With Prediabetes or Diabetes: Retrospective Cohort Study

Simon Schallmoser et al. J Med Internet Res.

Abstract

Background: Micro- and macrovascular complications are a major burden for individuals with diabetes and can already arise in a prediabetic state. To allocate effective treatments and to possibly prevent these complications, identification of those at risk is essential.

Objective: This study aimed to build machine learning (ML) models that predict the risk of developing a micro- or macrovascular complication in individuals with prediabetes or diabetes.

Methods: In this study, we used electronic health records from Israel that contain information about demographics, biomarkers, medications, and disease codes; span from 2003 to 2013; and were queried to identify individuals with prediabetes or diabetes in 2008. Subsequently, we aimed to predict which of these individuals developed a micro- or macrovascular complication within the next 5 years. We included 3 microvascular complications: retinopathy, nephropathy, and neuropathy. In addition, we considered 3 macrovascular complications: peripheral vascular disease (PVD), cerebrovascular disease (CeVD), and cardiovascular disease (CVD). Complications were identified via disease codes, and, for nephropathy, the estimated glomerular filtration rate and albuminuria were considered additionally. Inclusion criteria were complete information on age and sex and on disease codes (or measurements of estimated glomerular filtration rate and albuminuria for nephropathy) until 2013 to account for patient dropout. Exclusion criteria for predicting a complication were diagnosis of this specific complication before or in 2008. In total, 105 predictors from demographics, biomarkers, medications, and disease codes were used to build the ML models. We compared 2 ML models: logistic regression and gradient-boosted decision trees (GBDTs). To explain the predictions of the GBDTs, we calculated Shapley additive explanations values.
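The inclusion and exclusion logic described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the `Patient` record layout and all field names are hypothetical, "within the next 5 years" is assumed to mean a first diagnosis in 2009 to 2013, and the additional laboratory criteria for nephropathy (estimated glomerular filtration rate and albuminuria) are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

BASELINE_YEAR = 2008   # index year stated in the abstract
HORIZON_YEARS = 5      # prediction window stated in the abstract


@dataclass
class Patient:
    # Hypothetical record layout; the real EHR schema is not described in the abstract.
    patient_id: str
    age: Optional[int]
    sex: Optional[str]
    first_diagnosis_year: Optional[int]  # first disease code for the complication, None if never coded
    followed_until: int                  # last year with disease-code information


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion criteria for one complication."""
    if p.age is None or p.sex is None:
        return False  # inclusion: complete information on age and sex
    if p.followed_until < BASELINE_YEAR + HORIZON_YEARS:
        return False  # inclusion: disease codes available until 2013 (dropout)
    if p.first_diagnosis_year is not None and p.first_diagnosis_year <= BASELINE_YEAR:
        return False  # exclusion: complication diagnosed before or in 2008
    return True


def label(p: Patient) -> int:
    """Return 1 if the complication is first coded in 2009-2013 (assumed window), else 0."""
    year = p.first_diagnosis_year
    return int(year is not None and BASELINE_YEAR < year <= BASELINE_YEAR + HORIZON_YEARS)


def build_cohort(patients):
    """Return (patient_id, outcome label) pairs for all eligible patients."""
    return [(p.patient_id, label(p)) for p in patients if is_eligible(p)]


# Invented example records, for illustration only.
patients = [
    Patient("a", 61, "F", None, 2013),    # eligible, no complication
    Patient("b", 55, "M", 2011, 2013),    # eligible, complication within the window
    Patient("c", 70, "F", 2007, 2013),    # excluded: diagnosed before 2008
    Patient("d", None, "M", None, 2013),  # excluded: age missing
    Patient("e", 48, "F", None, 2010),    # excluded: follow-up ends before 2013
]
cohort = build_cohort(patients)
```

Because the exclusion criterion is specific to each complication, a filter of this kind would be applied once per complication, which yields a different cohort for each prediction task.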

Results: Overall, 13,904 and 4259 individuals with prediabetes and diabetes, respectively, were identified in our underlying data set. For individuals with prediabetes, the areas under the receiver operating characteristic curve for logistic regression and GBDTs were, respectively, 0.657 and 0.681 (retinopathy), 0.807 and 0.815 (nephropathy), 0.727 and 0.706 (neuropathy), 0.730 and 0.727 (PVD), 0.687 and 0.693 (CeVD), and 0.707 and 0.705 (CVD); for individuals with diabetes, the areas under the receiver operating characteristic curve were, respectively, 0.673 and 0.726 (retinopathy), 0.763 and 0.775 (nephropathy), 0.745 and 0.771 (neuropathy), 0.698 and 0.715 (PVD), 0.651 and 0.646 (CeVD), and 0.686 and 0.680 (CVD). Overall, the prediction performance is comparable for logistic regression and GBDTs. The Shapley additive explanations values showed that increased levels of blood glucose, glycated hemoglobin, and serum creatinine are risk factors for microvascular complications. Age and hypertension were associated with an elevated risk for macrovascular complications.
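For readers unfamiliar with the metric, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen individual who developed the complication received a higher predicted risk than a randomly chosen individual who did not. The sketch below computes it from that definition and summarizes per-fold values as mean and SD. The function names are illustrative, the labels, scores, and fold values are invented, and the use of the sample SD is an assumption, since the abstract does not say which SD estimator was used.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(fold_aurocs):
    """Mean and sample SD of AUROC values across test folds."""
    return mean(fold_aurocs), stdev(fold_aurocs)


# Invented toy data: 2 individuals without and 2 with the complication.
toy_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Invented per-fold values standing in for 5 test sets.
fold_mean, fold_sd = summarize([0.70, 0.72, 0.68, 0.71, 0.69])
```

The pairwise loop is quadratic in cohort size and is meant only to make the definition concrete; a rank-based implementation would be preferable for cohorts of the size reported here.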

Conclusions: Our ML models can identify individuals with prediabetes or diabetes who are at increased risk of developing micro- or macrovascular complications. The prediction performance varied across complications and target populations but was in an acceptable range for most prediction tasks.

Keywords: diabetes; machine learning; macrovascular complications; microvascular complications; prediabetes.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Flowchart of the inclusion criteria. CeVD: cerebrovascular disease; CVD: cardiovascular disease; ICD-9: International Classification of Diseases, Ninth Revision; PVD: peripheral vascular disease.
Figure 2
Performance of the logistic regression and the gradient boosted decision trees (GBDTs) for predicting micro- and macrovascular complications in (A) individuals with prediabetes or (B) diabetes. We report the mean of the area under the receiver operating characteristic curve (AUROC) across the 5 different test sets. The error bars denote SD. CeVD: cerebrovascular disease; CVD: cardiovascular disease; PVD: peripheral vascular disease.
Figure 3
(A) SHAP plots for individuals with prediabetes. (B) SHAP plots for individuals with diabetes. For the SHAP plots, the ranking of the predictors is based on their importance listed in descending order. Each dot represents 1 individual, and its position on the x axis denotes its SHAP value. Elements with a positive (negative) SHAP value pull the prediction toward an increased (decreased) risk of developing a complication. The color of each dot is a representation of the corresponding predictor value, where red indicates a high, blue a low, and gray a missing value. BB: beta blocker; BUN: blood urea nitrogen; CCB: calcium channel blocker; CeVD: cerebrovascular disease; CPK: creatine phosphokinase; CVD: cardiovascular disease; HbA1c: glycated hemoglobin; HDL: high-density lipoprotein; ICD-9 719: other and unspecified disorders of joint; ICD-9 786: symptoms involving respiratory system and other chest symptoms; ICD-9 401: essential hypertension; LDH: lactate dehydrogenase; LDL: low-density lipoprotein; MCH: mean corpuscular hemoglobin; PVD: peripheral vascular disease; SBP: systolic blood pressure; SCr: serum creatinine; SHAP: Shapley additive explanations; UACR: albumin to creatinine ratio in urine; UCr: creatinine in urine.
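The sign convention in the SHAP plots (a positive value pulls the prediction toward increased risk) follows from how Shapley values attribute a prediction to individual predictors. The sketch below computes exact Shapley values for a toy two-predictor risk score. It is an illustration under stated assumptions, not the procedure used in the study: the `toy_risk` function and its coefficients are invented, predictors that are "absent" are replaced by a single reference individual rather than averaged over a background data set, and exact enumeration is feasible only for a handful of predictors.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    names = list(x)
    n = len(names)

    def value(subset):
        # Predictors in `subset` take the individual's value; all others take the baseline value.
        inputs = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude `name`.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_risk(f):
    # Invented score with an interaction term; the coefficients carry no clinical meaning.
    return 0.1 * f["hba1c"] + 0.5 * f["scr"] + 0.2 * f["hba1c"] * f["scr"]


x = {"hba1c": 8.0, "scr": 1.5}         # invented individual
baseline = {"hba1c": 5.5, "scr": 1.0}  # invented reference individual
phi = shapley_values(toy_risk, x, baseline)
```

Both predictors are above their reference values and the toy score increases with each of them, so both Shapley values are positive, and they sum to the difference between the individual's score and the reference score.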
