Int J Environ Res Public Health. 2021 Mar 23;18(6):3317.
doi: 10.3390/ijerph18063317.

Prediction of Type 2 Diabetes Based on Machine Learning Algorithm

Henock M Deberneh et al. Int J Environ Res Public Health.

Abstract

Prediction of type 2 diabetes (T2D) occurrence allows a person at risk to take actions that can prevent the onset or delay the progression of the disease. In this study, we developed a machine learning (ML) model to predict T2D occurrence in the following year (Y + 1) using variables from the current year (Y). The dataset was collected as electronic health records at a private medical institute from 2013 to 2018. To construct the prediction model, key features were first selected using ANOVA tests, chi-squared tests, and recursive feature elimination. The selected features were fasting plasma glucose (FPG), HbA1c, triglycerides, BMI, gamma-GTP, age, uric acid, sex, smoking, drinking, physical activity, and family history. Using these variables, we then applied logistic regression, random forest, support vector machine, XGBoost, and ensemble machine learning algorithms to classify the outcome as normal (non-diabetic), prediabetes, or diabetes. The experimental results showed that the prediction model performed reasonably well at forecasting T2D occurrence in the Korean population, and it can provide clinicians and patients with valuable information on the likelihood of developing T2D. In cross-validation (CV), the ensemble models outperformed the single models, and CV performance improved as more years of medical history from the dataset were incorporated.
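
The filter step of the feature selection can be illustrated with a minimal, standard-library Python sketch that scores a continuous feature with the one-way ANOVA F-statistic and a categorical feature with the Pearson chi-squared statistic against the three-class outcome. The toy values, feature names, and helpers (`anova_f`, `chi_squared`) are hypothetical; p-values and the recursive feature elimination step are omitted, and this is not the authors' code.

```python
from collections import Counter


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a continuous feature across outcome classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def chi_squared(categories, labels):
    """Pearson chi-squared statistic for independence of a categorical feature and the outcome."""
    n = len(labels)
    row_totals, col_totals = Counter(categories), Counter(labels)
    observed = Counter(zip(categories, labels))
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: outcome in year Y + 1, features in year Y
labels = ["normal"] * 4 + ["prediabetes"] * 4 + ["diabetes"] * 4
fpg = [88, 92, 90, 85, 105, 110, 112, 108, 130, 140, 128, 135]  # mg/dL
noise = [3, 7, 1, 9, 2, 8, 4, 6, 5, 3, 7, 2]  # an uninformative feature
family_history = ["yes", "no"] * 6  # evenly split within every class

scores = {"fpg": anova_f(fpg, labels), "noise": anova_f(noise, labels)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['fpg', 'noise']
print(chi_squared(family_history, labels))  # 0.0
```

Higher statistics indicate stronger association with the outcome, so a filter would keep features above some threshold before a wrapper method such as recursive feature elimination refines the set.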

Keywords: machine learning; prediction; type 2 diabetes.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Feature selection procedure.
Figure 2
Feature importance ranking (FPG = fasting plasma glucose, HbA1c = hemoglobin A1c, BMI = body mass index, gamma-GTP = gamma-glutamyl transpeptidase).
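
How the ranking in Figure 2 was computed is not stated here. Permutation importance is one model-agnostic way to produce such a ranking: shuffle one feature's values and measure how much accuracy drops. The sketch below applies it to a hypothetical rule-based stand-in model built on the American Diabetes Association cut-offs (FPG 100 to 125 mg/dL or HbA1c 5.7 to 6.4% for prediabetes; FPG of at least 126 mg/dL or HbA1c of at least 6.5% for diabetes). The model, toy rows, and function names are illustrative assumptions, and the authors may have used a different measure, such as tree-based importance from random forest or XGBoost.

```python
import random


def rule_model(row):
    """Hypothetical stand-in classifier using ADA cut-offs (FPG in mg/dL, HbA1c in %)."""
    if row["fpg"] >= 126 or row["hba1c"] >= 6.5:
        return "diabetes"
    if row["fpg"] >= 100 or row["hba1c"] >= 5.7:
        return "prediabetes"
    return "normal"


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, feature, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical toy rows; BMI is present but unused by the stand-in model
rows = [
    {"fpg": 88, "hba1c": 5.2, "bmi": 22.0},
    {"fpg": 92, "hba1c": 5.4, "bmi": 27.5},
    {"fpg": 95, "hba1c": 5.3, "bmi": 24.1},
    {"fpg": 105, "hba1c": 5.8, "bmi": 25.3},
    {"fpg": 112, "hba1c": 6.0, "bmi": 29.0},
    {"fpg": 118, "hba1c": 6.1, "bmi": 23.8},
    {"fpg": 130, "hba1c": 6.8, "bmi": 31.2},
    {"fpg": 142, "hba1c": 7.1, "bmi": 26.4},
    {"fpg": 135, "hba1c": 6.9, "bmi": 28.7},
]
labels = ["normal"] * 3 + ["prediabetes"] * 3 + ["diabetes"] * 3

importances = {f: permutation_importance(rule_model, rows, labels, f)
               for f in ("fpg", "hba1c", "bmi")}
for feature in sorted(importances, key=importances.get, reverse=True):
    print(feature, round(importances[feature], 3))
```

Because the stand-in model ignores BMI, shuffling it costs no accuracy; a model fitted on all twelve features would typically spread importance more evenly.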
Figure 3
The architecture of the prediction model (RF = random forest, XGB = XGBoost, SVM = support vector machine).
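
The caption indicates that the architecture involves RF, XGB, and SVM base learners. As a simpler illustration than the stacking classifier (ST) evaluated in Figure 4, which trains a second-level model on the base learners' predictions, the sketch below combines hypothetical class probabilities from the three models by optionally weighted soft voting. The probabilities, weights, and `soft_vote` helper are assumptions for illustration, not the authors' ensemble.

```python
CLASSES = ["normal", "prediabetes", "diabetes"]


def soft_vote(model_probs, weights=None):
    """Weighted average of per-model class probabilities; returns (label, averaged probs)."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(len(CLASSES))
    ]
    best = max(range(len(CLASSES)), key=lambda i: averaged[i])
    return CLASSES[best], averaged


# Hypothetical predicted probabilities for one patient, ordered as CLASSES
rf_probs = [0.2, 0.5, 0.3]
xgb_probs = [0.1, 0.4, 0.5]
svm_probs = [0.2, 0.5, 0.3]

label, averaged = soft_vote([rf_probs, xgb_probs, svm_probs])
print(label)  # prediabetes
weighted_label, _ = soft_vote([rf_probs, xgb_probs, svm_probs], weights=[1, 5, 1])
print(weighted_label)  # diabetes
```

In stacking, the fixed weights are replaced by a meta-learner trained on out-of-fold base-model probabilities, so the combination itself is learned from data.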
Figure 4
Box plot for the CV score of the prediction models (LR = logistic regression, RF = random forest, XGB = XGBoost, SVM = support vector machine, ST = stacking classifier, CIM = confusion matrix-based classifier integration approach): (a) accuracy, (b) precision, (c) recall, (d) F1-score.
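
All four scores in Figure 4 can be derived from a confusion matrix computed on each held-out CV fold. The minimal sketch below computes accuracy and macro-averaged precision, recall, and F1-score for the three outcome classes. Macro averaging is an assumption, since the averaging scheme is not stated here, and the fold labels are hypothetical.

```python
CLASSES = ["normal", "prediabetes", "diabetes"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def macro_scores(y_true, y_pred, classes=CLASSES):
    """Accuracy plus macro-averaged precision, recall, and F1 from the confusion matrix."""
    m = confusion_matrix(y_true, y_pred, classes)
    k = len(classes)
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = m[i][i]
        predicted = sum(m[row][i] for row in range(k))  # column total
        actual = sum(m[i])  # row total
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return {
        "accuracy": sum(m[i][i] for i in range(k)) / len(y_true),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical labels for one held-out fold
y_true = ["normal"] * 3 + ["prediabetes"] * 3 + ["diabetes"] * 3
y_pred = ["normal", "normal", "prediabetes",
          "prediabetes", "prediabetes", "diabetes",
          "diabetes", "diabetes", "diabetes"]

print(confusion_matrix(y_true, y_pred))  # [[2, 1, 0], [0, 2, 1], [0, 0, 3]]
scores = macro_scores(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

Repeating this for every fold yields one score per fold and metric, which is the kind of distribution a CV box plot summarizes.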
Figure 5
Accuracy comparison using different numbers of years of training data (RF = random forest, XGB = XGBoost, SVM = support vector machine, Avg. = average).
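
The Y to Y + 1 design, together with the comparison in Figure 5, implies building training pairs from consecutive years and varying how many years of history are used. The sketch below shows one hypothetical way to do this for records spanning 2013 to 2018: each patient's year-Y features are paired with the same patient's year-(Y + 1) status, and training sets use the n most recent base years before a held-out base year. The record layout, function names, and split are assumptions, not the authors' procedure.

```python
def make_pairs(records, base_years):
    """Pair each patient's year-Y features with the same patient's year-(Y + 1) status."""
    pairs = []
    for by_year in records.values():
        for year in base_years:
            if year in by_year and year + 1 in by_year:
                features, _ = by_year[year]
                _, next_status = by_year[year + 1]
                pairs.append((features, next_status))
    return pairs


def training_windows(first_year, test_year, max_years):
    """Yield (n, base_years) using the n most recent base years before test_year."""
    for n in range(1, max_years + 1):
        start = test_year - n
        if start < first_year:
            break
        yield n, list(range(start, test_year))


# Hypothetical records: patient -> {year: (features, status)}
records = {
    "p1": {2015: ({"fpg": 95}, "normal"),
           2016: ({"fpg": 104}, "prediabetes"),
           2017: ({"fpg": 128}, "diabetes")},
    "p2": {2016: ({"fpg": 90}, "normal"),
           2017: ({"fpg": 92}, "normal"),
           2018: ({"fpg": 101}, "prediabetes")},
}

# Hold out base year 2017 (predicting 2018); train on earlier base years
test_pairs = make_pairs(records, [2017])
for n, base_years in training_windows(2013, 2017, max_years=4):
    train_pairs = make_pairs(records, base_years)
    print(n, base_years, len(train_pairs))
```

Because every training pair's outcome year is no later than the held-out base year, no 2018 outcomes leak into training in this setup.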
Figure 6
Accuracy comparison between the selected 12-feature set and the traditional predictors (5-feature set) using different numbers of years of training data.