Healthc Technol Lett. 2022 Dec 14;10(1-2):1-10.
doi: 10.1049/htl2.12039. eCollection 2023 Feb-Apr.

Diabetes prediction using machine learning and explainable AI techniques

Isfafuzzaman Tasin et al. Healthc Technol Lett.

Abstract

Globally, diabetes affects 537 million people, making it the deadliest and most common non-communicable disease. Many factors contribute to diabetes, including excess body weight, abnormal cholesterol levels, family history, physical inactivity, and poor dietary habits. Increased urination is one of the most common symptoms of the disease. People who have had diabetes for a long time can develop complications such as heart disorders, kidney disease, nerve damage, and diabetic retinopathy. The risk can be reduced if the disease is predicted early. In this paper, an automatic diabetes prediction system has been developed using a private dataset of female patients in Bangladesh and various machine learning techniques. The authors used the Pima Indian diabetes dataset and collected additional samples from 203 individuals at a local textile factory in Bangladesh. Mutual information was applied as the feature selection algorithm. A semi-supervised model with extreme gradient boosting was used to predict the insulin features of the private dataset. The SMOTE and ADASYN approaches were employed to manage the class imbalance problem. The authors compared machine learning classification methods, namely decision tree, SVM, Random Forest, Logistic Regression, KNN, and various ensemble techniques, to determine which algorithm produces the best prediction results. After training and testing all the classification models, the proposed system achieved its best result with the XGBoost classifier combined with the ADASYN approach: 81% accuracy, an F1 score of 0.81, and an AUC of 0.84. A domain adaptation method was also implemented to demonstrate the versatility of the proposed system. An explainable AI approach using the LIME and SHAP frameworks was implemented to show how the model arrives at its final predictions. Finally, a website framework and an Android smartphone application have been developed that take the input features and predict diabetes instantaneously. The private dataset of female Bangladeshi patients and the programming code are available at the following link: https://github.com/tansin-nabil/Diabetes-Prediction-Using-Machine-Learning.
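
The abstract names SMOTE and ADASYN as the class-imbalance remedies but does not describe them. The sketch below illustrates only the core SMOTE idea: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is a minimal standard-library illustration, not the authors' implementation; the function name `smote_oversample`, its parameters, and the (glucose, BMI) rows are hypothetical. ADASYN uses the same interpolation step but generates more synthetic points around minority samples that are harder to learn, which this sketch does not do.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical (glucose, BMI) rows for the minority class; not taken from the paper's data.
minority_rows = [(148.0, 33.6), (183.0, 23.3), (137.0, 43.1), (168.0, 38.0)]
new_rows = smote_oversample(minority_rows, n_new=4)
print(len(new_rows))
```

In practice a maintained library such as imbalanced-learn, which provides `SMOTE` and `ADASYN` classes, would normally be used. Oversampling should be applied to the training split only, so that synthetic points do not leak into the test set.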

Keywords: AdaBoost; K‐nearest neighbour; Android application; decision tree; diabetes; random forest; support vector machine.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIGURE 1. Working sequences of the proposed diabetes prediction system
FIGURE 2. Percentage of people having diabetes in the Pima Indian dataset
FIGURE 3. Feature importance hierarchy
FIGURE 4. Working steps of predicting insulin of the RTML dataset
FIGURE 5. Development of the web application
FIGURE 6. Working sequences of the proposed Android application development
FIGURE 7. Confusion matrix for XGBoost with ADASYN technique
FIGURE 8. ROC curve and AUC value for the XGBoost with ADASYN
FIGURE 9. Explainable AI interpretation of feature importance of XGBoost with ADASYN
FIGURE 10. LIME explainable AI prediction interpretation
FIGURE 11. Instantaneous diabetes prediction by the designed web application
FIGURE 12. Home screen of the proposed Android application
FIGURE 13. Android application review ratings
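
Accuracy and F1 values such as those quoted in the abstract are usually computed from a confusion matrix like the one captioned for Figure 7. The sketch below shows the standard binary-class formulas. The function name `classification_metrics` and the counts passed to it are hypothetical and are not the values in the paper's figure. The abstract also does not say how the authors averaged F1 across classes, so the single-class F1 here is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen for illustration; these are NOT the values in Figure 7.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

F1 is the harmonic mean of precision and recall, so it falls below accuracy when a model favours the majority class. This is one reason it is reported alongside accuracy on imbalanced data.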
