Two Machine-learning Hybrid Models for Predicting Type 2 Diabetes Mellitus
- PMID: 40351779
- PMCID: PMC12063970
- DOI: 10.4103/jmss.jmss_29_24
Abstract
Background: The global increase in diabetes prevalence necessitates advanced diagnostic methods. Machine learning has shown promise in disease diagnosis, including diabetes.
Materials and methods: We used a dataset collected from the Medical City Hospital laboratory and the Specialized Center for Endocrinology and Diabetes at Al-Kindy Teaching Hospital in Iraq. The dataset comprises 1000 physical examination samples from male and female patients, categorized into three classes: diabetic (Y), nondiabetic (N), and predicted diabetic (P). It contains twelve attributes and includes outliers. Because outliers in medical data can result from unusual disease attributes, identifying and handling them with statistical methods should be done in consultation with a specialist physician. The main contribution of this study is the proposal of two hybrid models for diabetes diagnosis, one for each of two scenarios: (1) Scenario 1 (outliers present): Hybrid Model 1 combines the K-medoids clustering algorithm with a Gaussian naive Bayes (GNB) classifier based on kernel density estimation (KDE) to handle outliers; and (2) Scenario 2 (outliers removed): Hybrid Model 2 combines the K-means clustering algorithm with a GNB classifier based on KDE with a suitable bandwidth. We applied principal component analysis to reduce dimensionality and evaluated the models using fivefold cross-validation.
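The classification and evaluation steps can be illustrated with a minimal sketch. It covers only a naive Bayes classifier whose per-feature class-conditional densities come from a Gaussian-kernel KDE, scored with fivefold cross-validation. The abstract does not state how the clustering step is coupled to the classifier, how many principal components were kept, or which bandwidth was chosen, so the clustering and PCA stages are omitted here, and the class name `KDENaiveBayes`, the helper functions, the bandwidth of 0.5, and the synthetic two-class data are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def kde_log_density(x, samples, bandwidth):
    """Log of a 1-D Gaussian-kernel density estimate evaluated at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    # Floor the density so that log() stays finite far from all samples.
    return math.log(max(total / norm, 1e-300))


class KDENaiveBayes:
    """Naive Bayes with one KDE per (class, feature) pair (hypothetical helper)."""

    def __init__(self, bandwidth=0.5):  # bandwidth value is an assumption
        self.bandwidth = bandwidth

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.log_prior_ = {}
        self.columns_ = {}
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior_[c] = math.log(len(rows) / len(X))
            # Transpose rows so each entry holds one feature's training values.
            self.columns_[c] = list(zip(*rows))
        return self

    def predict_one(self, x):
        def score(c):
            log_lik = sum(
                kde_log_density(v, col, self.bandwidth)
                for v, col in zip(x, self.columns_[c])
            )
            return self.log_prior_[c] + log_lik

        return max(self.classes_, key=score)

    def predict(self, X):
        return [self.predict_one(x) for x in X]


def k_fold_accuracy(X, y, k=5, bandwidth=0.5, seed=0):
    """Mean accuracy over k shuffled, disjoint folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = KDENaiveBayes(bandwidth).fit(
            [X[i] for i in train], [y[i] for i in train]
        )
        preds = model.predict([X[i] for i in fold])
        correct = sum(p == y[i] for p, i in zip(preds, fold))
        scores.append(correct / len(fold))
    return sum(scores) / k


def make_toy_data(n_per_class=40, seed=1):
    """Synthetic, well-separated two-class data (not the study's dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("N", 0.0), ("Y", 4.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("mean fivefold accuracy:", k_fold_accuracy(X, y, k=5))
```

A smaller bandwidth makes each density estimate follow individual training points more closely, while a larger one smooths over them, which is why the choice of bandwidth matters for this kind of classifier.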
Results: All experiments were conducted under identical settings. In both scenarios (with outliers retained and with outliers removed), the proposed hybrid models outperformed the other machine-learning models evaluated in this study for diabetes prediction: support vector machines (with radial basis function, polynomial, linear, and sigmoid kernels), decision trees (J48), and GNB classifiers. The average accuracy was 0.9743 for Hybrid Model 1 in Scenario 1 and 0.9867 for Hybrid Model 2 in Scenario 2. We also evaluated precision, sensitivity, and F1-score as performance metrics.
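For reference, the four reported metrics can be computed from true and predicted labels as in the sketch below. It uses the standard definitions and treats one class as positive (one-vs-rest). The abstract does not say how the metrics were aggregated across the three classes, so the function name `binary_metrics`, the choice of "Y" as the positive class, and the example labels are assumptions for illustration only.

```python
def binary_metrics(y_true, y_pred, positive="Y"):
    """Accuracy, precision, sensitivity (recall), and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up labels, not results from the study.
    y_true = ["Y", "Y", "Y", "N", "N", "N", "N", "Y"]
    y_pred = ["Y", "Y", "N", "N", "N", "Y", "N", "Y"]
    print(binary_metrics(y_true, y_pred))
```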
Conclusion: This study presents two hybrid models for diabetes diagnosis, demonstrating high accuracy in distinguishing between diabetic and nondiabetic patients and effectively handling outliers. The findings highlight the potential of machine-learning techniques for improving the early diagnosis and treatment of diabetes.
Keywords: Decision tree; Gaussian naive Bayes; K-means; K-medoids; diabetes mellitus prediction; kernel density estimation; support vector machine.
Copyright: © 2025 Journal of Medical Signals & Sensors.
Conflict of interest statement
There are no conflicts of interest.