Sci Rep. 2025 Dec 23;16(1):1696.
doi: 10.1038/s41598-025-31234-4.

An ethnic-sensitive hybrid framework for T2D prediction with explainable AI and weighted ensembles

Karlo Abnoosian et al. Sci Rep.

Abstract

Type 2 diabetes (T2D) is a growing global health crisis, affecting over 537 million people as of 2021. Early prediction remains particularly challenging in low- and middle-income countries due to missing data, class imbalance, and population-specific risk factors. This study presents a four-stage predictive framework, Feature-Weighted Class-Adaptive Generative Imputation Network-Weighted Classifier Aggregation Ensemble (FW-CAGIN-WCAE), designed to address these limitations. First, Zero-Threshold Feature Removal (ZTFR) is applied to eliminate low-quality variables. Second, missing values are imputed with FW-CAGIN, a novel class-aware and feature-weighted GAN model that accounts for both class and feature importance. Third, a performance-weighted ensemble of 15 machine learning and deep learning algorithms is constructed. Finally, SHAP analysis is used to uncover population-specific risk indicators. The proposed method was evaluated on three benchmark datasets (PIDD, FHGDD, and BDD) and their combinations, using nested five-fold cross-validation. The model achieved a peak AUC of 0.936 ± 0.018 on the PIDD-BDD combination and reduced the imputation mean absolute error (MAE) from 0.8028 to 0.0033. It also lowered AUC variability by 36.3% and improved the diagnostic odds ratio (DOR) to 68.4 ± 20.5. SHAP analysis identified glucose as a key predictive feature across both Asian and European populations. These findings demonstrate that the proposed framework offers an accurate, interpretable, and population-sensitive solution for early T2D detection, especially in resource-limited healthcare settings.

Keywords: Ensemble learning; FW-CAGIN imputation; Feature selection; Population-specific risk; SHAP analysis; Type 2 diabetes prediction.
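
The abstract describes the third stage only as a performance-weighted ensemble (Fig. 9 calls it AUC-weighted). The sketch below shows one plausible reading of that idea: each model's predicted probability is averaged with a weight proportional to its validation AUC. The function name `auc_weighted_average`, the sum-to-one normalisation, the 0.5 decision threshold, and all numbers are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def auc_weighted_average(
    probabilities: Sequence[Sequence[float]],
    aucs: Sequence[float],
) -> List[float]:
    """Blend per-model positive-class probabilities with AUC-proportional weights.

    probabilities[m][i] is model m's predicted probability for sample i.
    aucs[m] is model m's validation AUC (assumed to be computed elsewhere).
    """
    if not probabilities or len(probabilities) != len(aucs):
        raise ValueError("need at least one model and exactly one AUC per model")
    n_samples = len(probabilities[0])
    if any(len(p) != n_samples for p in probabilities):
        raise ValueError("all models must score the same samples")
    total = sum(aucs)
    if total <= 0:
        raise ValueError("AUC values must sum to a positive number")
    # Assumption: weights are the AUCs rescaled to sum to 1.
    weights = [auc / total for auc in aucs]
    return [
        sum(w * p[i] for w, p in zip(weights, probabilities))
        for i in range(n_samples)
    ]


# Hypothetical scores from three models on four samples (not from the paper).
model_probs = [
    [0.92, 0.10, 0.55, 0.30],
    [0.85, 0.20, 0.40, 0.35],
    [0.70, 0.05, 0.65, 0.60],
]
model_aucs = [0.90, 0.85, 0.80]

blended = auc_weighted_average(model_probs, model_aucs)
# The 0.5 cut-off is an illustrative assumption, not the paper's threshold.
predictions = [int(p >= 0.5) for p in blended]
```

Because the weights sum to one, the blended score stays in the [0, 1] range of the inputs, so it can be thresholded or passed to an ROC analysis like any single model's probability.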

Conflict of interest statement

Declarations. Ethics approval and consent to participate: Not applicable. Consent for publication: Not applicable. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Geographical distribution of diabetes prevalence across global regions, highlighting the highest cases in the Western Pacific (206 million) and the lowest in Africa (24 million) based on the IDF Diabetes Atlas.
Fig. 2
Global diabetes prevalence and impact in 2021. Data from the IDF Diabetes Atlas.
Fig. 3
Overview of the FW-CAGIN-WCAE Pipeline for Robust T2D Prediction, Integrating ZTFR, FW-CAGIN, and WCAE.
Fig. 4
PIDD health trends (blue: no-diabetes; orange: diabetes).
Fig. 5
FHGDD health trends (blue: no-diabetes; orange: diabetes).
Fig. 6
BDD health trends (blue: no-diabetes; orange: diabetes).
Fig. 7
Class distribution: (a) PIDD (768 samples; 65.1% non-diabetic, blue; 34.9% diabetic, red); (b) FHGDD (2,000 samples; 65.8% non-diabetic, blue; 34.2% diabetic, red); (c) BDD (465 samples; 80.0% non-diabetic, blue; 20.0% diabetic, red).
Algorithm 1
Standard GAN
Algorithm 2
Standard GAN
Algorithm 3
FW-CAGIN
Fig. 8
Nested 5-fold cross-validation for model evaluation and hyperparameter optimization.
Fig. 9
FW-CAGIN-WCAE pipeline. Input data passes through ZTFR filtering, GAN-based imputation, preprocessing, AUC-weighted ensemble, and SHAP explanation to produce final T2D prediction.
Fig. 10
Horizontal bar charts showing AUC of top models for datasets PIDD (a) to PIDD-BDD-FHGDD (f), with dark green (highest AUC) to light green (lowest AUC) colors and AUC values labeled.
Fig. 11
Horizontal bar charts showing AUC of WCAE models for datasets PIDD (a) to PIDD-BDD-FHGDD (f), color-coded from dark green (highest AUC) to light green (lowest AUC), with labeled AUC values.
Fig. 12
ROC Curves for WCAE Models Across Datasets PIDD (a) to PIDD-BDD-FHGDD (f), with Labeled AUC Values.
Fig. 13
Ethnicity-Specific SHAP Analysis with Population-Based Clinical Risk Thresholds. (a) PIDD (South Asian): Glucose (SHAP = 0.1369) → BMI ≥ 30 kg/m² (2.5 × risk) → DPF > 0.6 (3.1 × risk). (b) FHGDD (East Asian): Glucose (SHAP ≈ 0.11) → BMI ≥ 28 kg/m² (1.9 × risk) → Age peak 47.5 y (1.8 × risk) → DPF > 0.5 (2.3 × risk). Red lines: Population-specific risk thresholds (ORs in Table 20). SHAP values reflect feature contribution to T2D prediction, with higher BMI and DPF thresholds in PIDD indicating ethnicity-specific risk profiles.
Fig. 14
Bar chart comparing AUC for T2D prediction on PIDD (2016–2025). Colors range from dark purple (highest AUC) to light purple (lowest AUC). See Table 20 for details.
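
The nested 5-fold cross-validation named in Fig. 8 can be sketched with index bookkeeping alone: an outer loop holds out a test fold for evaluation, and an inner loop re-splits the remaining samples for hyperparameter selection. This is a minimal sketch under stated assumptions, not the authors' procedure. The helper names, the five inner folds, the seeds, and the plain (non-stratified) shuffling are all hypothetical choices; with imbalanced classes such as those in Fig. 7, a stratified split would be the more usual option.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(indices: Iterable[int], k: int, seed: int) -> List[List[int]]:
    """Shuffle the indices and deal them into k folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(
    n_samples: int, outer_k: int = 5, inner_k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (train, validation) index lists drawn only
    from outer_train, so the outer test fold never influences tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for o, outer_test in enumerate(outer_folds):
        outer_train = [i for j, fold in enumerate(outer_folds) if j != o for i in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + o + 1)
        inner_splits: List[Split] = []
        for v, validation in enumerate(inner_folds):
            train = [i for j, fold in enumerate(inner_folds) if j != v for i in fold]
            inner_splits.append((train, validation))
        yield outer_train, outer_test, inner_splits


# 768 is the PIDD sample count given in the Fig. 7 caption.
for outer_train, outer_test, inner_splits in nested_cv_splits(768):
    # Tune hyperparameters on inner_splits, refit on outer_train,
    # then score once on outer_test.
    pass
```

Reporting the mean and standard deviation of the outer-fold scores gives an estimate that is not biased by the hyperparameter search, which is the usual motivation for nesting.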

References

    1. Forray, A.-I. et al. The global burden of disease: A focus on type II diabetes. In Handbook of Public Health Nutrition: International, National, and Regional Perspectives 1–25 (Springer, 2025).
    2. Genitsaridi, I. et al. IDF Diabetes Atlas: Global, Regional and National Diabetes Prevalence Estimates for 2024 and Projections for 2050.
    3. Reed, J., Bain, S. & Kanamarlapudi, V. A review of current trends with type 2 diabetes epidemiology, aetiology, pathogenesis, treatments and future perspectives. Diabetes, Metabolic Syndrome and Obesity 3567–3602 (2021).
    4. Akter, K. et al. Diabetes mellitus and Alzheimer’s disease: shared pathology and treatment? Br. J. Clin. Pharmacol. 71(3), 365–376 (2011).
    5. Althobaiti, T., Althobaiti, S. & Selim, M. M. An optimized diabetes mellitus detection model for improved prediction of accuracy and clinical decision-making. Alex. Eng. J. 94, 311–324 (2024).