Front Med (Lausanne). 2025 Jul 30:12:1640017.
doi: 10.3389/fmed.2025.1640017. eCollection 2025.

Data-driven cluster analysis on the association of aging, obesity and insulin resistance with new-onset diabetes in Chinese adults: a multicenter retrospective cohort study

Yazhi Wang et al. Front Med (Lausanne). 2025.

Abstract

Background: Type 2 diabetes mellitus (T2DM) is an endocrine and metabolic disorder that can lead to multi-organ damage and dysfunction, imposing a significant financial burden on national healthcare systems. Early identification of high-risk individuals and prevention of T2DM remain major challenges for clinicians. This study aimed to use easily obtainable clinical indicators to perform cluster analysis on healthy individuals, in order to accurately identify high-risk populations requiring early intervention.

Methods: This was a multicenter retrospective cohort study with a median follow-up period of 3 years. A total of 12,607 Chinese adults without diabetes at baseline were included. The K-means clustering algorithm was applied to five standardized indicators: age, body mass index (BMI), fasting blood glucose (FBG), triglycerides (TG), and high-density lipoprotein cholesterol (HDL-C). After clustering, multivariate Cox proportional hazards regression analysis was used to evaluate and compare the risk of incident diabetes among the clusters.
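
The clustering step described in the Methods can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the six rows of data are hypothetical, k = 2 is used to keep the toy example readable (the study reports four clusters), and the farthest-first initialization and the use of the population SD for standardization are assumptions, since the abstract does not state either choice.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column: (value - column mean) / column SD."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]  # assumption: population SD
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start from the first point,
    then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return [list(c) for c in centers]

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [mean(col) for col in zip(*members)]
    return labels, centers

# Hypothetical rows: age, BMI, FBG (mmol/L), TG (mmol/L), HDL-C (mmol/L)
rows = [
    [30, 21.0, 4.8, 0.9, 1.6],
    [32, 22.0, 4.9, 1.0, 1.5],
    [28, 20.5, 4.7, 0.8, 1.7],
    [62, 31.0, 6.0, 2.8, 0.9],
    [65, 32.5, 6.1, 3.0, 0.8],
    [60, 30.0, 5.9, 2.6, 1.0],
]
labels, centers = kmeans(standardize(rows), k=2)
print(labels)
```

Standardizing first matters because K-means relies on Euclidean distance: without it, a variable with a wide numeric range such as age would dominate variables with a narrow range such as HDL-C.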

Results: The study population of 12,607 subjects was clustered into four distinct groups: Cluster 1 (metabolic health cluster), Cluster 2 (low HDL-C cluster), Cluster 3 (old age and mild metabolic disorder cluster), and Cluster 4 (severe obesity and insulin resistance cluster). The clusters comprised 37.95%, 29.99%, 24.95%, and 7.11% of participants, respectively. Clinical characteristics and the risk of incident diabetes varied significantly among the four clusters. Cluster 4 exhibited the highest diabetes incidence rate, followed by Cluster 3, Cluster 2, and Cluster 1. In all models adjusted for covariates, the diabetes incidence rates in Cluster 3 and Cluster 4 were significantly higher than those in Cluster 1 and Cluster 2. However, no significant difference was observed between Cluster 3 and Cluster 4.

Conclusion: Cluster-based analyses can effectively identify individuals at high risk of diabetes in the normal population. These high-risk groups (Clusters 3 and 4) are often associated with aging, obesity, and insulin resistance (IR), necessitating early and targeted interventions.

Keywords: aging; cluster analysis; insulin resistance; obesity; type 2 diabetes mellitus.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Flowchart detailing participant inclusion and exclusion in a study. Initially, 685,277 Chinese participants aged 20 or older were considered. Exclusions reduced the sample due to missing data, extreme BMI, or insufficient visit intervals, among other reasons, resulting in 211,833 original participants. Further exclusions for incomplete records yielded 12,607 for the final study group. After a mean follow-up of three years, 251 individuals developed diabetes mellitus (DM).
FIGURE 1
Flowchart of the study population.
A pie chart and five violin plots display data for four clusters. The pie chart shows percentages for each cluster: Cluster 1 (37.95%), Cluster 2 (29.99%), Cluster 3 (24.95%), and Cluster 4 (7.11%). The violin plots represent data distributions for age, body mass index (BMI), fasting blood glucose (FBG), triglycerides (TG), and high-density lipoprotein cholesterol (HDL-C) across the clusters. Each parameter has its own graph (B to F). Cluster 1 is shown in green, Cluster 2 in blue, Cluster 3 in purple, and Cluster 4 in red.
FIGURE 2
Distribution and clinical features of clusters. (A) Proportional distribution of 12,607 participants. (B–F) Characteristics of each cluster regarding age, BMI, FBG, TG, and HDL-C. Cluster 1: Metabolic health cluster; Cluster 2: Low HDL-C cluster; Cluster 3: Old age and mild metabolic disorder cluster; Cluster 4: Severe obesity and insulin resistance cluster.
Five radar charts depicting different clusters of health parameters. Charts A to D represent Clusters 1 to 4, each outlined in unique colors: teal, blue, purple, and red. Chart E combines all four clusters with the adjusted cohort mean in black. Parameters include FPG, TC, non-HDL, LDL-C, and more, with percentage scales from zero to one hundred percent. Legend indicates cluster colors and the cohort mean.
FIGURE 3
Profile of the four clusters in the cohort study. (A–D) Individual distributions of metabolic components in Cluster 1, Cluster 2, Cluster 3, and Cluster 4. (E) Combined distribution of metabolic components in Clusters 1–4. Cluster 1: Metabolic health cluster; Cluster 2: Low HDL-C cluster; Cluster 3: Old age and mild metabolic disorder cluster; Cluster 4: Severe obesity and insulin resistance cluster. Radar plots were drawn for each cluster using z-values, calculated by adjusting the cluster mean of each variable to the cohort mean and SD of that variable. The radar plots were then compared visually to describe the particular characteristics of each cluster.
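
The z-value described in the Figure 3 caption can be read as (cluster mean − cohort mean) / cohort SD for each variable. The sketch below illustrates that reading for a single variable. The BMI values and cluster labels are hypothetical, and the use of the population SD rather than the sample SD is an assumption, since the caption does not say which was used.

```python
from statistics import mean, pstdev

def cluster_z_profile(values, labels):
    """For one variable, return {cluster: (cluster mean - cohort mean) / cohort SD}."""
    cohort_mean = mean(values)
    cohort_sd = pstdev(values)  # assumption: population SD
    profile = {}
    for lab in sorted(set(labels)):
        members = [v for v, l in zip(values, labels) if l == lab]
        profile[lab] = (mean(members) - cohort_mean) / cohort_sd
    return profile

# Hypothetical BMI values for four participants in two clusters
bmi = [20.0, 22.0, 30.0, 32.0]
cluster = [1, 1, 4, 4]
z_bmi = cluster_z_profile(bmi, cluster)
print(z_bmi)
```

A positive value means the cluster sits above the cohort average for that variable, a negative value below it. Repeating the calculation for each variable gives one radar-plot axis per variable.
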
Cumulative hazard plot showing four clusters over follow-up time in years. Cluster four (red) exhibits a sharp increase after five years. Clusters one (green), two (blue), and three (purple) remain relatively low. The log-rank test has a p-value less than 0.0001.
FIGURE 4
Kaplan-Meier estimates of the cumulative hazard of new-onset DM across the four clusters. Cluster 1: Metabolic health cluster; Cluster 2: Low HDL-C cluster; Cluster 3: Old age and mild metabolic disorder cluster; Cluster 4: Severe obesity and insulin resistance cluster.
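
One common way to obtain a cumulative hazard curve from a Kaplan-Meier fit is H(t) = −log S(t), where S(t) is the Kaplan-Meier survival estimate. The sketch below shows that calculation for a single group. It is an illustration under assumptions: the caption does not state whether this transform or another estimator (such as Nelson-Aalen) was used, and the follow-up times and event flags are hypothetical.

```python
import math

def km_cumulative_hazard(times, events):
    """Return [(t, H(t))] at each distinct event time, with H(t) = -log(S_KM(t)).

    times  -- follow-up time for each participant
    events -- 1 if diabetes was diagnosed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / n_at_risk  # Kaplan-Meier product-limit step
            curve.append((t, -math.log(surv) if surv > 0 else math.inf))
        n_at_risk -= n_leaving
    return curve

# Hypothetical follow-up data (years) for five participants
times = [1, 2, 2, 3, 4]
events = [1, 0, 1, 0, 0]
curve = km_cumulative_hazard(times, events)
print(curve)
```

Running the function separately on each cluster's participants would give one curve per cluster, which is the form of comparison shown in Figure 4.
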
