Nat Med. 2026 Feb;32(2):660-670.
doi: 10.1038/s41591-025-04105-8. Epub 2026 Jan 14.

Circulating metabolites, genetics and lifestyle factors in relation to future risk of type 2 diabetes

Jun Li et al. Nat Med. 2026 Feb.

Abstract

The human metabolome reflects complex metabolic states affected by genetic and environmental factors. However, metabolites associated with type 2 diabetes (T2D) risk and their determinants remain insufficiently characterized. Here we integrated blood metabolomic, genomic and lifestyle data from up to 23,634 initially T2D-free participants from ten cohorts. Of 469 metabolites examined, 235 were associated with incident T2D during up to 26 years of follow-up, including 67 associations not previously reported, across bile acid, lipid, carnitine, urea cycle and arginine/proline, glycine and histidine pathways. Further genetic analyses linked these metabolites to signaling pathways and clinical traits central to T2D pathophysiology, including insulin resistance, glucose/insulin response, ectopic fat deposition, energy/lipid regulation and liver function. Lifestyle factors, particularly physical activity, obesity and diet, explained more of the variation in T2D-associated than in non-associated metabolites, and specific metabolites were identified as potential mediators. Finally, a 44-metabolite signature improved T2D risk prediction beyond conventional factors. These findings provide a foundation for understanding T2D mechanisms and may inform precision prevention targeting specific metabolic pathways.

Conflict of interest statement

Competing interests: S.S.R. is a consultant to Westat, the Administrative Coordinating Center for the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. The other authors declare no competing interests.

Figures

Fig. 1. Study overview.
a, To identify blood metabolites associated with incident T2D, we analyzed 469 harmonized metabolites in up to 23,634 participants from ten prospective cohort studies. At baseline, participants were free of T2D and other chronic diseases, and the blood metabolome was profiled on the metabolomic platforms at the Broad Institute or Metabolon Inc. A metabolome-wide association study (MWAS) for incident T2D was conducted in each cohort, and results from the ten cohorts were combined by meta-analysis, identifying 235 metabolites associated with T2D risk. b, We curated meta-analyzed genome-wide association studies (GWASs) for each metabolite using data from up to 18,590 people in eight cohorts, followed by functional, colocalization and Mendelian randomization analyses. c, We conducted MWASs for major modifiable risk factors in up to 16,883 participants from five cohorts, identifying metabolites that potentially mediate the associations between risk factors and T2D risk. d, We used machine learning to develop a metabolomic signature reflecting the complex metabolic states predictive of long-term T2D risk, which may facilitate the identification of high-risk individuals and precision prevention.
Fig. 2. Associations between 235 metabolites and incident T2D in meta-analysis of ten prospective cohorts.
Circular plots show metabolites associated with incident T2D at FDR < 0.05, by biochemical category. a, Results for complex lipids, including monoacylglycerols (MAG) and DAG, TAG, LP, PC, PE, other PLs, PL plasmalogens and sphingolipids (SG). b, Results for other metabolites, including amino acids, carbohydrates, bioenergetic metabolites, nucleotides (NTs), xenobiotics (XBs) and other lipid metabolites such as carnitines, BAs, CEs and nonesterified fatty acids. Each bar represents one metabolite; red and blue indicate positive and inverse associations, respectively; color depth indicates association magnitude, that is, ln(RR) per s.d. increment in the metabolite, capped at −0.3 to 0.3; and bar height indicates association significance, with P capped at 10⁻²⁰ in a and 10⁻¹⁵ in b. Analyses were conducted within each cohort, stratified by racial/ethnic group, adjusting for age, sex, smoking, alcohol consumption, fasting status, hypertension, dyslipidemia, lipid-lowering medication use, anti-hypertensive medication use, BMI, WHR, family history of T2D and cohort-specific variables, and results were combined using meta-analysis.
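The FDR < 0.05 threshold used throughout these legends is commonly implemented with the Benjamini–Hochberg step-up procedure. The legends do not name the FDR method, so the following is a minimal sketch under that assumption; the function name `benjamini_hochberg` is illustrative, not taken from the study's code.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Returns a list of booleans, True where the hypothesis is rejected
    at false discovery rate `alpha`.
    """
    m = len(p_values)
    # Indices sorted by ascending p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is <= max_k (step-up rule)
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Toy p-values for eight hypothetical metabolites
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
```

Because the rule is step-up, a p-value above its own rank threshold can still be rejected if a larger-ranked p-value passes; for example, two p-values of 0.04 are both rejected at alpha = 0.05.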
Fig. 3. Genetic determinants of T2D-associated metabolites.
We examined the genetic architecture of T2D-associated metabolites using genome-wide meta-analysis summary statistics. a, R² explained by genetics for T2D-associated versus other metabolites, by biochemical category (Wilcoxon test; statistical significance defined as two-sided P < 0.00625, correcting for eight categories). b, Top enriched canonical pathways for genes mapped to mQTLs of T2D-associated metabolites, most of which were not enriched for genes mapped to mQTLs of non-associated metabolites. AS, atherosclerosis; CAR, constitutive androstane receptor; FXR, farnesoid X receptor; MetS, metabolic syndrome; PXR, pregnane X receptor; VLDL, very low-density lipoprotein. c, Percentages of metabolites showing nominally significant (P < 0.05) genetic correlations (rg) with traits reflecting T2D pathophysiology, comparing T2D-associated versus non-associated metabolites (two-sided chi-squared test). The barplot shows results for all metabolites (**FDR < 0.05, correcting for 22 traits), and the heatmap shows percentages among T2D-associated metabolites by biochemical category (**FDR < 0.05, correcting for 121 comparisons; *P < 0.05). ALT, alanine aminotransferase; AST, aspartate aminotransferase; GGT, gamma-glutamyltransferase; HDLC, HDL cholesterol; LDLC, LDL cholesterol; TC, total cholesterol; TG, triglycerides. d, Proportions of metabolites colocalized (PPH4 > 0.8) with tissue-specific gene expression across 47 human tissues. We tested whether the proportions were higher among T2D-associated metabolites (colors: organ systems) than among non-associated metabolites (gray) using univariate logistic regression (**one-sided FDR < 0.05, correcting for 47 tissue types; *P < 0.05).
e, For tissue types showing enriched genetic colocalization with T2D-associated metabolites (seven tissue types with FDR < 0.05, plus liver, the main metabolic organ, with P < 0.05), we detail the enrichment by biochemical category (color depth: proportions among T2D-associated versus non-associated metabolites; *one-sided P < 0.05).
Fig. 4. Variance of metabolites explained by modifiable risk factors.
a, Boxplots comparing variance explained by age, sex and modifiable risk factors (including smoking, PA and intakes of 15 main food groups) for T2D-associated versus non-associated metabolites. b, Boxplots for several specific biochemical categories of metabolites that drove the differences in R². Each box shows the IQR, the line in the box indicates the median, and whiskers extend to the smallest and largest values within 1.5 × IQR of the lower and upper quartiles. A Wilcoxon test was used to compare R² of T2D-associated versus other metabolites; **two-sided P < 0.0025 (Bonferroni correction for 20 examined factors); *two-sided P < 0.05. For each metabolite, we first fitted a linear regression of the inverse-normal-transformed metabolite on age, sex, BMI (standardized), PA (MET-hours per week; standardized), all 15 main food groups (red meat, processed meat, poultry, fish and seafood, egg, total dairy, total vegetables, total fruits, potato, nuts and legumes, whole grain, refined grain, sugary drinks, coffee and tea, and alcohol; servings per day), fasting status and other cohort-specific variables simultaneously. We then calculated the R² of each metabolite explained by each risk factor from the association coefficients and the variances of the metabolite and the risk factors. The analyses were conducted separately in NHS, NHS2, HPFS, SOL and WHI (n = 16,883), stratified by main racial/ethnic group, and R² values were averaged for the comparison.
Fig. 5. Metabolites that potentially mediate associations between modifiable risk factors and T2D risk.
a–c, Scatterplots compare the associations of metabolites with the risk factors BMI (a), PA (b) and coffee and/or tea consumption (c) with their associations with T2D risk. Each dot represents a metabolite (colored by biochemical category: associated with both the risk factor and incident T2D at FDR < 0.05; dark gray: associated with incident T2D but not the risk factor; light gray: not associated with incident T2D); the two trend lines are for T2D-associated (dark gray) and non-associated (light gray) metabolites separately. Association coefficients (betas) for risk factors are from MWASs in which all risk factors were mutually adjusted (including age, sex, BMI, PA, consumption of 15 main food groups, fasting status and other cohort-specific variables). For metabolites associated with a risk factor and with incident T2D in the epidemiologically expected direction, we conducted mediation analysis testing the indirect effect (the part of the risk factor–T2D association acting through the metabolite). d–f, For metabolites whose indirect effects were in the same direction as the total effect, we present the distribution of the proportion mediated (indirect effect/total effect) for BMI (d), PA (e) and coffee and/or tea consumption (f). All analyses were conducted separately in NHS, NHS2, HPFS, SOL and WHI (n up to 16,883 for individual metabolites), and results were combined using meta-analysis. g, For metabolites showing significant mediating effects between risk factors and incident T2D, we highlight the tissue types in which these metabolites showed the most genetic colocalizations with tissue-specific gene expression, and the clinical traits with which they showed the most genetic correlations.
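Panels d–f summarize the proportion mediated, indirect effect / total effect, only for metabolites whose indirect effect has the same sign as the total effect. The sketch below applies that rule to made-up effect estimates; the mediation model that produces the indirect effect is not specified in the legend and is not implemented here, and the metabolite names are hypothetical.

```python
from statistics import median


def proportion_mediated(indirect, total):
    """Return indirect/total when both effects share a sign; otherwise None."""
    if indirect * total <= 0:  # opposite signs, or a zero effect
        return None
    return indirect / total


# Hypothetical (indirect, total) effects on the ln(RR) scale
effects = {
    "metabolite_A": (0.10, 0.50),
    "metabolite_B": (0.05, 0.50),
    "metabolite_C": (-0.02, 0.50),  # opposite direction: excluded
}
proportions = {name: proportion_mediated(ind, tot) for name, (ind, tot) in effects.items()}
kept = [p for p in proportions.values() if p is not None]
summary = median(kept)
```

For an inverse risk factor such as PA, both effects are negative, so the ratio is still a positive proportion.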
Fig. 6. A multi-metabolite signature for T2D risk prediction.
a, AUC for T2D risk prediction in each cohort. Yellow: the model with the metabolomic signature only, obtained using a leave-one-cohort-out cross-validation approach to avoid overfitting (within WHI, the signature was obtained using leave-one-out cross-validation); blue: the model with conventional risk factors including age, sex, smoking, BMI, dyslipidemia, hypertension, lipid-lowering medication use, anti-hypertensive medication use and family history of T2D; red: the model with conventional risk factors plus the metabolomic signature. For cohorts analyzed with a Cox model, we plotted the AUC estimated at the median follow-up time. We compared the AUC of the conventional-plus-signature model with that of the conventional model; **two-sided P < 0.01, ^two-sided P < 0.1. b,c, Two example ROC curves with two-sided P values, from WHI (b) and Black participants from ARIC (c). d, Crude incidence rate of T2D by cohort across deciles of the metabolomic signature, with a smoothed trendline and 95% CI (gray band) from locally estimated scatterplot smoothing (LOESS). e, Relative risk ratios (points) and 95% CIs (lines) for incident T2D, comparing participants in higher deciles with those in the lowest decile of the metabolomic signature. Analyses were conducted separately in NHS, NHS2, HPFS, SOL, WHI and PREDIMED and in Black and white participants from ARIC, adjusting for age, sex, smoking, alcohol consumption, fasting status, hypertension, dyslipidemia, lipid-lowering medication use, anti-hypertensive medication use, BMI, WHR, family history of T2D and cohort-specific variables. We plotted relative risk ratios from the meta-analysis (n = 20,930). f, In multivariable analysis, BMI, red meat intake and sugary drink consumption (purple) were positively associated with the metabolomic signature, whereas PA and intakes of coffee/tea, whole grains and wine (green) were inversely associated with it (FDR < 0.05).
A Sankey plot shows the associations of each of the 44 metabolites constituting the final metabolomic signature with these risk factors and with T2D risk (band width proportional to the association coefficients).
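For a binary outcome, the AUC compared in panel a equals the probability that a randomly chosen case scores higher than a randomly chosen non-case (the Mann–Whitney form, ties counting one half). The sketch below shows only that binary-outcome AUC; the time-dependent AUC at median follow-up used for Cox-analyzed cohorts is a different estimator and is not shown. Names and data are illustrative.

```python
def auc(scores, labels):
    """Binary-outcome AUC via pairwise comparison (Mann-Whitney form).

    labels: 1 for incident T2D case, 0 for non-case.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


# Toy signature scores for four participants
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is O(cases × non-cases); a rank-based formula gives the same value faster for large cohorts.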
Extended Data Fig. 1. Biochemical categories of the 469 analyzed metabolites, and their associations with incident T2D comparing non-Hispanic White individuals vs. individuals of other races and ethnicities.
(A) Numbers of metabolites with positive, inverse, or null associations with T2D risk by biochemical category. We compared the association coefficients of each metabolite with T2D risk in the non-Hispanic White group with those in all individuals of other races and ethnicities (B), Hispanic/Latino participants (C), and African American participants (D). Sample sizes for individual metabolites vary depending on their availability in each cohort; the maximum sample sizes are 18,193 for non-Hispanic White individuals, 3,686 for Hispanic/Latino individuals, and 1,604 for African American individuals (see Supplementary Table S4). Association coefficients are presented as the natural log of the relative risk (RR) per SD increment in metabolite level. In each cohort, we first conducted an MWAS for incident T2D stratified by major racial/ethnic group (that is, non-Hispanic White, African American, Hispanic/Latino, or mixed non-White individuals, depending on sample size). The main model was adjusted for age, sex, smoking, alcohol consumption, fasting status, lipid-lowering medication use, anti-hypertensive medication use, hypertension, dyslipidemia, body mass index, waist-hip ratio, family history of T2D, and other cohort-specific variables. Results in A are from a meta-analysis of all participants. For the comparisons between racial/ethnic groups in panels B–D, we meta-analyzed the results within each group.
Extended Data Fig. 2. Comparison of associations between metabolites and T2D risk across the two metabolomic platforms.
In each cohort, stratified by major racial/ethnic group, associations between inverse-normal-transformed metabolites and T2D risk were analyzed using Cox or logistic regression. Results were then meta-analyzed separately for cohorts profiled at the Broad Institute and those profiled at Metabolon Inc. A total of 294 overlapping metabolites were included in the comparison. A and C compare the association coefficients (that is, natural log-transformed relative risk [RR] of T2D per standard deviation increase in metabolite level) between the two platforms for Model 1 and Model 2, respectively. B and D show the distributions of FDR from tests of association heterogeneity between the two platforms, for Model 1 and Model 2, respectively.
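Metabolites were inverse-normal transformed before the association analyses. A widely used rank-based version replaces each value by Φ⁻¹((r − c)/(n − 2c + 1)), where r is its rank; the offset c is not stated in the legend, so Blom's c = 3/8 is assumed here, with tied values receiving their average rank. The function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=3.0 / 8):
    """Rank-based inverse normal transform (Blom offset assumed)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of tied values starting at position i
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, sd 1
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


z = inverse_normal_transform([5.0, 1.0, 3.0])
z_ties = inverse_normal_transform([2.0, 2.0, 1.0])
```

The transform preserves rank order and makes skewed metabolite distributions approximately standard normal, so one coefficient means roughly "per s.d. increment".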
Extended Data Fig. 3. Association with T2D risk for complex lipids and fatty acids by carbon chain length and number of double bonds.
For complex lipid metabolites and fatty acids, we tested the correlation of their association coefficients (with T2D, from Model 2) with carbon chain length and number of double bonds. Correlations with P < 0.05 are shown, including for free fatty acids (A), cholesterol esters (B), diacylglycerols (C), triacylglycerols (D), phosphatidylcholines (E), plasmalogens (F), and sphingomyelins (G). In each subfigure, the x- and y-axes represent carbon chain length and number of double bonds, respectively, and the z-axis represents the natural log-transformed relative risk (RR) for T2D per standard deviation increase in metabolite level. Significant correlations and P values are highlighted in red (+ and – indicate positive and negative correlations, respectively).
Extended Data Fig. 4. Genetic determinants of T2D-associated metabolites.
The Manhattan-like plots show genetic variants significantly associated with any of the T2D-associated metabolites, at the standard genome-wide significance level (P < 5×10⁻⁸; upper panel) and after Bonferroni correction for the 458 metabolites with genetic data (P < 1.09×10⁻¹⁰; lower panel). The x-axis shows chromosomal position; the y-axis shows the number of T2D-associated metabolites associated with each variant; and color depicts the major biochemical category of the metabolite (amino acids, lipids, carbohydrates and energy metabolism, and others). Genome-wide association studies were conducted in each of the 8 cohorts by major racial/ethnic group and meta-analyzed using fixed-effect meta-analysis in METAL. Of the 235 T2D-associated metabolites, 233 had GWAS summary data and were included in the analyses.
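Per-cohort GWAS results were combined by fixed-effect meta-analysis in METAL. The statistical core of an inverse-variance fixed-effect meta-analysis is sketched below; this is not METAL itself, and whether the authors used METAL's standard-error (inverse-variance) scheme or its sample-size scheme is not stated here. The same arithmetic gives the Bonferroni threshold in the legend, 5×10⁻⁸ / 458 ≈ 1.09×10⁻¹⁰. Function and variable names are illustrative.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled beta, pooled standard error, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return pooled, pooled_se, p


# Bonferroni-corrected genome-wide threshold across 458 metabolites
BONFERRONI_THRESHOLD = 5e-8 / 458

# Two hypothetical cohorts with equal precision
pooled, pooled_se, p = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
```

Weighting by 1/SE² lets more precise (usually larger) cohorts dominate the pooled estimate, which is the defining assumption of a common true effect across cohorts.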
Extended Data Fig. 5. Comparison of top enriched canonical pathways for genes mapped to mQTLs of T2D-associated metabolites vs. those mapped to mQTLs of non-associated metabolites.
A. The top 30 enriched pathways identified for genes mapped to mQTLs of T2D-associated metabolites (left) vs. those for non-associated metabolites (right). B. We also observed a clear difference in the overall enrichment pattern of canonical pathways when comparing the enrichment FDR for genes mapped to mQTLs of T2D-associated metabolites vs. those of non-associated metabolites across all 1,140 tested canonical pathways.
Extended Data Fig. 6. Numbers of tissue-specific eQTL–mQTL colocalizations by metabolites’ association with T2D and key tissue types.
(A) We counted the number of tissue types with which each metabolite had significant mQTL–eQTL colocalizations and compared these counts between T2D-associated and non-associated metabolites. Further, for the 8 selected tissue types (7 with significant enrichment of mQTL–eQTL colocalizations among T2D-associated metabolites, plus liver), upset plots depict the numbers of metabolites with mQTL–eQTL colocalizations by tissue type (left horizontal bars) and cross-tissue intersection (vertical bars), separately for T2D-associated metabolites (B) and non-associated metabolites (C).
Extended Data Fig. 7. Associations of individual circulating metabolites with baseline modifiable risk factors and with incident T2D.
Here we present results for current smoking, red meat intake, sugary beverage intake, and vegetable intake. In the scatter plots, we compare the associations of metabolites with a risk factor vs. their associations with incident T2D. Each dot represents a metabolite (colored by biochemical category: associated with both the risk factor and incident T2D at FDR < 0.05; dark grey: associated with incident T2D but not with the risk factor; light grey: not associated with incident T2D), and trend lines (and correlation coefficients) are shown separately for T2D-associated metabolites (dark grey) and non-associated metabolites (light grey). Association coefficients (beta) for risk factors are from metabolome-wide association analyses with all risk factors mutually adjusted (including age, sex, BMI, physical activity, 15 major food groups, fasting status, and other cohort-specific variables). This analysis was conducted separately in NHS, NHS2, HPFS, SOL, and WHI (n = 16,883), and results were combined using meta-analysis. Association coefficients (ln[RR]) for T2D risk are from Model 2 (the main analysis model).
Extended Data Fig. 8. Schematic plot and results for metabolomic signature development and testing.
A. We primarily used WHI, which assessed the largest number of metabolites shared between the two platforms for all its participants, as a representative training cohort. For each held-out testing cohort, we first conducted a metabolome-wide meta-analysis for T2D risk including all cohorts except WHI and the held-out cohort. Metabolites associated with T2D risk at FDR < 0.05 and shared between the two platforms were then used as inputs to an elastic net Cox regression to construct a metabolomic signature model for T2D risk prediction in WHI. We next applied the derived model to the held-out cohort to calculate a metabolomic signature score. In WHI, a leave-one-out cross-validation (LOOCV) approach was used to obtain an unbiased metabolomic signature score for each individual without overfitting. B. We conducted a sensitivity analysis using SOL, which measured the most metabolites on the Metabolon platform for all its participants, as the training cohort. C. AUC for T2D risk prediction in each cohort, comparing models with vs. without (blue) the metabolomic signature, beyond traditional risk factors (age, sex, smoking, lipid-lowering medication use, anti-hypertensive medication use, family history of diabetes, hypertension, dyslipidemia, and BMI). ** Two-sided P < 0.01; * P < 0.05; ^ P < 0.1; slash: signature scores were calculated using LOOCV.
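The training and scoring scheme in panel A can be sketched as a loop over held-out cohorts. This sketch assumes caller-supplied, hypothetical functions: `select_features` stands in for the FDR-filtered discovery meta-analysis, and `fit_model` stands in for the elastic net Cox regression, which is not implemented here. It only shows which cohorts each step is allowed to see; the within-WHI LOOCV scoring is not shown.

```python
from typing import Callable, Dict, List, Sequence


def leave_one_cohort_out(
    cohorts: Sequence[str],
    training_cohort: str,
    select_features: Callable[[List[str]], List[str]],
    fit_model: Callable[[str, List[str]], Callable[[str], List[float]]],
) -> Dict[str, List[float]]:
    """Score each held-out cohort with a model that never saw it."""
    scores: Dict[str, List[float]] = {}
    for held_out in cohorts:
        if held_out == training_cohort:
            continue  # the training cohort is scored separately (e.g. LOOCV)
        # Discovery meta-analysis excludes both the training and held-out cohorts
        discovery = [c for c in cohorts if c not in (training_cohort, held_out)]
        features = select_features(discovery)
        model = fit_model(training_cohort, features)  # trained on one cohort only
        scores[held_out] = model(held_out)
    return scores


# Toy stand-ins that record which cohorts fed feature selection
calls = []


def toy_select(discovery):
    calls.append(tuple(sorted(discovery)))
    return sorted(discovery)


def toy_fit(train, features):
    return lambda cohort: [float(len(features))]


result = leave_one_cohort_out(["WHI", "NHS", "HPFS", "SOL"], "WHI", toy_select, toy_fit)
```

Keeping the held-out cohort out of both feature selection and model fitting is what prevents the optimistic bias that would arise if a cohort helped choose the metabolites it is later scored on.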
Extended Data Fig. 9. Metabolomic signature for T2D prediction with the conventional model additionally adjusting for fasting glucose in cohorts with available data.
We compared AUC for T2D risk prediction across three models in a secondary analysis. Model 1 (yellow) included only the metabolomic signature. Model 2 (blue) included traditional T2D risk factors (age, sex, smoking, lipid-lowering medication use, anti-hypertensive medication use, family history of diabetes, hypertension, dyslipidemia, and BMI) plus a T2D diagnostic biomarker, blood glucose, assessed by the metabolomic assays. Model 3 (green) added the metabolomic signature score to Model 2. We compared Model 3 with Model 2 to evaluate whether the metabolomic signature added value beyond traditional risk factors and blood glucose. ** Two-sided P < 0.01, * P < 0.05, and ^ P < 0.1.
