Nat Genet. 2025 Oct;57(10):2436-2444.
doi: 10.1038/s41588-025-02364-2. Epub 2025 Oct 10.

Population-scale gene-based analysis of whole-genome sequencing provides insights into metabolic health

Yajie Zhao et al. Nat Genet. 2025 Oct.

Abstract

In addition to its coverage of the noncoding genome, whole-genome sequencing (WGS) may better capture the coding genome than exome sequencing. Here we sought to exploit this and identify new rare, protein-coding variants associated with metabolic health in WGS data (n = 708,956) from the UK Biobank and All of Us studies. Identified genes highlight new biological mechanisms, including protein-truncating variants (PTVs) in the DNA double-strand break repair gene RIF1 that have a substantial effect on body mass index (2.66 kg m⁻², s.e. 0.43, P = 3.7 × 10⁻¹⁰). UBR3 is an intriguing example where PTVs independently increase body mass index and type 2 diabetes risk. Furthermore, PTVs in IRS2 have a substantial effect on type 2 diabetes (odds ratio 6.4 (3.7–11.3), P = 9.9 × 10⁻¹⁴, 34% case prevalence among carriers) and were also associated with chronic kidney disease independent of diabetes status, suggesting an important role for IRS2 in maintaining renal health. Our study demonstrates that large-scale WGS provides new mechanistic insights into human metabolic phenotypes through improved capture of coding sequences.

Conflict of interest statement

Competing interests: J.L., A.C., Y.L., J.D., C.B.-D. and R.S. are employees and stockholders of GSK. J.R.B.P. and E.J.G. are employees and shareholders of Insmed. J.R.B.P. receives research funding from GSK. Y.Z. was a UK University worker at GSK during this work. S.O.ʼR. has undertaken remunerated consultancy work for Pfizer, Third Rock Ventures, AstraZeneca, NorthSea Therapeutics and Courage Therapeutics and is a scientific founder of Marea Therapeutics. S.L. performs paid consultancy for Eolas Medical. The other authors declare no competing interests.

Figures

Fig. 1. Genome-wide multi-ancestry gene-burden test for T2D and BMI in UKBB.
a,b, Manhattan plots showing gene-burden test results for T2D (a) and BMI (b) with unadjusted two-sided P values derived from gene-burden testing conducted in BOLT-LMM and plotted on a −log₁₀ scale. Genes passing exome-wide significance (P < 6.15 × 10⁻⁷ (0.05/81,350)) are labeled. Points are annotated with variant mask information. MISSREVEL, missense variants with REVEL scores above 0.5 or 0.7; HCPTV, high-confidence PTVs.
Fig. 2. Discovery and replication of significant associations with BMI and T2D in UKBB and AoU.
Plots show effect estimates for predicted damaging mutations in the indicated gene on BMI (left) and T2D risk (right) in the UKBB and AoU. In UKBB, effect estimates for BMI were derived using GLMs. In AoU, effect estimates were approximated from score statistics and their variances under a GLM framework. n(BMI, UKBB) = 481,137; n(BMI, AoU) = 219,015; n(T2D, UKBB) = 489,941; n(T2D, AoU) = 219,015. Odds of T2D are plotted on a log₁₀ scale. All error bars represent 95% CIs and all P values are two-sided. Gene names of results replicated in AoU are highlighted in bold.
Fig. 3. PheWAS of BMI and T2D associated genes in UKBB.
a,b, Effects of the most significant Gene × Mask association with BMI (a) or T2D (b) were assessed (Methods) on a panel of 79 traits, and resulting P values were plotted on a −log₁₀ scale. P values are two-sided and unadjusted. Test statistics were derived from linear and logistic regression models performed using the GLM framework. Numbers of participants are provided in Supplementary Tables 10 and 11. Points are colored according to classification of phenotype; the orientations of triangles indicate the direction of effect for significant traits. For clarity, only a subset of traits and the most significant Gene × Mask association (for genes with more than one mask significantly associated with T2D or BMI) are displayed. UBR3, which was associated with both T2D and BMI in our discovery analysis, is presented alongside BMI risk genes only to avoid duplication. The solid horizontal lines represent a Bonferroni-corrected threshold for statistical significance of 2.35 × 10⁻⁵ (0.05/2,132 Phenotype × Mask associations).
Fig. 4. Loss-of-function variants in IRS2 increase CKD risk.
a, Effects of protein-truncating variants in IRS2 on various measures of eGFR (ml min⁻¹ 1.73 m⁻²) and CKD (OR) are plotted with 95% CIs. All P values are two-sided and unadjusted. The presented summary statistics are derived from linear (eGFR) and logistic regression (CKD risk) implemented in the GLM framework. b, Effects of rare predicted damaging mutations in the labeled genes on T2D risk are plotted (log(OR) T2D risk ± 95% CIs) against the effect on eGFR (beta estimate ± 95% CIs) across three different methods of estimation to illustrate that the effect of PTVs in IRS2 on renal function seems independent of their effect on T2D. For clarity, only the Gene × Mask combination most significantly associated with T2D is plotted. All error bars represent 95% CIs. Plotted test statistics are derived from linear regression for eGFR and from logistic regression for T2D implemented using GLMs. n(CKD, T2D) = 489,941; n(Creatinine eGFR) = 461,884; n(Cystatin-C eGFR) = 462,081; n(Cystatin-C Creatinine eGFR) = 461,543.
Fig. 5. Genetic evidence for functional heterogeneity of insulin receptor substrates in humans.
Effects of PTVs in IRS1 and IRS2 on continuous traits are beta-estimates from linear regression plotted in centimeters for height and kilograms for fat-free mass, and as OR from logistic regression for T2D. Odds of T2D are plotted on a log scale. All error bars represent 95% CIs. n(Fat Free Mass) = 481,100; n(Height) = 488,455; n(T2D) = 489,941.
Fig. 6. Effects of PTVs in UBR2 and UBR3 on adiposity and cardiometabolic health.
Effects of PTVs in UBR2 and UBR3 on adiposity (adult BMI, body size age 10 years) and cardiometabolic outcomes are plotted. The points represent beta-estimates from linear regression for BMI (kg m⁻²) and size age 10 years, and ORs derived from logistic regression for T2D and hypertension. All error bars represent 95% CIs. n(BMI) = 481,137; n(size age 10) = 479,615; n(T2D, hypertension) = 489,941.
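
The significance thresholds quoted in the captions for Fig. 1 and Fig. 3 are Bonferroni corrections: a family-wise alpha of 0.05 divided by the number of tests (81,350 for the gene-burden scan and 2,132 Phenotype × Mask associations for the PheWAS). The following is a minimal sketch of that arithmetic only. The function and variable names are illustrative assumptions and are not taken from the authors' analysis code.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


ALPHA = 0.05  # family-wise error rate used in both captions

# Test counts as stated in the Fig. 1 and Fig. 3 captions.
gene_burden_threshold = bonferroni_threshold(ALPHA, 81_350)
phewas_threshold = bonferroni_threshold(ALPHA, 2_132)

print(f"Gene-burden threshold: {gene_burden_threshold:.2e}")  # 6.15e-07
print(f"PheWAS threshold:      {phewas_threshold:.2e}")       # 2.35e-05
```

Both printed values agree with the thresholds given in the captions (6.15 × 10⁻⁷ and 2.35 × 10⁻⁵).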

