[Preprint]. 2025 Feb 14:2025.02.14.25322264.
doi: 10.1101/2025.02.14.25322264.

Nuclear regulatory disturbances precede and predict the development of Type-2 diabetes in Asian populations


Pritesh R Jain et al. medRxiv.

Abstract

To identify biomarkers of and pathways to Type-2 diabetes (T2D), a major global disease, we completed an array-based epigenome-wide association study in whole blood from 5,709 Asian people. We found 323 Sentinel CpGs (from 314 genetic loci) that predict future T2D. The CpGs reveal coherent nuclear regulatory disturbances in canonical immune activation pathways, as well as in metabolic networks involved in insulin signalling, fatty acid metabolism and lipid transport, which are causally linked to the development of T2D. The CpGs have potential clinical utility as biomarkers. An array-based composite Methylation Risk Score (MRS) is predictive of future T2D (RR: 5.2 in Q4 vs Q1; P=7×10⁻²⁵) and is additive to genetic risk. Targeted methylation sequencing revealed multiple additional CpGs predicting T2D and enabled synthesis of a sequencing-based MRS that is strongly predictive of T2D (RR: 8.3 in Q4 vs Q1; P=1.0×10⁻¹¹). Importantly, MRS varies between Asian ethnic groups in a way that explains a large fraction of the difference in T2D risk between populations. We thus provide new insights into the nuclear regulatory disturbances that precede the development of T2D, and reveal the potential for sequence-based DNA methylation markers to inform risk stratification in diabetes prevention.


Figures

Extended Data Figure 1. Overview of the study design.
Extended Data Figure 2. Phenome-wide association of the Sentinel CpGs with epidemiological exposures amongst participants of the HELIOS study.
Significant associations (permutation P<0.001) across the different categories are shown. The x-axis represents fold enrichment and the y-axis shows individual traits, grouped by category. Colour indicates the percentage of Sentinel CpGs associated with increased (red) or decreased (blue) risk for each trait. The numbers next to the plot show the count of CpGs associated with increased / decreased risk, respectively.
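The fold enrichment and permutation P-value used in this legend can be illustrated with a minimal sketch. It assumes that, for one trait, every CpG is flagged as associated or not, and that the null distribution is built by drawing random CpG sets of the same size as the Sentinel set from the array background. The function name, input format, number of permutations and toy data are hypothetical; this is not the authors' pipeline.

```python
import random


def permutation_enrichment(sentinel_flags, background_flags, n_perm=10_000, seed=1):
    """Fold enrichment and empirical P-value for trait-associated CpGs in a Sentinel set.

    sentinel_flags / background_flags: one boolean per CpG, True if that CpG is
    associated with the trait (hypothetical input format).
    """
    rng = random.Random(seed)
    k = len(sentinel_flags)
    observed = sum(sentinel_flags)
    # Null distribution: associated-CpG counts in random background sets of the same size
    null_counts = [sum(rng.sample(background_flags, k)) for _ in range(n_perm)]
    expected = sum(null_counts) / n_perm
    fold = observed / expected if expected > 0 else float("inf")
    # One-sided empirical P-value; the +1 terms keep it from ever being exactly zero
    exceed = sum(count >= observed for count in null_counts)
    p_perm = (exceed + 1) / (n_perm + 1)
    return fold, p_perm


# Toy example: 5% of background CpGs are trait-associated; 60 of 323 Sentinel CpGs are.
background = [True] * 250 + [False] * 4750
sentinel = [True] * 60 + [False] * 263
fold, p_perm = permutation_enrichment(sentinel, background, n_perm=2_000)
print(f"fold enrichment = {fold:.2f}, permutation P = {p_perm:.4f}")
```

With n permutations the smallest attainable P-value is 1/(n+1), so a threshold such as P<0.001 requires at least 1,000 permutations.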
Extended Data Figure 3. Functional enrichment of Sentinel CpGs.
Panel a) Functional annotation and enrichment of Sentinel CpGs across different cell types. Enrichment is shown as observed count vs expected background count across DNase I hotspots (DHS), five histone H3 marks and 15 chromatin states. b) Enrichment of Sentinel CpGs across 1,210 transcription factors (TFs) from the ReMAP database; the top 25 significantly enriched TFs are labelled. c) Enrichment of Sentinel CpG-associated genes in cis and in trans. Cis-genes were annotated using five different threshold criteria, to compare enrichment for the nearest genes against alternative gene sets within the 1 Mb region.
Extended Data Figure 4. Covariation of Sentinel CpGs and their associated eQTM signatures.
Pairwise absolute correlation between a) Sentinel CpGs; b) cis-eQTMs (nearest gene only); and c) trans-eQTMs. Covariation in 1,000 random background sets is shown for comparison and was used for probability estimation. Fold enrichment was calculated as the ratio of the mean absolute correlation in the Sentinel CpG set to the mean in the background sets. The P-value for enrichment was obtained using a two-sided t-test.
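The covariation statistic in this legend can be sketched as follows: the mean absolute pairwise correlation of the Sentinel set, divided by the mean of the same statistic over random background sets of equal size. The legend does not say exactly how the two-sided t-test was set up; the sketch assumes a one-sample t-test of the background means against the observed value (`scipy.stats.ttest_1samp`). All function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def mean_abs_pairwise_corr(x):
    """Mean absolute Pearson correlation over all distinct column pairs of x (samples x features)."""
    r = np.corrcoef(x, rowvar=False)
    upper = np.triu_indices_from(r, k=1)  # each pair once, diagonal excluded
    return np.abs(r[upper]).mean()


def covariation_enrichment(meth, sentinel_idx, n_background=1000, seed=0):
    """Fold enrichment of Sentinel covariation relative to random background sets."""
    rng = np.random.default_rng(seed)
    k = len(sentinel_idx)
    observed = mean_abs_pairwise_corr(meth[:, sentinel_idx])
    background = np.array([
        mean_abs_pairwise_corr(meth[:, rng.choice(meth.shape[1], size=k, replace=False)])
        for _ in range(n_background)
    ])
    fold = observed / background.mean()
    # Assumed test: two-sided one-sample t-test of background means against the observed value
    p_value = stats.ttest_1samp(background, popmean=observed).pvalue
    return fold, p_value


# Toy data: 200 samples x 500 CpGs; the first 20 CpGs share a latent factor.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 1))
meth = rng.normal(size=(200, 500))
meth[:, :20] += latent
fold, p_value = covariation_enrichment(meth, np.arange(20), n_background=200)
print(f"fold enrichment = {fold:.1f}, P = {p_value:.2g}")
```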
Extended Data Figure 5. Enrichment analysis: cis- and trans-acting mQTL SNPs that influence Sentinel CpGs are enriched for association with T2D and other related human metabolic traits.
Proportion of Sentinel cis-acting mQTL SNPs associated with cardiometabolic traits compared to background, at a GWAS threshold of P<0.05 (a) or P<1×10⁻⁵ (b). Proportion of Sentinel trans-acting mQTL SNPs associated with cardiometabolic traits compared to background, at a GWAS threshold of P<0.05 (c) or P<1×10⁻⁵ (d).
Extended Data Figure 6. Cis-mQTL-based colocalization analysis of Sentinel CpGs and Type 2 Diabetes.
Regional plots of associations at the ten Sentinel CpG loci that have a potential causal association with T2D risk in SMR and colocalization analyses. The lead cis-mQTL for each CpG site, used as the instrumental variable, is labelled; the other SNPs are coloured by their LD correlation with the lead SNP.
Extended Data Figure 7. Trans-acting SNP-CpG clusters.
Circos plot illustrating the two largest trans-mQTL clusters influencing methylation at Sentinel CpGs. Panel a) Cluster 1: 13 Sentinel trans-mQTL SNPs and 83 CpGs. b) Cluster 2: six trans-mQTL SNPs and 58 CpGs. The outer track provides the SNP rsIDs (red text) and CpG IDs (blue text). SNPs are additionally annotated with the cis-eQTL most closely associated with the SNP, and CpGs with the gene closest to the CpG. Inner connections show the trans-acting associations between mQTL SNPs and their respective CpGs, colour-coded by trans-mQTL. The identified cis-eQTLs include NFKB1, NFKBIA, NFKBIE, NF1A, COMMD7, IKZF3, MADD and MYBPC3. Many of the genes at the linked trans-Sentinel CpG sites also encode recognised inflammatory mediators (e.g. JAK3, MAP3K2, NOD2, SMAD3, TGFBR1 and TNIP1). However, the networks also link to CpG-genes that are components of metabolic pathways directly relevant to the pathophysiology of diabetes, including CDKAL1, CPT1A, CYP7B1, PDK4, LDLRAD2, SREBF1, SH2B2, SOCS3, TANK and TXNIP. These genes are reported to affect pancreatic beta cell function, insulin signalling and action, glucose sensing, metabolism of glucose, cholesterol and lipids, fatty acid beta oxidation, mitochondrial biology, thermogenesis and adipogenesis, and are thus compelling candidate genes in the pathogenesis of diabetes.
Extended Data Figure 8. WGBS pipeline for data generation, curation and quality control.
Panel a) Analytics pipeline used to process the WGBS samples. b) Joint plot comparing methylation at CpGs common to WGBS and the corresponding 450K array across 500 samples; the mean (SD) number of CpGs compared is 351,684.3 (± 22,035.5). The main panel shows the Pearson rho and RMSE between the methylation beta values compared, while the top and right marginal plots show boxplots of the Pearson rho and RMSE values, respectively. c) Distribution of mean coverage and number of CpGs across the 500 WGBS samples; the top and right marginal plots show histograms of mean coverage and number of CpGs, respectively. d) Histogram of the distance between each discovery CpG and the most distant CpG within ±2 kb that is correlated with it at |r|>0.2. e) Example of the local correlation structure around a discovery CpG (array) in WGBS data, overlaid with UCSC and RefSeq genes and regulatory features (histone marks, DHS clusters, TFs from ChIP-seq).
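The per-sample concordance metrics in panel b (Pearson rho and RMSE between WGBS and array beta values at shared CpGs) can be computed as in the sketch below. It assumes each platform's data for one sample is a mapping from CpG ID to beta value and that coverage filtering happened upstream; the function name and CpG IDs are hypothetical.

```python
import numpy as np


def platform_concordance(wgbs_beta, array_beta):
    """Pearson rho and RMSE between WGBS and array beta values for one sample.

    wgbs_beta, array_beta: dicts mapping CpG ID -> beta value in [0, 1]
    (hypothetical format; coverage filtering is assumed to happen upstream).
    """
    common = sorted(set(wgbs_beta) & set(array_beta))  # CpGs measured on both platforms
    x = np.array([wgbs_beta[c] for c in common])
    y = np.array([array_beta[c] for c in common])
    rho = np.corrcoef(x, y)[0, 1]            # Pearson correlation
    rmse = np.sqrt(np.mean((x - y) ** 2))    # root-mean-square error on the beta scale
    return len(common), rho, rmse


wgbs_sample = {"cg01": 0.10, "cg02": 0.55, "cg03": 0.92, "cg04": 0.30}
array_sample = {"cg01": 0.12, "cg02": 0.50, "cg03": 0.88, "cg05": 0.40}
print(platform_concordance(wgbs_sample, array_sample))
```

Running this once per sample and collecting the (rho, RMSE) pairs gives the distributions summarized in the joint plot.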
Extended Data Figure 9. Functional enrichment of the fine-mapped loci.
Functional annotation and enrichment of T2D-associated methylation sites in the ABCG1 fine-mapped region (a) and the SREBF1 fine-mapped region (b), compared to all CpGs in each region. Enrichment is shown as the ratio of observed to expected background count across DNase I hotspots (DHS), five histone H3 marks and 15 chromatin states in different cell types.
Extended Data Figure 10. Fine-mapping at the SREBF1 locus.
Panel a) The top and bottom panels show the −log10(P) value of association with T2D for CpG sites captured by the EPIC array and TWIST targeted sequencing, respectively, within a 1 Mb region around the sentinel CpG (cg11024682). b) Zoomed-in regional plot of the SREBF1 genic region, indicated by the green rectangle in panel a. The top panel shows the relative risk (RR) for T2D per standard deviation (SD) change in methylation level, whilst the lower panel shows the −log10(P) value of association with T2D for CpG sites within the SREBF1 genic region. Colours indicate the correlation with the index CpG. Regulatory regions for seven cell lines were obtained from the UCSC Genome Browser and are highlighted and labelled in the legend (GM12878: lymphoblastoid cells; H1-hESC: H1 human embryonic stem cell line; HSMM: human skeletal muscle myoblasts; HUVEC: human umbilical vein endothelial cells; K562: human chronic myelogenous leukemia (CML) cell line; NHEK: normal human epidermal keratinocytes; NHLF: normal human lung fibroblasts).
Figure 1. Epigenome-wide association study (EWAS) for incident T2D in Asians.
Panel a) Manhattan plot summarizing the association between DNA methylation at the ~850K CpG sites assayed in the EWAS and incident T2D. The 10 top-ranking CpGs are annotated with CpG ID and nearest gene. b) Volcano plot showing the relative risk for incident T2D per 1% change in DNA methylation at the ~850K CpG sites assayed. c) QQ-plot of P-values for association with incident T2D in the EWAS; lambda is the genomic control inflation factor. d) Density plot showing the distribution of mean methylation levels at the 323 Sentinel CpGs, compared to background CpGs on the array.
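The genomic control inflation factor (lambda) reported on a QQ-plot such as panel c is conventionally the median observed 1-df chi-square statistic divided by its null median (about 0.455). A standard-library sketch, assuming two-sided P-values as input; the function name is hypothetical and this is the textbook definition rather than the authors' code.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Genomic control inflation factor (lambda) from two-sided association P-values.

    Each P-value (which must lie in (0, 1]) is converted to a 1-df chi-square statistic;
    lambda is the median observed statistic divided by the null median (~0.455).
    """
    std_normal = NormalDist()
    chi_sq = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    null_median = std_normal.inv_cdf(0.75) ** 2  # median of a 1-df chi-square
    return median(chi_sq) / null_median


# Uniformly spread P-values (no inflation) give lambda close to 1.
uniform_p = [(i + 0.5) / 10_000 for i in range(10_000)]
print(round(genomic_inflation(uniform_p), 3))
```

Values well above 1 suggest systematic inflation (for example from uncorrected confounding); values near 1 are consistent with a well-calibrated test.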
Figure 2. Genome-wide analysis of trans-acting mQTL SNPs.
Panel a) The number of Sentinel CpGs (y-axis) associated with genome-wide SNP variation (x-axis). Results show discrete genomic regions harbouring sequence variation that influences multiple Sentinel CpGs in trans. The 10 top-ranking Sentinel SNPs that influence the highest number of Sentinel CpG sites in trans (N CpGs≥18) are highlighted in dark blue and annotated with the gene nearest to each Sentinel SNP. b) Regional plots for the same analysis at the ERG, NFKB1 and NFKBIE loci, the three genomic regions influencing the highest numbers of Sentinel CpGs in trans.
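The ranking in panel a amounts to counting, for each SNP, the distinct Sentinel CpGs it is associated with in trans and keeping the top SNPs above a minimum count. A minimal sketch, assuming trans-mQTL results are available as (SNP, CpG, P) tuples; the significance threshold is a user-chosen placeholder, not the study's value, and all IDs are hypothetical.

```python
from collections import defaultdict


def trans_hotspots(trans_hits, p_threshold, min_cpgs=18, top_n=10):
    """Rank SNPs by how many distinct Sentinel CpGs they are associated with in trans.

    trans_hits: iterable of (snp_id, cpg_id, p_value) tuples (hypothetical format);
    p_threshold is a user-chosen significance cut-off, not the study's value.
    """
    cpgs_per_snp = defaultdict(set)
    for snp, cpg, p in trans_hits:
        if p < p_threshold:
            cpgs_per_snp[snp].add(cpg)
    # Most CpGs first; keep the top_n SNPs that reach the minimum CpG count
    ranked = sorted(cpgs_per_snp.items(), key=lambda item: len(item[1]), reverse=True)
    return [(snp, len(cpgs)) for snp, cpgs in ranked[:top_n] if len(cpgs) >= min_cpgs]


hits = [("snpA", f"cpg{i}", 1e-12) for i in range(20)]
hits += [("snpB", f"cpg{i}", 1e-12) for i in range(5)]
hits += [("snpC", f"cpg{i}", 1e-12) for i in range(18)] + [("snpC", "cpg99", 0.2)]
print(trans_hotspots(hits, p_threshold=1e-8))
```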
Figure 3. NFKB1 expression, trans-regulation of DNA methylation, and risk of T2D.
Panel a) Phenome-wide association of genomic and epidemiological exposures with NFKB1 expression. The x-axis represents the regression effect size per SD change in exposure and the y-axis the −log10(P) for association. Each dot represents an independent phenotype, coloured by general category. b) Box plot showing NFKB1 expression across tertiles of BMI and of genetically inferred BMI score. P-values for association are from linear regression of NFKB1 expression on the respective phenotype. c) SMR analysis between NFKB1 expression and DNA methylation at Sentinel CpGs, and between the CpGs and Type 2 Diabetes, using rs2272676 as the instrumental variable. CpGs marked (*) showed a shared causal variant with the phenotype (Coloc. PP.H4>0.9); unmarked CpG sites were also associated in the region, but with different causal variants (Coloc. PP.H3>0.9). The triangles on top show the direction of the SMR analysis. βall is the fixed-effect meta-analysis effect size across all associated Sentinel CpGs, showing their combined average effect.
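A fixed-effect meta-analysis such as the one behind βall is usually an inverse-variance-weighted average of the per-CpG estimates. The legend does not state the weighting scheme, so the sketch below assumes standard inverse-variance weights and a normal-approximation P-value; the function name and inputs are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-CpG effect estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    beta_all = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se_all = math.sqrt(1.0 / sum(weights))
    z = beta_all / se_all
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P-value from the normal distribution
    return beta_all, se_all, p


# Hypothetical per-CpG effects and standard errors
print(fixed_effect_meta([0.10, 0.30], [0.10, 0.20]))
```

More precise estimates (smaller standard errors) dominate the pooled value; here the first CpG gets four times the weight of the second, pulling βall to 0.14.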
Figure 4. Lipid metabolic gene pathways identified by Sentinel T2D CpGs.
Panel a) Sankey plot showing the relationships between the core metabolic cluster of three trans-acting mQTL SNPs, their nine associated cis-eQTLs and five Sentinel CpG sites associated with T2D. b) Sequence motifs of the three known SREBF1 binding consensus sequences, from the JASPAR database. The table below highlights the genes in this cluster that have an SREBF1 binding sequence in their promoter region, and the association between expression of each named gene and SREBF1 expression. c) Regional plots showing the association of the lead trans-acting mQTL SNP (rs174598) at the FADS1/2 locus with cg11024682 methylation, HDL cholesterol, triglyceride levels, C-reactive protein and Type 2 Diabetes. The direction of the triangles shows the direction of effect of each SNP on the trait, and colours indicate the strength of LD with the lead SNP. SMR analysis was performed with the lead SNP as the genetic instrumental variable, FADS1 and FADS2 expression as the exposures, and the associated methylation and phenotypes as outcomes.
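With a single instrument SNP z, the SMR estimate of an exposure's effect on an outcome is the ratio of the SNP's effect on the outcome to its effect on the exposure, tested with the statistic T = z_zx² z_zy² / (z_zx² + z_zy²), approximately chi-square with 1 df. The sketch below implements only that core estimate (no HEIDI heterogeneity test); the function name and the summary statistics are hypothetical.

```python
import math


def smr_single_instrument(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and test statistic for one instrument SNP z.

    b_zx, se_zx: SNP effect on the exposure (e.g. FADS1 expression).
    b_zy, se_zy: SNP effect on the outcome (e.g. CpG methylation or T2D).
    """
    b_xy = b_zy / b_zx  # ratio estimate of the exposure-to-outcome effect
    z_zx, z_zy = b_zx / se_zx, b_zy / se_zy
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)  # approx. chi-square, 1 df
    p = math.erfc(math.sqrt(t_smr / 2))  # upper tail of a 1-df chi-square
    return b_xy, t_smr, p


# Hypothetical summary statistics for a single lead SNP
print(smr_single_instrument(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02))
```

Because T is bounded by the smaller of z_zx² and z_zy², a weak instrument for either the exposure or the outcome limits the power of the test.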
Figure 5. Fine-mapping at the ABCG1 locus.
Panel a) The top and bottom panels show the −log10(P) value of association with T2D for CpG sites captured by the EPIC array and TWIST targeted sequencing, respectively, within a 1 Mb region around the sentinel CpG (cg06500161). b) Zoomed-in regional plot of the ABCG1 genic region, indicated by the green rectangle in panel a. The top panel shows the relative risk (RR) for T2D per standard deviation (SD) change in methylation level, whilst the lower panel shows the −log10(P) value of association with T2D for CpG sites within the ABCG1 genic region. Regulatory regions for seven cell lines were obtained from the UCSC Genome Browser and are highlighted and labelled in the legend (GM12878: lymphoblastoid cells; H1-hESC: H1 human embryonic stem cell line; HSMM: human skeletal muscle myoblasts; HUVEC: human umbilical vein endothelial cells; K562: human chronic myelogenous leukemia (CML) cell line; NHEK: normal human epidermal keratinocytes; NHLF: normal human lung fibroblasts).
Figure 6. Methylation Risk Scores and Type-2 Diabetes in Asian populations.
Panel a) Density plot showing the distribution of MRS and PRS values in T2D cases and controls (upper two quadrants) and between Asian ethnic subgroups (lower two quadrants) in the HELIOS study. b) Association of array-based and sequence-based MRS with T2D. Model 1: array-based MRS and T2D in the 1,663 iHealth-T2D participants (‘Test set’), compared to PRS (Model 2), after adjustment for PRS (Model 3) and after additional adjustment for prediabetes (Model 4). Model 5: sequence-based MRS and T2D in 588 samples from the targeted sequencing experiment, used as the ‘Test set’. Model 6: sequence-based MRS and T2D in the subset of participants with overweight or obesity (BMI>25 kg/m²) but without prediabetes. Samples were separated into quartiles based on the distribution in controls. Associations with T2D were tested by logistic regression. Relative risks for T2D are reported relative to Quartile 1 (reference). All analyses are adjusted for age and sex.
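The quartile analysis in panel b can be illustrated with a simplified sketch: quartile cut-points are set in controls, and the crude proportion of cases in each quartile is divided by that in Quartile 1. This is only an illustration under stated assumptions. The MRS itself is modelled as a hypothetical weighted sum of standardized methylation (the weighting is not described in this section), and the crude ratio ignores the age and sex adjustment and the logistic regression used in the study.

```python
import numpy as np


def methylation_risk_score(meth, weights):
    """Hypothetical MRS: weighted sum of standardized methylation at the selected CpGs.

    meth: samples x CpGs array of beta values; weights: one weight per CpG (assumed given).
    Standardization here uses the same samples; a real test set would reuse training parameters.
    """
    z = (meth - meth.mean(axis=0)) / meth.std(axis=0)
    return z @ weights


def quartile_relative_risk(score, is_case):
    """Crude T2D risk in each MRS quartile relative to Q1, with cut-points set in controls.

    Assumes Q1 contains at least one case; otherwise the ratio is undefined.
    """
    score = np.asarray(score, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    cuts = np.quantile(score[~is_case], [0.25, 0.5, 0.75])
    quartile = np.searchsorted(cuts, score, side="right")  # 0..3 correspond to Q1..Q4
    risk = np.array([is_case[quartile == q].mean() for q in range(4)])
    return risk / risk[0]


# Toy data: 100 controls spread evenly, cases concentrated at higher scores.
controls = np.arange(100.0)
cases = np.array([10.0] * 5 + [30.0] * 10 + [60.0] * 15 + [90.0] * 25)
score = np.concatenate([controls, cases])
is_case = np.concatenate([np.zeros(100, bool), np.ones(55, bool)])
print(quartile_relative_risk(score, is_case))
```

Setting the cut-points in controls, as the legend describes, keeps the quartile boundaries independent of how many cases are enriched at the top of the score distribution.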
