Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Aug;56(8):1624-1631.
doi: 10.1038/s41588-024-01851-2. Epub 2024 Jul 24.

The correlation between CpG methylation and gene expression is driven by sequence variants

Affiliations

The correlation between CpG methylation and gene expression is driven by sequence variants

Olafur Andri Stefansson et al. Nat Genet. 2024 Aug.

Abstract

Gene promoter and enhancer sequences are bound by transcription factors and are depleted of methylated CpG sites (cytosines preceding guanines in DNA). The absence of methylated CpGs in these sequences typically correlates with increased gene expression, indicating a regulatory role for methylation. We used nanopore sequencing to determine haplotype-specific methylation rates of 15.3 million CpG units in 7,179 whole-blood genomes. We identified 189,178 methylation depleted sequences where three or more proximal CpGs were unmethylated on at least one haplotype. A total of 77,789 methylation depleted sequences (~41%) associated with 80,503 cis-acting sequence variants, which we termed allele-specific methylation quantitative trait loci (ASM-QTLs). RNA sequencing of 896 samples from the same blood draws used to perform nanopore sequencing showed that the ASM-QTL, that is, DNA sequence variability, drives most of the correlation found between gene expression and CpG methylation. ASM-QTLs were enriched 40.2-fold (95% confidence interval 32.2, 49.9) among sequence variants associating with hematological traits, demonstrating that ASM-QTLs are important functional units in the noncoding genome.

PubMed Disclaimer

Conflict of interest statement

All authors are employees of deCODE genetics/Amgen.

Figures

Fig. 1
Fig. 1. 5-mCpG detected by nanopore sequencing.
a, 5-mCpG rates computed across individuals yielded the expected bimodal distribution. b, 5-mCpG rates averaged in 100 bp bins relative to the midposition of chromatin makers assayed in relevant cell types, that is, histone modifications (H3K4me3, H3K27ac, H3K36me3 and H3K9me3) in primary neutrophils, CTCF protein binding sites in primary neutrophils and open chromatin regions in CD4+ T cells obtained from Encode and Roadmap epigenomics project. Additionally, eRNA and main TSS reference maps for RNA samples isolated from whole blood based on cap analysis of gene expression sequencing assays obtained from the Fantom5 project (SlideBase). DHS, DNase hypersensitive sites; eRNA, enhancer RNA. c, We applied sequence-based phasing to assign 5-mCpG status to paternal (p) or maternal (m) haplotypes in each individual (I). For each CpG unit and each parental haplotype, we computed the 5-mCpG rate and defined unmethylated haplotypes where we found three or more neighboring CpG units each with 5-mCpG rate <0.15, but located no more than 500 bp apart, in at least one haplotype (restricted to 2,648 individuals sequenced to an average coverage of >20×). A rectangle is drawn around neighboring CpG units where such unmethylated haplotypes were detected. The cluster labeled α defines locations containing overlapping unmethylated haplotypes labeled a, b and c. The most frequently occurring unmethylated haplotype (fmax) is then nominated as the representative MDS of cluster α (MDSα) containing CpG units x3,x4,x5,x6. Source data
Fig. 2
Fig. 2. ASM-QTLs dominate in correlations found between MDSs and mRNA expression.
a, Variance in 5-mCpG rates of MDSs, one MDS from each 100 kb segment of the genome (n = 25,079 MDSs), binned by quintiles (y axis; ~5,000 MDSs per bin) and plotted against the proportion of these MDSs that have an associated ASM-QTL (x axis) according to whether the MDS is associated with mRNA expression (black) or not (red). The proportion estimates are shown as tick marks (vertical lines), and their 95% CIs are shown as horizontal lines. b, Fraction of the variance in mRNA expression explained by each of the four variables on the y axis as follows: y = ASM-QTL represents the genotype of the ASM-QTL found in association with 5-mCpG rates of MDSs; y = 5-mCpG represents the 5-mCpG rates of MDSs; y = 5-mCpG|ASM-QTL represents the 5-mCpG rates of MDSs after correction for the ASM-QTL found in association with that same MDS; y = ASM-QTL|5-mCpG represents the genotype of the ASM-QTL found in association with 5-mCpG rates of MDSs after having corrected the genotype status for the 5-mCpG rates of that same MDS. c, Fraction of the variance in 5-mCpG rates explained by each of the four variables on the y axis, where mRNA represents mRNA expression but otherwise analogous to b. In both b and c, the center line (solid black) shown in each box represents the median; box limits represent upper and lower quartiles; whiskers represent 1.5× interquartile range. r2mRNA,5-mCpG in b is equivalent to the reversed comparison of r25-mCpG,mRNA in c. d, Effects of ASM-QTL genotype (G) on CpG methylation (β^GM) and mRNA expression (β^GE), x axis, compared with the effects of CpG methylation on mRNA expression (β^ME), y axis. The number of datapoints in each boxplot in b and c, and in the scatter plot in d corresponds to the number of associations found between methylation and gene expression (n = 1,513). Source data
Fig. 3
Fig. 3. DNA sequence variability affects CpG methylation and gene expression.
Illustration of the two models (models 1 and 2; bottom left) consistent with our results. In this hypothetical example, an ASM-QTL gives rise to CpG methylation differences between the maternal (left) and paternal (right) chromosomes of an individual. Under model 1, the ASM-QTL influences TF binding to DNA, which in turn influences methylation of nearby CpGs, but it is the TF (not methylation) that then results in influences on gene expression. Under model 2, the ASM-QTL influences TF binding to DNA, which again leads to influences on methylation of nearby CpGs, but here the change in methylation results in influences on gene expression, for example, by enabling binding of a CpG methylation-sensitive TF. Hence, methylation is irrelevant to gene expression in model 1 whereas it is relevant to gene expression in model 2. In both models, it is DNA sequence variability that drives the correlation between CpG methylation and gene expression. ALT, alternative allele; REF, reference allele; RNAPII, RNA polymerase II.
Fig. 4
Fig. 4. ASM-QTLs are enriched among GWA signals.
Enrichment of ASM-QTLs among GWA signals varies in magnitude by trait category. The number of GWA signals for each trait category is shown in parentheses. Solid points, measure of center (enrichment point estimates); horizontal lines, 95% CIs; vertical red line, point of neutral enrichment (x = 1). Source data
Extended Data Fig. 1
Extended Data Fig. 1. Methylation depleted sequences.
Lines on the y-axis represent unmethylated haplotypes found in different individuals that overlap in position, with chromosomal position on the x axis. (a) For each such ‘cluster of overlapping unmethylated haplotypes’, as shown here, we look for the most frequently occurring unmethylated haplotype, shown in red. This ‘dominant’ form is then taken as the representative for the cluster of unmethylated haplotypes, which then enables comparison in 5-mCpG haplotype rates over the same coordinates (those of the representative) across all individuals in the cohort. (b) An example cluster where two (or more) representatives were found which we identify, and measure separately as distinct entities. chr, chromasome.
Extended Data Fig. 2
Extended Data Fig. 2. Validation of ASM-QTL influences in oxBS-seq data.
ASM-QTLs validated using oxBS in independent DNA samples derived from whole blood of forty-five individuals. Note, these forty-five individuals were not included in the larger cohort of 7,179 nanopore sequenced individuals. The effect size of each ASM-QTL on the 5-mCpG haplotype rates of MDSs in oxBS sequenced individuals, y-axis, are plotted against the effect size of the same ASM-QTLs on 5-mCpG haplotype rates of the corresponding MDS in nanopore sequenced samples, x-axis. We binned the validation tests according to the number of haplotypes (n) that were available for validation testing in oxBS individuals as indicated on the top of each figure. A least squares regression line, where the effect sizes in the oxBS validation cohort are used as the outcome and those in nanopore sequenced cohort as the predictor, is shown in each figure, blue colour, along with a diagonal line of identity, red colour, as we expect the regression coefficient to be equal to one (and an intercept of zero) if the effect sizes are identical in the oxBS and nanopore sequenced individuals. The regression coefficients and their 95% confidence intervals are shown at the top left-hand corner of each figure. Note that as the number of informative haplotypes (n) in the oxBS validation cohort increases, the regression coefficient approaches that of a diagonal line indicating that the effect sizes become more similar as the number of haplotypes available for testing (n) in the oxBS cohort increases. Source data
Extended Data Fig. 3
Extended Data Fig. 3. MDSs associated with gene expression.
Distance, x-axis, from MDS midpoints to the TSS locations of the associated mRNA transcript isoforms versus the effect size on expression, y-axis. MDS midpoints are represented as black dots, and the MDSs are represented as ranges that is, horizontal lines. (a) Distal associations that is, where the MDS does not contain the TSS of the associated mRNA. (b) Promoter associations that is, where the MDS contains the TSS of the associated mRNA, where the MDS is represented by a line (left to right) indicating the position relative to the TSS of the associated mRNA. (a) and (b) TSS is represented as the origin of the x-axis (x = 0), shown as a vertical dashed line, red. Note, we modify the sign of the distance such that negative distances reflect upstream positions of the MDS midpoint relative to the TSS of the associated mRNA. Hence, the upstream positions indicate that the MDS midpoint does not intersect with the direction of mRNA transcription. In contrast, downstream positions indicate that the MDS midpoint does intersect with the direction of mRNA transcription. Source data
Extended Data Fig. 4
Extended Data Fig. 4. Functional attributes of ASM-QTLs.
ASM-QTLs were more likely than expected by chance, to be found in high LD (r2 > 0.80) to A) sequence variants that influence the DNA binding affinity to transcription factors SPI1, CTCF and EBF1, and cohesin protein STAG1 and B) sequence variants located within regulatory sequences defined by ENCODE, that is, candidate cis-regulatory elements (version 3). Note, sequence variants that influence ASBs were identified in Chen et al.. The P-values shown, unadjusted for multiple comparison, were computed using our permutation-based method described under methods section „ASM-QTL in functional annotation maps‘.
Extended Data Fig. 5
Extended Data Fig. 5. ASM-QTLs dominate in correlations between MDSs and mRNA.
(a) Example of an association found between expression of VAMP5-201 (ENST00000306384), y-axis, and 5-mCpG haplotype rates, x-axis, of an MDS (chr2:85584168-85584976) located within the CpG island promoter of the TSS; that is, an example of CpG island promoter methylation. (b) The fraction of variance in VAMP5-201 mRNA isoform expression explained by (top) ASM-QTL found in association with the MDS (middle) CpG methylation of the MDS and (bottom) CpG methylation of the MDS after correction for the ASM-QTL. This same association between MDS at chr2:85584168-85584976 and mRNA expression of VAMP5-201 is then shown separately on (c) the reference- and (d) the alternative alleles of ASM-QTL chr2:85580659:IG.0:0, respectively. Least squares regression line is shown within each of three scatter plots, in blue colour, and the estimated regression coefficient and their 95% confidence intervals are shown at the bottom, along with the corresponding two-sided P-values. Source data
Extended Data Fig. 6
Extended Data Fig. 6. Modeling the impact of ASM-QTL on methylation and expression.
Model diagrams (leftmost column) describing the four different hypotheses on the mechanism by which an ASM-QTL genotype (G) influences CpG methylation (M) and gene expression (E). The outcomes of two different tests (MR-Steiger and our method of comparing βˆGMβˆGE and σˆM2βˆME) are shown; ‘v’ shape indicates that the model is supported by the test, whereas ‘x’ indicates that the model is not supported by the test and empty cells indicates that the test is unable to consider whether or not the model is supported. For clarity, each model is labelled with a number which are then used as reference to each diagram in the main text and in Fig. 3.
Extended Data Fig. 7
Extended Data Fig. 7. Varied contribution of sequence variants to human trait variability.
Enrichment, x-axis, of ASM-QTLs and other sequence variant annotations, y-axis, among GWA signals found in various human diseases and other traits. The solid points (black) represent the measure of center, that is, the enrichment point estimates and the horizontal lines (black) represent their 95%CIs. The number of sequence variants in each annotation is shown within parentheses on the y-axis. Shown are the enrichment point estimates, points, and their 95% confidence intervals, horizontal lines, for each sequence variant annotation, and the point of neutral enrichment on the x-axis as vertical line, blue. UTR=Untranslated Region, DHS=DNase hypersensitivity site, MDSs=Methylation depleted sequences. Source data

References

    1. Luo, C., Hajkova, P. & Ecker, J. R. Dynamic DNA methylation: In the right place at the right time. Science361, 1336–1340 (2018). 10.1126/science.aat6806 - DOI - PMC - PubMed
    1. Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell99, 247–257 (1999). 10.1016/S0092-8674(00)81656-6 - DOI - PubMed
    1. Li, E., Bestor, T. H. & Jaenisch, R. Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell69, 915–926 (1992). 10.1016/0092-8674(92)90611-F - DOI - PubMed
    1. Clark, S. J. et al. Single-cell multi-omics profiling links dynamic DNA methylation to cell fate decisions during mouse early organogenesis. Genome Biol.23, 202 (2022). 10.1186/s13059-022-02762-3 - DOI - PMC - PubMed
    1. Dynan, W. S. & Tjian, R. The promoter-specific transcription factor Sp1 binds to upstream sequences in the SV40 early promoter. Cell35, 79–87 (1983). 10.1016/0092-8674(83)90210-6 - DOI - PubMed

LinkOut - more resources