Nucleic Acids Res. 2019 Jul 26;47(13):6753-6768.
doi: 10.1093/nar/gkz483.

Identification of DNA motifs that regulate DNA methylation

Mengchi Wang et al. Nucleic Acids Res. 2019.

Abstract

DNA methylation is an important epigenetic mark, but how its locus specificity is determined in relation to DNA sequence is not fully understood. Here, we analyzed 34 diverse whole-genome bisulfite sequencing datasets in human and identified 313 motifs, including 92 associated with methylation (methylation motifs, MMs) and 221 associated with unmethylation (unmethylation motifs, UMs). The functionality of these motifs is supported by multiple lines of evidence. First, methylation levels at MM and UM sites are respectively higher and lower than the genomic background. Second, these motifs are enriched at the binding sites of methylation-modifying enzymes, including DNMT3A and TET1, indicating a possible role in recruiting these enzymes. Third, these motifs significantly overlap with "somatic QTLs" (quantitative trait loci) of methylation and expression. Fourth, disruption of these motifs by mutation is associated with significantly altered methylation levels of CpGs in neighboring regions. Furthermore, these motifs together with somatic mutations are predictive of cancer subtypes and patient survival. We found that some of these motifs are also associated with histone modifications, suggesting a possible interplay between the two types of epigenetic modifications. We also found that some motifs form feed-forward loops that contribute to DNA methylation dynamics.

Figures

Figure 1.
Defining methylated regions and searching for methylation-associated motifs. (A) The strategy for identifying DNA methylation-associated motifs. (B) WGBS CpG sites are merged within 400 bp regions. Based on the average CpG beta values of each region, we defined commonly methylated (CMR), commonly unmethylated (CUR) and variably methylated regions (VMR). (C) Identification of DNA methylation-associated motifs in 34 cells and tissues. Example motifs are shown on the right (if matched to a known motif, the known motif logo is shown on the top).
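The region definition in panel (B) can be illustrated with a minimal sketch. It assumes fixed, non-overlapping 400 bp windows and uses hypothetical beta-value cutoffs (`HIGH`, `LOW`); the caption does not state the actual cutoffs or the exact merging rule, and the function names are illustrative only.

```python
from collections import defaultdict
from statistics import mean

REGION_SIZE = 400       # bp, as stated in the caption
HIGH, LOW = 0.8, 0.2    # hypothetical cutoffs, NOT taken from the paper


def region_means(cpgs):
    """Average beta per 400 bp window for one sample.

    cpgs: iterable of (chrom, pos, beta) tuples.
    Returns {(chrom, window_start): mean beta}.
    """
    bins = defaultdict(list)
    for chrom, pos, beta in cpgs:
        start = (pos // REGION_SIZE) * REGION_SIZE
        bins[(chrom, start)].append(beta)
    return {region: mean(betas) for region, betas in bins.items()}


def classify_regions(samples):
    """Label regions covered in every sample as CMR, CUR or VMR.

    samples: {sample_name: list of (chrom, pos, beta)}.
    """
    per_sample = [region_means(cpgs) for cpgs in samples.values()]
    shared = set(per_sample[0]).intersection(*per_sample[1:])
    labels = {}
    for region in shared:
        values = [means[region] for means in per_sample]
        if min(values) >= HIGH:
            labels[region] = "CMR"   # methylated in all samples
        elif max(values) <= LOW:
            labels[region] = "CUR"   # unmethylated in all samples
        else:
            labels[region] = "VMR"   # differs between samples
    return labels


samples = {
    "a": [("chr1", 100, 0.9), ("chr1", 150, 0.95), ("chr1", 450, 0.1), ("chr1", 900, 0.9)],
    "b": [("chr1", 120, 0.85), ("chr1", 500, 0.05), ("chr1", 850, 0.1)],
}
labels = classify_regions(samples)
```

Here the window starting at 0 is high in both samples, the window at 400 is low in both, and the window at 800 differs between them, so the three windows receive the three different labels.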
Figure 2.
Identified motifs mark methylation level. (A) Example motifs are shown with the average CpG methylation level calculated in 50 bp bins around all motif sites, determined by FIMO at a P-value cutoff of 10^-5. The examples are chosen from MM, UM, de novo motifs, matched known TFs, common region and sorted variable regions. Upper panel, from left to right: UM_180.0_3.14 (matched to CTCF); UM_106.1_4.08 (de novo); UM_238.2_3.88 (matched to WT1); lower panel, from left to right: MM_65.9_2.90 (matched to TOPORS); MM_814.4_2.02 (matched to PAX5); MM_206.3_2.16 (de novo). (B) DNA methylation levels in the ROADMAP (left) and TCGA (right) data sets over the gene body. Each gene body was split into ten equal bins and the beta values of all CpGs in the same bin were averaged over all genes. The lower panel shows the correlation between motif occurrences and CpG methylation in ROADMAP (WGBS data from H1, mesoderm, and liver) and TCGA (450K methylation of CpGs averaged in patients from PAAD, LUAD, and BRCA) around TP53 (chr17:7 540 000–7 650 000). (C) Normalized motif occurrence of UM, MM and known TFs (excluding matched) from HOCOMOCO (17) in 5000 bp windows centered on ChIP-seq peaks of TET1, DNMT3A and DNMT3B collected from various studies (22,24,25). The lower panel shows the clustered heatmap of normalized z-scores. (D) Center-to-edge enrichment of UMs and MMs in comparison with the TFs NR6A1 and CTCF, which were reported to recruit DNMT and TET to specific loci, at the ChIP-seq peaks of DNMTs and TETs.
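The 50 bp binning in panel (A) can be sketched as follows. This is a minimal illustration for a single chromosome, assuming motif site centers are already available (for example from a FIMO scan); the `flank` width and the function name are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

BIN = 50  # bp, as stated in the caption


def binned_profile(motif_centers, cpgs, flank=500):
    """Average CpG beta value in 50 bp bins relative to motif centers.

    motif_centers: list of motif center positions on one chromosome.
    cpgs: list of (pos, beta) on the same chromosome.
    flank: half-width of the window around each site (hypothetical value).
    Returns {bin_start_offset: mean beta}, pooled over all motif sites.
    """
    pooled = defaultdict(list)
    for center in motif_centers:
        for pos, beta in cpgs:
            offset = pos - center
            if -flank <= offset < flank:
                # Floor division keeps negative offsets in the correct bin,
                # e.g. offset -10 falls in the bin starting at -50.
                pooled[(offset // BIN) * BIN].append(beta)
    return {start: mean(betas) for start, betas in sorted(pooled.items())}


profile = binned_profile(
    motif_centers=[1000, 2000],
    cpgs=[(1010, 0.25), (2020, 0.75), (990, 0.5), (1600, 0.5)],
)
```

The nested loop is quadratic and only suitable for small inputs; a genome-scale version would index CpGs by position first.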
Figure 3.
Somatic mutations at motif sites co-occur with local methylation alteration. (A) Distribution of somatic quantitative trait loci corresponding to methylation (mQTL) and gene expression (eQTL) over the gene body (see details in Materials and Methods). Each gene body is split into ten equal bins. (B) Methylation level change of CpG sites near TET1-UM sites (TET1 binding peaks containing UM motifs) overlapping with somatic mutations. Asterisks indicate P < 0.01 calculated with a paired one-tailed t-test, pairing the foreground observed methylation change with the corresponding background expected methylation change. Foreground (FG), somatic mQTL at TET1-UM sites. Background (BG), somatic mQTL at TET1 binding peaks (22,24,25). To ensure statistical significance, we only considered the 15 cancers with >100 CpGs within 5000 bp of TET1-UM sites (see details in Methods). (C) An example showing that disruption of a UM motif (no match with known motifs) by a C→T somatic mutation at chr16:68002415 significantly increases the methylation level of the four nearby CpGs in LUAD patients.
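The paired test in panel (B) can be illustrated with a small sketch that computes the paired t statistic and its degrees of freedom from matched foreground and background values. The input numbers are made up for illustration, and the function name is hypothetical; the one-tailed P-value would then be the upper-tail probability of Student's t distribution with n - 1 degrees of freedom, which is left to a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(foreground, background):
    """Paired t statistic for foreground minus background.

    Returns (t, degrees_of_freedom). A large positive t supports the
    one-sided alternative that foreground changes exceed background.
    """
    if len(foreground) != len(background):
        raise ValueError("paired samples must have the same length")
    diffs = [f - b for f, b in zip(foreground, background)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Illustrative values only, not data from the paper.
t_stat, dof = paired_t([0.5, 0.75, 1.0, 0.75], [0.25, 0.25, 0.5, 0.5])
```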
Figure 4.
Combining motif and mutation improves the prediction of cancer diagnosis and patient survival. (A) auROC and auPRC for cancer type prediction. The classification model of each cancer was built with gradient boosting. Performance was evaluated with auROC (area under the receiver operating characteristic curve, good for an overall evaluation) and auPRC (area under the precision-recall curve, good for an unbalanced dataset where the positive label is scarce). Labels: mutation, using somatic mutations as features; mutation+motif, using both somatic mutations and collective disruption of motif sites as features (see Materials and Methods for details). *Adjusted P < 0.05. (B) Results of top predictive features (score > 0.01) using gradient boosting out-of-bag estimation. Twenty-six cancers with auPRC > 0.3 are shown. (C) Survival analysis with gradient boosting with mutation and mutation+motif as models. Left: multivariate survival analyses for all solid TCGA cancers. Forest plots show the log2 hazard ratio (95% confidence interval) of the predicted high-risk group by both models. *Adjusted P < 0.05 (blue for the mutation model and red for the mutation+motif model). Right: Kaplan–Meier survival estimation (95% confidence interval) in the high-risk group versus the low-risk group predicted by both models. (D) Multivariate survival analysis showing factors correlating with patient survival (P < 0.05) with the log2 hazard ratio (95% confidence interval).
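The two metrics in panel (A) can be illustrated on a toy, imbalanced example. This sketch computes auROC as the probability that a positive sample outscores a negative one, and approximates auPRC by average precision, which is one common estimator; the caption does not say which estimator the authors used. The function names, labels and scores are made up, and distinct scores are assumed for the average-precision ranking.

```python
def au_roc(labels, scores):
    """auROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision at each positive's rank.

    Requires at least one positive label; assumes no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / hits


# Two positives among eight samples: a scarce positive label.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
roc = au_roc(labels, scores)            # 11 of 12 pairs ordered correctly
prc = average_precision(labels, scores)  # (1/1 + 2/3) / 2
```

A single negative ranked above one positive costs little in auROC (11/12) but more in average precision (5/6), which is why the precision-recall view is the more sensitive one when positives are scarce.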
Figure 5.
Methylation motifs interplay with TET1, DNMT3A, and histone modification. (A) Methylation motifs matched to histone motifs (89). Motifs are aligned with Tomtom with e < 0.05. The lower panel shows several examples. (B) Feedforward loop targeting TET1 and DNMT3A. (C) Feedforward loop via TET1 and DNMT3A.

References

    1. Lienert F., Wirbelauer C., Som I., Dean A., Mohn F., Schübeler D. Identification of genetic elements that autonomously determine DNA methylation states. Nat. Genet. 2011; 43:1091–1097.
    2. Stadler M.B., Murr R., Burger L., Ivanek R., Lienert F., Schöler A., van Nimwegen E., Wirbelauer C., Oakeley E.J., Gaidatzis D. et al. DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature. 2011; 480:490–495.
    3. Whitaker J.W., Chen Z., Wang W. Predicting the human epigenome from DNA motifs. Nat. Methods. 2015; 12:265–272.
    4. Wu C., Yao S., Li X., Chen C., Hu X. Genomewide prediction of DNA methylation using DNA composition and sequence complexity in human. Int. J. Mol. Sci. 2017; 18:E420.
    5. Das R., Dimitrova N., Xuan Z., Rollins R.A., Haghighi F., Edwards J.R., Ju J., Bestor T.H., Zhang M.Q. Computational prediction of methylation status in human genomic sequences. Proc. Natl. Acad. Sci. U.S.A. 2006; 103:10713–10716.
