Elife. 2013 Sep 3;2:e00726.
doi: 10.7554/eLife.00726.

DNA methylation presents distinct binding sites for human transcription factors

Shaohui Hu et al. Elife. 2013.

Abstract

DNA methylation, especially CpG methylation at promoter regions, has generally been considered a potent epigenetic modification that prohibits transcription factor (TF) recruitment, resulting in transcription suppression. Here, we used a protein microarray-based approach to systematically survey the entire human TF family and found numerous purified TFs with methylated CpG (mCpG)-dependent DNA-binding activities. Interestingly, some TFs exhibit specific binding activity to methylated and unmethylated DNA motifs of distinct sequences. To elucidate the underlying mechanism, we focused on Kruppel-like factor 4 (KLF4), and decoupled its mCpG- and CpG-binding activities via site-directed mutagenesis. Furthermore, KLF4 binds specific methylated or unmethylated motifs in human embryonic stem cells in vivo. Our study suggests that mCpG-dependent TF binding activity is a widespread phenomenon and provides a new framework to understand the role and mechanism of TFs in epigenetic regulation of gene transcription. DOI: http://dx.doi.org/10.7554/eLife.00726.001.

Keywords: DNA methylation; Human; epigenetics; protein microarray; protein-DNA interactions; transcription factors; transcription regulation.

Conflict of interest statement

The authors declare that no competing interests exist.

Figures

Figure 1. Protein microarray-based approach identified mCpG-dependent DNA-binding activity among human TFs and cofactors.
(A) A competition assay was used to identify proteins that preferentially bind to methylated DNA motifs. SCAPER (S-phase cyclin A-associated protein in the ER) and E2F3 (E2F transcription factor 3) are shown here as two examples of methylated DNA-binding proteins. (B) A proof-of-principle assay was conducted using known methylated DNA-binding proteins on a pilot protein microarray. (C) Binding profiles of 41 TFs and 6 co-factors against 150 of the 154 tested methylated DNA motifs are summarized in the interaction map. TFs are color-coded based on the subfamilies. (D) EMSA assays validated DNA-binding activity for four selected TF candidates. Representative images from three independent experiments with similar results are shown. (E) Competition EMSA assays confirmed mCpG-dependent DNA-binding activities. As expected, a 10-fold excess of unlabeled, methylated DNA motif readily abolished the protein–DNA complex formation of the tested TFs with the biotinylated and methylated DNA motifs (Lane 1 in each image). However, a 10-fold excess of cold (unlabeled), unmethylated DNA counterparts could not compete off methylated DNA binding, consistent with the protein microarray results. (F) HOXA5 and DIDO1 showed mCpG-dependent activation of luciferase activity in GT1-7 cells. Values represent mean ± SD (n = 3; **: p<0.01; t-test). DOI: http://dx.doi.org/10.7554/eLife.00726.003
Figure 1—figure supplement 1. Data analysis of the protein microarray assays.
(A) Workflow of data normalization. (B) Local normalization (window size 9 × 9). (C) Extrapolation of background noise distribution. Noise distribution of N2 is mirrored from distribution of N1. Standard deviation (SD) was calculated based on distribution N (Lower panel). (D) Distribution of Z scores of all proteins on a microarray. Z = 3 was selected as the cutoff in our study to identify the positives. The Z scores of some methylated DNA-binding proteins identified previously (Spruijt et al., 2013) are below our cutoff, while KLF4's Z score for one DNA motif is >6. DOI: http://dx.doi.org/10.7554/eLife.00726.004
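The noise-mirroring and Z-score steps in panels (C) and (D) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the use of the median as the center of the noise distribution, the function name `mirrored_noise_z_scores`, and the toy signal values are all assumptions made for illustration. Only the mirroring idea and the Z = 3 cutoff come from the caption.

```python
import statistics


def mirrored_noise_z_scores(signals, cutoff=3.0):
    """Return (z_scores, hit_indices) for a list of normalized spot signals.

    Assumption: the median approximates the center of the background noise,
    so values at or below it stand in for the lower noise half (N1).
    """
    center = statistics.median(signals)
    lower_half = [s for s in signals if s <= center]       # N1
    mirrored_half = [2 * center - s for s in lower_half]   # N2, mirror of N1
    noise = lower_half + mirrored_half                     # N = N1 + N2

    # Standard deviation of the symmetric noise distribution about its center.
    sd = statistics.pstdev(noise, mu=center)
    if sd == 0:
        raise ValueError("noise distribution has zero spread")

    z_scores = [(s - center) / sd for s in signals]
    hit_indices = [i for i, z in enumerate(z_scores) if z >= cutoff]
    return z_scores, hit_indices


# Toy data (hypothetical): seven background-like spots and one strong binder.
toy_signals = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 5.0]
toy_z, toy_hits = mirrored_noise_z_scores(toy_signals)
print(toy_hits)
```

Mirroring the lower half keeps true binders in the upper tail from inflating the SD estimate, which is presumably why the noise distribution is extrapolated rather than computed from all spots.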
Figure 1—figure supplement 2. Reproducibility of protein microarray data.
Left panel: signal comparison between duplicate binding assays with motif M303 shows a high correlation, confirming the reproducibility of the assay. Right panel: comparison between two unrelated binding assays, with motifs M303 and M259, shows no correlation. DOI: http://dx.doi.org/10.7554/eLife.00726.005
Figure 1—figure supplement 3. Distribution of number of mCpG-binding TFs/co-factors in a given motif-binding assay.
The median value of TFs/cofactors binding to one methylated CpG-containing motif is 8. DOI: http://dx.doi.org/10.7554/eLife.00726.006
Figure 1—figure supplement 4. Distribution of number of methylated motifs recognized by a given TF/co-factor.
Most TFs/cofactors bind to very few methylated DNA motifs, whereas 7 TFs bind to more than 77 of the 154 motifs tested in this study. DOI: http://dx.doi.org/10.7554/eLife.00726.007
Figure 1—figure supplement 5. Distribution of TF subfamily members.
(A) Distribution of TF subfamily members that showed mCpG-binding activity. (B) Distribution of all annotated TF subfamily members presented on the TF protein microarrays. Statistical analysis showed that none of the TF subfamilies was significantly enriched (p<0.01) for methylated motif-binding activities. DOI: http://dx.doi.org/10.7554/eLife.00726.008
Figure 1—figure supplement 6. Four additional EMSA assays (A) and competition EMSA assays (B).
The results confirmed specificity of mCpG-dependent DNA-binding activities. DOI: http://dx.doi.org/10.7554/eLife.00726.009
Figure 1—figure supplement 7. Methylation level of the KLF4 and HOXA5 luciferase reporter constructs.
Eight units of KLF4 (TCCCGCCCA) and HOXA5 (AAACGCTGCC) binding motifs were separately cloned into the promoter region of a CpG-free luciferase reporter vector, and methylated with SssI before being transfected into GT1-7 cells. Bisulfite sequencing confirmed that the CpG methylation levels of both motifs reached ∼100% after SssI treatment. DOI: http://dx.doi.org/10.7554/eLife.00726.010
Figure 1—figure supplement 8. Number of unique mCpG-binding TFs/co-factors as a function of the number of tested methylated DNA motifs.
The curve is far from saturation, suggesting that more such TFs/co-factors remain to be discovered. DOI: http://dx.doi.org/10.7554/eLife.00726.011
Figure 2. A group of 17 TFs can bind to both methylated and unmethylated motifs of distinct sequences.
(A) Our previous PDI dataset was compiled with the dataset in this study to generate binding preference of the 17 TFs. Methylated consensus motifs of the 17 TFs identified based on the protein microarray results are compared with their known consensus motifs. (B) EMSA assays confirmed that four TFs could specifically interact with both methylated and unmethylated motifs of distinct sequences. Representative images from three independent experiments with similar results are shown. (C) and (D) Two possible scenarios are proposed to distinguish the mode of interactions between these TFs and their corresponding motifs. (E) and (F) Competition EMSA assays showed that both scenarios are possible. Representative images from two independent experiments with similar results are shown in each panel. (G) OIRD sensorgrams for ZMYM3 and KLF4 binding to methylated motifs M203 and M197, and their unmethylated counterparts, respectively. The OIRD measurements were performed at two concentrations of each protein. Solid lines represent the OIRD signals. Dashed lines are fitted On- and Off-curves. Red arrows indicate the starting point when a TF protein was introduced to the OIRD reaction chamber. Blue arrows indicate the time points when wash buffer was added. (H) Summary of average KD values measured at two concentrations of each protein. ‘NB’ indicates no observed binding signals. DOI: http://dx.doi.org/10.7554/eLife.00726.012
Figure 2—figure supplement 1. Competition EMSA assays for ARID3B and ZMYM3.
As expected, unlabeled and methylated motif M319 showed dose-dependent competition against the labeled, methylated motif M319; whereas unlabeled and unmethylated motif M47 could readily compete off the binding signals. The same results were observed for ZMYM3. DOI: http://dx.doi.org/10.7554/eLife.00726.013
Figure 2—figure supplement 2. Competition EMSA assays for KLF4 and TFAP2A.
Complex formation between KLF4 and methylated mM197 and between KLF4 and unmethylated umM412 is not affected by either umM412 or mM197, respectively. However, when both methylated and unmethylated competitor DNA were added, the complex formation was abolished. DOI: http://dx.doi.org/10.7554/eLife.00726.014
Figure 2—figure supplement 3. Summary of KLF4’s dual-specificity.
Competition EMSA assays confirm KLF4's binding specificity to methylated motif M197 (mM197) and unmethylated motif M412 (umM412). DOI: http://dx.doi.org/10.7554/eLife.00726.015
Figure 2—figure supplement 4. OIRD sensorgrams for three TFs and MBD2b binding to three methylated DNA motifs.
(A) MBD2b with a reported KD value of 330 nM was used as a benchmark in the OIRD system, showing the sensorgrams of MBD2b binding to methylated M203, M213 and M197. (B)–(D) OIRD sensorgrams for ZMYM3, TFAP2A and KLF4 binding to methylated motifs M203, M213 and M197, and their unmethylated counterparts, respectively. The OIRD measurements were performed at two concentrations of each protein. Solid lines represent OIRD signals. Dashed lines are fitted On- and Off-curves. Red arrows indicate the starting point when a TF protein was introduced to the OIRD reaction chamber. Blue arrows indicate the time points when wash buffer was added. DOI: http://dx.doi.org/10.7554/eLife.00726.016
Figure 3. KLF4’s mCpG-dependent binding activity is decoupled from its binding activity to unmethylated motifs.
(A) Simulation of KLF4–DNA interactions predicted that two residues, Arg458 and Asp460, are involved in the interactions with methylated cytosine. Double arrow indicates van der Waals interactions between Arg458 and methyl group on the cytosine in one strand (5mCA). Red balls represent water molecules. (B) Asp460 further stabilizes binding to 5mC on the other strand (5mCB) via a CH•••O (H2O—5mC) H-bond contact. (C) EMSA assays using KLF4 mutated proteins demonstrated that both R458 and D460 are crucial for mCpG-dependent binding activity. Representative images from three independent experiments with similar results are shown. (D) In cell-based luciferase assays for M197, WT KLF4 showed mCpG-dependent activation of downstream gene expression (red bars in the upper panel), while both R458A and D460A mutations abolished this activity (red bars in the middle and lower panels). (E) In cell-based luciferase assays with M412, both WT and mutants can activate the expression of unmethylated M412 (blue bars), but have no effect on methylated M412 (red bars). In (D) and (E), values represent mean ± SD (n = 3; **: p<0.01; t-test). DOI: http://dx.doi.org/10.7554/eLife.00726.017
Figure 3—figure supplement 1. Architecture of KLF4 DNA-binding domain.
KLF4 encodes two and a half zinc finger DNA-binding domains at its C-terminus. Residues R458 and D460, which were predicted to interact with the 5-methyl group in the cytosine, are located in the zf-H2C2 domain. D432 indicates where a truncated KLF4 construct ends. DOI: http://dx.doi.org/10.7554/eLife.00726.018
Figure 3—figure supplement 2. Known crystal structures of MeCP2 and ZFP57 in complex with methylated DNA.
The pink and blue double arrows represent van der Waals force between the arginine and methyl groups. Red balls are water molecules. DOI: http://dx.doi.org/10.7554/eLife.00726.019
Figure 3—figure supplement 3. EMSA assays to evaluate impacts of KLF4 R458K, R458A::D460A mutations, and Δ432 truncation on its binding activity to motifs M412 and M197.
These results demonstrated that the single and double mutations, as well as the truncation, abolished KLF4's ability to form a complex with methylated motif M197, while none of them showed a detectable impact on complex formation with unmethylated motif M412. DOI: http://dx.doi.org/10.7554/eLife.00726.020
Figure 3—figure supplement 4. Western blot analysis of overexpression of KLF4-WT, KLF4-R458A and KLF4-D460A proteins in GT1-7 cells.
Using GAPDH as a control, these results demonstrated equal transfection efficiency of the constructs. DOI: http://dx.doi.org/10.7554/eLife.00726.021
Figure 4. Endogenous KLF4 binds to methylated loci in human embryonic stem cells (H1) in vivo.
(A) Bioinformatics analysis to derive the methylated DNA motif logo bound by KLF4 by integrating KLF4 ChIP-Seq and methylome data in H1 cells. Based on the distribution of methylation level at the KLF4 binding sites, a top methylated consensus motif boxed in red was discovered in the highly methylated sites. As a comparison, the M197 sequence recognized by KLF4 in the protein microarray assays is shown below. (B) Experimental procedure of KLF4 ChIP-bisulfite sequencing to confirm that KLF4 preferentially interacts with hyper-methylated motifs in H1 cells. (C) The gel images of KLF4 ChIP’ed loci (L1: chr1: 559311-559516; L2: chr5: 44424678-44424792; L3: chr16: 4681299-4681481; L4: chr2: 132747088-132747377; L5: chr12: 81828301-81828506) demonstrate specific and direct binding of KLF4 to its target regions. Negative controls were performed in the absence of the anti-KLF4 monoclonal antibodies. (D) Analysis of KLF4-ChIP against the five loci using the quantitative real-time PCR (qPCR) method. Fold change at each locus was obtained by taking the ratio of KLF4-ChIP qPCR signals over the negative control signals. Statistical analysis was based on three technical replicates. (E) Sanger bisulfite sequencing reads of input and KLF4-ChIP’ed DNA. Filled and blank circles indicate methylated and unmethylated CpG sites, respectively. Blue and red arrows indicate CpGs in the context of motifs M412 and M197, respectively. (F) For input DNA with relatively low methylation, KLF4 methylated binding sites tend to have a higher methylation level after KLF4 ChIP. The lower panel in (F) shows the methylation differences at each CpG site between the input and KLF4 ChIP’ed DNA. p values were determined by the binomial probability density function. DOI: http://dx.doi.org/10.7554/eLife.00726.022
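The binomial probability mentioned for panel (F) can be sketched in Python as follows. The caption does not say how the function was parameterized, so this sketch assumes that the input methylation fraction at a CpG site serves as the per-read probability and that the ChIP'ed reads supply the counts. The function name `binomial_pmf` and the example numbers are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k methylated reads out of n, given per-read probability p."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: the input DNA is 50% methylated at one CpG site,
# and 9 of 10 bisulfite reads from the ChIP'ed DNA are methylated there.
input_fraction = 0.5
p_value = binomial_pmf(9, 10, input_fraction)
print(p_value)  # 10 / 1024 = 0.009765625
```

A small value indicates that the methylation level observed after ChIP would be unlikely if the ChIP'ed DNA were methylated at the same rate as the input.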
Figure 4—figure supplement 1. Integration of KLF4 ChIP-seq and methylome data in H1 cells.
KLF4 ChIP-Seq and methylome data in H1 cells were compiled to assign the methylation levels in KLF4 ChIP'ed segments (upper panel). The lower panel is a schematic plot of KLF4 binding summits. The pink ovals indicate KLF4 binding summits as determined in the KLF4 ChIP-seq experiments. The short vertical lines in red and blue indicate the CpG sites in the contexts of motifs M197 and M412, respectively. Other CpG sites are annotated with gray lines. The thin green lines underneath represent the observed methylation level. DOI: http://dx.doi.org/10.7554/eLife.00726.023
Figure 4—figure supplement 2. Five selected KLF4-binding loci for further analyses.
The chromosome positions and KLF4 ChIP-seq peaks (GSM447584) are shown. DOI: http://dx.doi.org/10.7554/eLife.00726.024
Figure 4—figure supplement 3. An example of KLF4 ChIP-bisulfite sequencing assay.
The sequencing results confirmed that KLF4 bound to hyper-methylated loci in the sequence context of CCmCGCC (arrows) in H1 cells. Upper and lower panels represent bisulfite sequencing results of the input and KLF4 ChIP'ed loci, respectively. DOI: http://dx.doi.org/10.7554/eLife.00726.025

References

    1. Bartels SJ, Spruijt CG, Brinkman AB, Jansen PW, Vermeulen M, Stunnenberg HG. 2011. A SILAC-based screen for Methyl-CpG binding proteins identifies RBP-J as a DNA methylation and sequence-specific binding protein. PLOS ONE 6:e25884. doi: 10.1371/journal.pone.0025884
    2. Bartke T, Vermeulen M, Xhemalce B, Robson SC, Mann M, Kouzarides T. 2010. Nucleosome-interacting proteins regulated by DNA and histone methylation. Cell 143:470–84. doi: 10.1016/j.cell.2010.10.012
    3. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, et al. 2004. The Pfam protein families database. Nucleic Acids Res 32:D138–41. doi: 10.1093/nar/gkh121
    4. Baylin SB. 2005. DNA methylation and gene silencing in cancer. Nat Clin Pract Oncol 2(suppl 1):S4–11. doi: 10.1038/ncponc0354
    5. Boyes J, Bird A. 1991. DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. Cell 64:1123–34. doi: 10.1016/0092-8674(91)90267-3