Nucleic Acids Res. 2019 Sep 19;47(16):8375-8387.
doi: 10.1093/nar/gkz381.

Structural basis for preferential binding of human TCF4 to DNA containing 5-carboxylcytosine

Jie Yang et al. Nucleic Acids Res. 2019.

Abstract

The psychiatric risk-associated transcription factor 4 (TCF4) is linked to schizophrenia. Rare TCF4 coding variants are found in individuals with Pitt-Hopkins syndrome, an intellectual disability and autism spectrum disorder. TCF4 contains a C-terminal basic-helix-loop-helix (bHLH) DNA binding domain which recognizes the enhancer-box (E-box) element 5'-CANNTG-3' (where N = any nucleotide). A subset of the TCF4-occupancy sites have the expanded consensus binding specificity 5'-C(A/G)-CANNTG-3', with an added outer Cp(A/G) dinucleotide; for example, in the promoter for CNIH3, a gene involved in opioid dependence. In mammalian genomes, particularly in brain, the CpG and CpA dinucleotides can be methylated at the 5-position of cytosine (5mC), and then may undergo successive oxidations to the 5-hydroxymethyl (5hmC), 5-formyl (5fC), and 5-carboxyl (5caC) forms. We find that, in the context of 5'-0CG-1CA-2CG-3TG-3' (where the numbers indicate successive dinucleotides), modification of the central E-box 2CG has very little effect on TCF4 binding, E-box 1CA modification has a negative influence on binding, while modification of the flanking 0CG, particularly carboxylation, has a strong positive impact on TCF4 binding to DNA. Crystallization of TCF4 in complex with unmodified or 5caC-modified oligonucleotides revealed that the basic region of the bHLH domain adopts multiple conformations, including an extended loop going through the DNA minor groove, or the N-terminal portion of a long helix binding in the DNA major groove. The different protein conformations enable arginine 576 (R576) to interact, respectively, with a thymine in the minor groove, a phosphate group of the DNA backbone, or 5caC in the major groove. The Pitt-Hopkins syndrome mutations affect five arginine residues in the basic region, two of them (R569 and R576) involved in 5caC recognition. Our analyses indicate, and suggest a structural basis for, the preferential recognition of 5caC by a transcription factor centrally important in brain development.
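
The two consensus sequences quoted in the abstract can be illustrated with a minimal motif scan. This is only a sketch and not a method from the paper: the function name `find_motifs` and the test sequence are hypothetical, the sequence is built around the 5'-CG-CA-CG-TG-3' context mentioned above, and only the given strand is scanned.

```python
import re

# Regular expressions for the motifs quoted in the abstract (N = any nucleotide).
EBOX = re.compile(r"CA[ACGT]{2}TG")          # 5'-CANNTG-3'
EXPANDED = re.compile(r"C[AG]CA[ACGT]{2}TG")  # 5'-C(A/G)-CANNTG-3'


def find_motifs(seq, pattern):
    """Return (0-based start, matched text) for every match, including overlaps.

    Hypothetical helper for illustration; it scans the given strand only.
    """
    seq = seq.upper()
    # A lookahead with a capture group reports overlapping occurrences too.
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(seq)]


if __name__ == "__main__":
    example = "TTCGCACGTGAA"  # hypothetical sequence containing CG-CA-CG-TG
    print(find_motifs(example, EBOX))
    print(find_motifs(example, EXPANDED))
```

A scan like this says nothing about cytosine modification state, which is the variable the study actually tests; it only locates candidate sites in an unmodified sequence.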

Figures

Figure 1.
Schematic of human TCF4 and sequence alignment of bHLH domains. (A) Human TCF4 transcripts potentially generate 18 isoforms with different N-termini (73), but all TCF4 isoforms contain the C-terminal bHLH DNA binding domain. In-frame alternative splicing increases the number of TCF4 isoforms. For example, alternative splicing at exon 18 of TCF4 leads to the presence or absence of two RS repeats (containing arginine, R, and serine, S) immediately prior to the C-terminal bHLH domain. For the study described here, we use the residue numbering of the +RSRS isoform (NP_001077431.1) for the bHLH domain. (B) Pitt-Hopkins mutations in the bHLH domain alter either the basic arginine residues at the protein–DNA interface or alanine residues that coordinate dimerization. Three pairs of intra-molecular interactions exist in the major groove of DNA: N573•••R576 (blue), N574•••R578 (red) and E577•••R580 (green). (C) TCF3, TCF4, and TCF12 are Class I bHLH proteins, also called E-box binding proteins, and share high sequence identity within their bHLH domains, except for 7 positions (colored cyan). In contrast, the three other representative proteins used in the alignment (NeuroD1, Max and USF1) share only 9 invariant residues (white letters on a black background) within the bHLH. White letters on a grey background indicate conserved variation (R and K; I and L; T and S; L and M).
Figure 2.
Electrophoretic mobility shift assay of TCF4 bHLH protein binding to oligos containing a single E-box. (A) Schematic of chemical reactions of DNA cytosine methylation by DNMT and 5mC oxidations by Tet enzymes. (B) The central CpG dinucleotides are unmodified (C/C) or fully modified (M/M, H/H, F/F; where M = 5mC, H = 5hmC, and F = 5fC). (C) The central CpG dinucleotides are hemi-modified (M/C, H/C, F/C or 5caC/C; where 5caC = 5-carboxyC). (D) The two outer CpA dinucleotides are unmodified (C/C), fully modified (M/M, H/H, F/F) or hemi-modified (5caC/C). The protein concentrations used were a maximum of 7 μM (the rightmost lane, lane 15, of each panel) followed by serial 2-fold dilutions (from right to left). The arrows indicate a reference point where the shift was observed for the unmodified oligo. The same samples were quantified by fluorescence polarization (Supplementary Figure S1).
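
As a worked illustration of the dilution scheme in this caption, the sketch below computes the protein concentration per lane for a 2-fold series starting at 7 μM in lane 15. The function name `dilution_series` is hypothetical, and the number of protein-containing lanes (`n_lanes=14`) is an assumption: the caption does not say how many lanes carry protein or whether any lane is protein-free.

```python
def dilution_series(max_conc_um=7.0, top_lane=15, n_lanes=14, fold=2.0):
    """Map lane number -> protein concentration in micromolar.

    Lane `top_lane` holds the maximum concentration; each lane to its left
    is diluted by `fold`. `n_lanes` is an assumed value, not from the caption.
    """
    return {top_lane - i: max_conc_um / fold**i for i in range(n_lanes)}


if __name__ == "__main__":
    series = dilution_series()
    # Print from the rightmost (most concentrated) lane to the leftmost.
    for lane in sorted(series, reverse=True):
        print(f"lane {lane:2d}: {series[lane]:.5f} uM")
```

Under these assumptions, lane 14 would hold 3.5 μM and lane 13 would hold 1.75 μM, halving with each step to the left.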
Figure 3.
Five structural conformations of TCF4 bHLH domain. (A) In PDB 6OD3, the crystallographic unit contains four dimers exhibiting three conformations (1 to 3). (B) In PDB 6OD4, the crystallographic unit contains two dimers, with and without bound DNA. (C) In PDB 6OD5, the crystallographic unit includes two dimers bound to DNA containing the 5caC modification. (D) Superimposition of the four dimers in PDB 6OD3. (E, F) The two monomers (A and B) forming each dimer adopt dissimilar conformations. (G) Superimposition of two dimers in PDB 6OD4. (H) Superimposition of three DNA-bound conformations. (I) Superimposition of three conformations in the absence of cognate DNA.
Figure 4.
Protein-phosphate interactions. (A) The dimer in conformation 1 binds DNA from the minor groove side. (B) The dimer in conformation 2 binds DNA with the extended N-terminal basic loop going through the minor groove. (C) R576 forms an H-bond with thymine in the minor groove. (D) An AT-hook of HMGA1 forms an H-bond with thymine in the minor groove (PDB: 3UXW). (E) The dimer in conformation 3 binds DNA in the major groove.
Figure 5.
Protein-DNA base interactions. (A) An ordered network of water molecules (numbered 1–6 for each set) occupies the major groove of central CpG dinucleotides. (B) Electron density (2Fo-Fc) shows the pentagonal interactions formed by the five water molecules numbered 2–6. (C) R576 interacts with G1. (D) E577 interacts with A2:T2. (E) E577 interacts with C3. (F) N574•••R578 and E577•••R580 are part of the water-mediated network that interacts with two phosphate groups of each strand. (G) Modeling a methyl group (yellow ball) onto unmodified C3 potentially results in repulsion (indicated by stars) with E577 and R580 in the C-specific conformation. (H) N573•••R576 interacts with the phosphate group of T5.
Figure 6.
Recognition of 5caC modification. (A) The 5caC modification (X in the sequence) used for co-crystallization. Omit electron density (Fo-Fc), contoured at 5σ above the mean, is shown for the omitted carboxylate groups of the 5caC bases. (B) Superimposition of two dimers in complex with unmodified DNA (yellow) and 5caC DNA (orange). Note the changes in the ends of DNA where modifications occur. (C) The movements of the tip of the basic region in helix α1 and DNA phosphate groups. (D) Movements of three residues (R569, N573 and R576) upon binding 5caC modified DNA. (E) The ends of DNA move toward the protein. (F) N573 interacts with G3. (G) R569 interacts with the carboxylate group at C4. (H) R576 interacts with the carboxylate group at C5.

References

    1. Forrest M.P., Hill M.J., Quantock A.J., Martin-Rendon E., Blake D.J. The emerging roles of TCF4 in disease and development. Trends Mol. Med. 2014; 20:322–331.
    2. Massari M.E., Murre C. Helix-loop-helix proteins: regulators of transcription in eucaryotic organisms. Mol. Cell Biol. 2000; 20:429–440.
    3. Caudy M., Vassin H., Brand M., Tuma R., Jan L.Y., Jan Y.N. daughterless, a Drosophila gene essential for both neurogenesis and sex determination, has sequence similarities to myc and the achaete-scute complex. Cell. 1988; 55:1061–1067.
    4. Murre C., McCaw P.S., Baltimore D. A new DNA binding and dimerization motif in immunoglobulin enhancer binding, daughterless, MyoD, and myc proteins. Cell. 1989; 56:777–783.
    5. Henthorn P., Kiledjian M., Kadesch T. Two distinct transcription factors that bind the immunoglobulin enhancer microE5/kappa 2 motif. Science. 1990; 247:467–470.