Review
Curr Opin Struct Biol. 2024 Aug:87:102836.
doi: 10.1016/j.sbi.2024.102836. Epub 2024 May 15.

Updated understanding of the protein-DNA recognition code used by C2H2 zinc finger proteins

Xing Zhang et al. Curr Opin Struct Biol. 2024 Aug.

Abstract

C2H2 zinc-finger (ZF) proteins form the largest family of DNA-binding transcription factors coded by mammalian genomes. In a typical DNA-binding ZF module, there are twelve residues (numbered from -1 to -12) between the last zinc-coordinating cysteine and the first zinc-coordinating histidine. The established C2H2-ZF "recognition code" suggests that residues at positions -1, -4, and -7 recognize the 5', central, and 3' bases of a DNA base-pair triplet, respectively. Structural studies have highlighted that additional residues at positions -5 and -8 also play roles in specific DNA recognition. The presence of bulky and either charged or polar residues at these five positions determines specificity for given DNA bases: guanine is recognized by arginine, lysine, or histidine; adenine by asparagine or glutamine; thymine or 5-methylcytosine by glutamate; and unmodified cytosine by aspartate. This review discusses recent structural characterizations of C2H2-ZFs that add to our understanding of the principles underlying the C2H2-ZF recognition code.
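The mapping described in the abstract can be illustrated with a short lookup. This is only a minimal sketch, assuming the one-to-one preferences stated above and ignoring the additional contributions of positions -5 and -8; the function name `predict_triplet`, the `"T/5mC"` label, and the use of `"N"` for residues without a stated preference are hypothetical choices made for illustration, not part of the review.

```python
# Residue -> preferred base, taken from the preferences listed in the abstract.
# "T/5mC" (thymine or 5-methylcytosine) is an illustrative label, not a standard code.
BASE_PREFERENCE = {
    "R": "G", "K": "G", "H": "G",   # guanine: arginine, lysine, histidine
    "N": "A", "Q": "A",             # adenine: asparagine, glutamine
    "E": "T/5mC",                   # thymine or 5-methylcytosine: glutamate
    "D": "C",                       # unmodified cytosine: aspartate
}


def predict_triplet(res_m7, res_m4, res_m1):
    """Return the predicted bases of one triplet, ordered 5' to 3'.

    Arguments are one-letter amino acid codes at positions -7, -4, and -1.
    Per the abstract, -1 reads the 5' base, -4 the central base, and
    -7 the 3' base. Residues with no stated preference map to "N" (any base).
    """
    five_prime = BASE_PREFERENCE.get(res_m1, "N")
    central = BASE_PREFERENCE.get(res_m4, "N")
    three_prime = BASE_PREFERENCE.get(res_m7, "N")
    return [five_prime, central, three_prime]


if __name__ == "__main__":
    # Arg-Glu-Arg at -7/-4/-1
    print(predict_triplet("R", "E", "R"))
    # Lys-His-Ala at -7/-4/-1
    print(predict_triplet("K", "H", "A"))
```

A lookup like this captures only the simplest reading of the code; the structures discussed in the review show that neighboring residues and cross-strand contacts can modify these preferences.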

Keywords: C2H2 zinc fingers; DNA sequence-specific recognition; protein-DNA interactions; transcription factors.

Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1. The canonical model of C2H2 ZF interaction with DNA triplet element.
(a) Example of a C2H2 ZF module. The three residues at canonical positions of −1, −4, and −7 face the DNA. Two cysteine and two histidine residues in each finger (C2H2; in stick models) are responsible for binding one Zn2+ ion. [Residue numbering shown on the blue secondary structure diagram is described in panel c.] (b) Schematic representation of a single ZF unit typically bound to three or four adjacent DNA base pairs via major groove contacts. The DNA sequence of the recognition strand (bottom in green) is oriented right to left from 5′ to 3′. The complementary strand (top) is colored in gray. (c) The protein sequence is from N-to-C termini (left-to-right) and amino acids at positions −1, −4, and −7 (highlighted) relative to the first Zn-associated histidine interact specifically with the DNA bases shown above. The protein secondary structures are shown below the sequence, with arrows for β strands, lines for loops, and a cylinder for the α helix. The traditional structure-based numbering at −1, +2, +3, and +6 (relative to the start of the α-helix) is provided for comparison. (d) A rare example of cross-finger zinc coordination in Arabidopsis thaliana REF6. (e) A set of 156 C2H2 ZFs was obtained from https://genexplain.com/tfclass/Class%202.3_alignment.html. The ZFs were from families 2.3.1 and 2.3.2. (Left) Distribution of pI vs. MW for each ZF, with the mean ± SD shown in red. For comparison, the 11 ZFs of human CTCF are shown as blue diamonds. (Right) The logos were generated using WebLogo3.0, after grouping the ZFs by pI as >1 SD above or below the mean, and those within 1 SD of the mean. The Zn ligands (2 Cys and 2 His) are invariant. The base recognition residues are indicated by vertical arrows. (f) The distribution of recognition amino acids at positions −7 (yellow), −4 (orange), and −1 (red) in this set is shown. (g) Logo analysis [86] of the ZF elements having the three most-frequent specificities (KHA, RHK, and RER at −7/−4/−1; vertical arrows). Together, these account for 66 of the 156 ZFs (42%). BHX (basic-His-any amino acid) should recognize the DNA sequence NGG; and RER (Arg-Glu-Arg) should recognize DNA sequence GMG, where M is T or 5-methylC [40]. KHA triplet examples included ZF1 of Sp1 and ZF1 of KLF4. RHK triplet examples included ZF3 of Sp1 and ZF3 of KLF9. RER triplet examples included ZF2 of Sp1; ZF2 of KLF4; and both ZF1 and ZF3 of EGR1. (h) An (incomplete) list of recent C2H2 ZF-DNA complex structures discussed in this review. Abbreviations: CTCF = CCCTC-binding factor; MW = molecular weight (mass); ZF = zinc finger.
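The grouping used for the logos in panel (e), which splits the fingers by whether their pI lies more than 1 SD above the mean, more than 1 SD below it, or within 1 SD, can be sketched as follows. This is an illustrative sketch only: the caption does not say whether a sample or population SD was used, so the sample SD (`statistics.stdev`) is an assumption here, and the function name `group_by_sd` and the input values are hypothetical.

```python
from statistics import mean, stdev


def group_by_sd(values):
    """Split values into those >1 SD below, within 1 SD of, or >1 SD above the mean.

    Uses the sample standard deviation; this is an assumption, since the
    source caption does not specify which estimator was used.
    """
    mu = mean(values)
    sd = stdev(values)
    groups = {"below": [], "within": [], "above": []}
    for v in values:
        if v > mu + sd:
            groups["above"].append(v)
        elif v < mu - sd:
            groups["below"].append(v)
        else:
            groups["within"].append(v)
    return groups


if __name__ == "__main__":
    # Placeholder numbers for illustration, not pI values from the paper.
    print(group_by_sd([1, 2, 3, 4, 5]))
```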
Figure 2. An Arg–Asp (RD) switch at positions −8 and −7.
The DNA recognition strand bases are in green, with the complementary strand in gray. (a) Examples of ZFs containing RD at positions −8 and −7. (b) ZF9 of PRDM9 allele-C has R at −8 interacting with Gua. (c–d) ZF10 and ZF12 of PRDM9 allele-C have D at −7 interacting with Cyt. (e) The ZF3 of TFIIIA spans four base pairs with the Arg at position +3 between the two His ligands (HxxRxH) interacting with guanine. (f) The (modeled) corresponding Arg at position 3 of ZF10 in CTCF could make DNA contacts with two neighboring phosphate groups. (g) The (modeled) corresponding Arg (magenta) at position 3 of ZF4 in ZNF524 could make DNA base contacts. Abbreviations: CTCF = CCCTC-binding factor; ZF = zinc finger.
Figure 3. Varied residues at ZF position −5.
(a) Sequence alignment of six C2H2 fingers of PRDM9 allele-C with invariant Ser at position −5. (b) The completely conserved Ser at position −5 in each ZF of PRDM9 interacts with DNA that differs from base pair to base pair. The DNA recognition strand is in green, and the complementary strand in gray. (c) A CpG dinucleotide is recognized jointly by ZF3 and ZF4 of CTCF. (d) CTCF uses Glu at position −7 of ZF4 to recognize 5-methylcytosine (5mC), and Ser at −5 to contact 5mC on the opposite strand. (e) The same CTCF ZF4 can also bind unmodified CpG. (f) Methyl-specific interaction with an A:T-rich sequence by Zfp568. (g–h) Zfp568 ZF5 and ZF6 interaction with five thymine bases (with methyl groups as yellow balls). (i) Methyl-specific interactions with A:T-rich sequence by SALL4. Patient missense mutations associated with Okihiro syndrome are indicated below the sequence. (j–k) SALL4 ZF6 and ZF7 interaction with five thymine bases (with methyl groups in yellow balls). (l) Examples of ZFs containing an Arg–Asp (R-D) pair at positions −7 and −5. (m) In Egr1/Zif268, the Asp at −5 of R-D pair interacts with the cross-strand cytosine via water-mediated interactions. (n) In WT1, the R-D Asp at −5 H-bonds with the cross-strand and cross-triplet adenine. (o) In Klf4, the R-D Asp H-bonds with the cross-strand and cross-triplet cytosine. (p) Examples of ZFs containing Trp or Tyr at position −5. (q) Two orthogonal views of superimposition of five fingers reveal two alternative conformations of Trp/Tyr at −5. (r) In TFIIIA, Trp at −5 points in the same direction as Lys at −4. (s) In ZBTB7A, Tyr at −5 points in the same direction as Asp at −4 and His at −7. (t) In HIC2, Tyr at −5 points in the same direction as Arg at −7. (u) In CTCF ZF5, Tyr at −5 points in the opposite direction to Lys at −4 and Arg at −1. Abbreviations: CTCF = CCCTC-binding factor; ZF = zinc finger.
Figure 4. Large and charged/polar residue at position −5.
(a) CTCF has an Arg at −5 of ZF9 and a Gln at −5 of ZF10. (b) A cross-strand guanine-specific interaction mediated by Arg at −5 of CTCF ZF9 increases the spacing between the residues at −7, −4, and −1 and the DNA. (c) A cross-strand adenine-specific interaction mediated by Gln at −5 of CTCF ZF10 increases the spacing between the residues at −7, −4, and −1 and the DNA. (d) Gln at −5 of ZF2 in ZNF410 makes a cross-strand adenine-specific interaction. (e) Asn at −5 of ZF4 in ZNF524 makes cross-strand interactions with two adjacent TA bases. (f) Arg at −5 of ZF1 and Glu at −5 of ZF2 in ZBTB10 both make cross-strand interactions. (g) Lys at −4 of ZF1 in ZBTB7A makes a cross-strand guanine-specific interaction. (h) ZF8 of CTCF spans the minor groove. (i) ZF4 and ZF6 of TFIIIA are positioned across the minor groove. (j) ZF2 of Zfp568 spans the minor groove, while ZF1 is involved in inter-finger interactions. (k) ZF4 of ZBTB24 crosses the major groove without making base-specific contacts. (l) ZF2 of HIC2 traverses the minor groove. In all structures, the DNA recognition strand is in green and the complementary strand in gray. (m) Sequence alignment of six spacer ZFs including the pre- and post-linker regions. TFIIIA ZF6 has shorter linkers but expanded distances between the two Cys ligands and the two His ligands of the zinc ion. Abbreviations: CTCF = CCCTC-binding factor; ZF = zinc finger.

References

    1. Latchman DS: Transcription factors: an overview. Int J Exp Pathol 1993, 74:417–422.
    2. Wolberger C: How structural biology transformed studies of transcription regulation. J Biol Chem 2021, 296:100741.
    3. Boumpas P, Merabet S, Carnesecchi J: Integrating transcription and splicing into cell fate: transcription factors on the block. Wiley Interdiscip Rev RNA 2023, 14:e1752.
    4. Hecker M, Wagner AH: Transcription factor decoy technology: a therapeutic update. Biochem Pharmacol 2017, 144:29–34.
    5. Radaeva M, Ton AT, Hsing M, Ban F, Cherkasov A: Drugging the ‘undruggable’. Therapeutic targeting of protein-DNA interactions with the use of computer-aided drug discovery methods. Drug Discov Today 2021, 26:2660–2679.