Plant Physiol. 2025 May 30;198(2):kiaf205.
doi: 10.1093/plphys/kiaf205.

Many transcription factor families have evolutionarily conserved binding motifs in plants

Sanja Zenker et al. Plant Physiol.

Abstract

Transcription factors control gene expression during development and in response to a broad range of internal and external stimuli. They regulate promoter activity by directly binding cis-regulatory elements in DNA. The angiosperm Arabidopsis (Arabidopsis thaliana) contains more than 1,500 annotated transcription factors, each containing a DNA-binding domain that is used to define transcription factor families. Analyzing the binding motifs of 686 and the binding sites of 335 Arabidopsis transcription factors, as well as motifs of 92 transcription factors from other plants, we identified a constrained vocabulary of 74 conserved motifs spanning 50 families in plants. Among 21 transcription factor families, we found 1 core motif for all analyzed members and between 2% and 72% overlapping binding sites. Five families show conservation of the motif along phylogenetic clades. Five families, including the C2H2 zinc finger family, show high diversity among motifs in plants, suggesting potential for the neofunctionalization of duplicated transcription factors based on the motif recognized. We tested whether conserved motifs remained conserved since at least 450 million years ago by determining the binding motifs of 17 transcription factors from 11 families in Marchantia (Marchantia polymorpha) using amplified DNA affinity purification sequencing. We detected nearly identical binding motifs as predicted from the angiosperm data. Our findings show a large repertoire of overlapping binding sites within a transcription factor family and species and a high degree of binding motif conservation for at least 450 million years, indicating more potential for evolution in cis- rather than trans-regulatory elements.

Conflict of interest statement

None declared.

Figures

Figure 1.
TFBM conservation analysis. A) Representative visualization of (amplified) DAP-Seq binding events (peaks) of TFs (data reanalyzed from O’Malley et al. 2016; López-Vidriero et al. 2021) relative to the TSS on the AT2G46310 (CRF5) promoter (all Arabidopsis [A. thaliana] promoter figures as interactive visualizations are deposited under https://doi.org/10.4119/unibi/2982196). Binding events are colored by log10-transformed signal value and open chromatin regions are depicted by four lines at the bottom. B) Histogram showing the number of experimentally determined in vitro binding sites of 534 TFs (same data as in A) on 2 different definitions (−1 kilobase [kb] and −1 kb, +500 bases [b]) of 27,206 nuclear protein coding gene promoters in Arabidopsis (1 outlier each with >1,000 binding sites not shown). C) Schematic phylogenetic tree depicting major groups with TFBM data available (black) and chlorophytes as an outgroup (gray). The angiosperm Arabidopsis and bryophyte Marchantia (M. polymorpha) are highlighted as representative model organisms. Approximate million years from the last common ancestor below. D) Workflow for TFBM conservation analysis. Parts of this figure were created with BioRender.com. DAP-Seq, DNA affinity purification sequencing; TFs, transcription factors; TSS, transcription start site.
Figure 2.
Analysis of TF families with high TFBM conservation. A) Unrooted phylogenetic tree calculated using RAxML of full-length amino acid sequences aligned with MUSCLE of the WRKY family TFs with TFBMs. Support values at the nodes are based on 1,000 bootstrap iterations. The scale bar is in units of amino acid substitutions per site. Clade annotations are from Eulgem et al. (2000) and Interpro domain annotations. Collapsed phylogenetic tree is shown with indication of orthologous genes from Marchantia polymorpha (M.p.) in each subgroup. B) Consensus TFBMs of conserved families generated by merging individual TFBMs of TF family members for each TF family. Base height corresponds to information content. C) Merged peak set of 33 WRKY (amplified) DAP-Seq samples showing the subsets of shared and unique peaks. Darker color corresponds to a higher percentage of TFs sharing this peak subset and tile size encodes relative number of peaks in a given subset. D) Expression correlation of 45 WRKY family members in A. thaliana. The Pearson correlation coefficient is indicated by color and dot size. ARF, auxin response factor; AS2LOB, ASYMMETRIC LEAVES2/lateral organ boundary domain; BBRBPC, BARLEY B RECOMBINANT/BASIC PENTACYSTEINE; BES1, BRI1-EMS-SUPPRESSOR; bHLH TCP, basic helix-loop-helix TEOSINTE BRANCHED1/CYCLOIDEA/PROLIFERATING CELL FACTOR; Dof, DNA binding 1 finger; CAMTA, CALMODULIN-BINDING TRANSCRIPTION ACTIVATOR; CPP, cysteine-rich polycomb-like protein; E2FDP, eukaryotic 2 transcription factor/dimerization partner; GARP, GOLDEN2/ARR-B/PSR1; HD, homeodomain; HSF, heat shock factor; MADS MIKC; NAC, NAM/ATAF1/CUC2; SBP, SQUAMOSA promoter binding protein.
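As a reading aid for panel B, the statement that base height corresponds to information content can be illustrated with the common sequence-logo convention: for DNA, a motif position carries at most 2 bits, reduced by the Shannon entropy of its base frequencies. The following is a minimal sketch under that assumption. The function name `information_content`, the `pseudocount` parameter, and the example counts are hypothetical and not taken from the paper, and whether the authors applied a small-sample correction is not stated here.

```python
import math


def information_content(column, pseudocount=0.0):
    """Information content in bits of one motif position.

    `column` holds counts (or frequencies) for A, C, G, T.
    Uses IC = log2(4) + sum(p * log2(p)), without small-sample correction.
    """
    total = sum(column) + 4 * pseudocount
    ic = 2.0  # log2(4): maximum for a four-letter alphabet
    for value in column:
        p = (value + pseudocount) / total
        if p > 0:  # treat 0 * log2(0) as 0
            ic += p * math.log2(p)
    return ic


# Hypothetical motif with four positions (illustrative counts, not real data).
motif = [
    [0, 0, 0, 100],    # only T observed: fully conserved position
    [0, 0, 100, 0],    # only G observed: fully conserved position
    [50, 0, 0, 50],    # A or T with equal frequency
    [25, 25, 25, 25],  # all bases equally likely: no information
]

ic_per_position = [information_content(col) for col in motif]
print(ic_per_position)
```

In a logo drawn this way, the stack at each position has a total height equal to its information content, and each base occupies a share of the stack proportional to its frequency.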
Figure 3.
Analysis of TF families with semi-conserved TFBMs. A) Unrooted phylogenetic tree calculated using RAxML of full-length amino acid sequences aligned with MUSCLE of the MYB-related family with support values based on 1,000 bootstraps. The scale bar is in units of amino acid substitutions per site. Interpro domain annotations indicate structural similarities. Clade annotations are from Chang et al. (2020) and domain annotations are from Interpro. Collapsed phylogenetic tree (not drawn to scale) with indication of orthologous TFs from Marchantia polymorpha (M.p.) in each subgroup. B) Collapsed phylogenetic trees (not drawn to scale) with orthologous M. polymorpha TFs found in the different semi-conserved TF families. Light gray indicates orthologues were not found. TFBMs represent the consensus TFBMs present in the clades. C) Merged peak sets for each of the MYB-related TFBM subgroup TF samples showing the subsets of shared and unique peaks. Darker color corresponds to a higher percentage of TFs sharing this peak subset and tile size encodes relative number of peaks in a given subset. ARID, AT-rich interaction domain; bHLH, basic helix-loop-helix; bZIP, basic leucine zipper; MYB, MYELOBLASTOSIS.
Figure 4.
The trihelix family is an example of a TF family with diverse TFBMs. Unrooted phylogenetic tree calculated using RAxML of full-length amino acid sequences aligned with MUSCLE of the Trihelix TF family members with TFBM determined. Support values at the nodes are based on 1,000 bootstrap iterations and domain annotations from Interpro. The scale bar is in units of amino acid substitutions per site.
Figure 5.
TFBMs in M. polymorpha show high conservation in comparison with A. thaliana TFBMs. A) Consensus TFBMs determined for each family (left) and experimentally determined TFBMs of one orthologous TF family member in M. polymorpha by ampDAP-Seq (right). B) Consensus TFBM of each clade (left) and TFBM of one orthologous TF in M. polymorpha from each subclade. Tree not drawn to scale. Parts of this figure were created with BioRender.com. BBRBPC, BARLEY B RECOMBINANT/BASIC PENTACYSTEINE; E2FDP, eukaryotic 2 transcription factor/dimerization partner; HD, homeodomain; NAC, NAM/ATAF1/CUC2; MYB, MYELOBLASTOSIS.
