Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2025 Jun;11(6):1205-1219.
doi: 10.1038/s41477-025-02007-8. Epub 2025 Jun 12.

Transcription factor binding divergence drives transcriptional and phenotypic variation in maize

Affiliations

Transcription factor binding divergence drives transcriptional and phenotypic variation in maize

Mary Galli et al. Nat Plants. 2025 Jun.

Abstract

Regulatory elements are essential components of plant genomes that have shaped the domestication and improvement of modern crops. However, their identity, function and diversity remain poorly characterized, limiting our ability to harness their full power for agricultural advances using induced or natural variation. Here we mapped transcription factor (TF) binding for 200 TFs from 30 families in two distinct maize inbred lines historically used in maize breeding. TF binding comparison revealed widespread differences between inbreds, driven largely by structural variation, that correlated with gene expression changes and explained complex quantitative trait loci such as Vgt1, an important determinant of flowering time, and DICE, an herbivore resistance enhancer. CRISPR-Cas9 editing of TF binding regions validated the function and structure of regulatory regions at various loci controlling plant architecture and biotic resistance. Our maize TF binding catalogue identifies functional regulatory regions and enables collective and comparative analysis, highlighting its value for agricultural improvement.

PubMed Disclaimer

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Extended Data Fig. 1.
Extended Data Fig. 1.. DAP-seq data samples broadly across maize TF families.
a, Stacked bar graph showing 36 TF families in which at least one family member was tested in DAP-seq. The number of members tested for each is shown with a colored bar; the number of members for which high quality datasets were obtained is shown in parentheses next to the family name. b, Scatterplot showing the number of B73 and Mo17 peaks for each TF. c, Boxplots showing the distribution of the number of target genes obtained for each TF, n = 200 d, Heatmap showing Pearson correlation of TF binding profiles genome wide (10-bp bins). Side annotation colors correspond to TF structural families. TFs in the diversity panel are indicated with an asterisk. e, Average number of reads in peaks and number of peaks with GEM motif are indicated. Each datapoint corresponds to a DAP-seq dataset from either B73 or Mo17. Whiskers in boxplots in panels c and e indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (25th percentile) and lower (75th percentile) quartiles with n = 200 TFs.
Extended Data Fig. 2.
Extended Data Fig. 2.. Enrichment analysis of selected TFs and GWAS traits.
a, GO enrichment analysis of putative target genes assigned to peaks of selected TFs. b, Bubble plot showing enrichment for NAM panel GWAS SNPs associated with branch zone, cob length, and leaf angle that overlapped SBP30/UB3 DAP-seq peaks (right panel; red boxed areas) and bubble plot showing MADS73 DAP-seq peaks were significantly enriched for SNPs associated with several flowering time-related traits (left panel; black boxed areas). c, Bubble plot showing enrichment for Wisconsin Diversity panel GWAS SNPs associated with branching phenotypes that overlapped SBP30/UB3 DAP-seq peaks (right panel; red boxed areas) and bubble plot showing MADS73 DAP-seq peaks were significantly enriched for SNPs associated with flowering time-related traits (left panel; black boxed areas).
Extended Data Fig. 3.
Extended Data Fig. 3.. B73 TF diversity panel and DAP-CRM analysis.
a, Heatmap showing Pearson correlation of genome-wide binding profiles for 66 TFs in the B73 TF diversity panel. b, Phylogeny, tissue-specific RNA-expression (TPM; transcripts per million), and binding motifs of selected SBP TFs tested in maize DAP-seq. Expression patterns are often different for family members with similar motifs. c, Genome browser screenshot of SBP TFs shown in b at the TSH1 locus. d, Histogram showing the size distribution of DAP-CRMs. The dotted line indicates the average size of all DAP-CRMs. e, Histogram of the number of TFs per DAP-CRM. The black dotted line indicates the average number of TFs per DAP-CRM. The inset shows a close-up on the number of TFs from 20–63 TFs per DAP-CRM. f, Frequency of DAP-CRMs versus randomly shuffled CRMs of the same size within a 3kb window surrounding the TSS.
Extended Data Fig. 4.
Extended Data Fig. 4.. Genome browser views of composite DAP-seq peaks.
a, B73v5 IGV genome browser view of the Vgt1 and RAP2.7 locus showing binding by DAP-seq TFs. Lower panel shows close-up of DAP-CRMs with binding by TFs not displayed as main tracks due to space constraints. b, Genome browser view of the UNBRANCHED2 (UB2) locus showing binding by DAP-seq TFs. c, Genome browser view of the INDETERMINATE GAMETOPHYTE1 (IG1) locus showing binding site location of DAP-seq TFs. In each panel the number below the DAP-CRM track indicates the number of TF peaks present in the DAP-CRM. Please visit website genome browser for a display of all TF tracks http://hlab.bio.nyu.edu/projects/zm_crm/.
Extended Data Fig. 5.
Extended Data Fig. 5.. Quality assessment of B73 and Mo17 DAP-seq datasets.
a, Pearson correlation of TF binding profiles for B73 and Mo17 genomes. Most B73 peak datasets (rows annotated in yellow) clustered with Mo17 peak datasets for the same TFs lifted to B73 (rows annotated in teal), as indicated by the magenta boxes. b, Percentage of non-crossmappable peaks (coordinates could not be lifted) that overlapped with “Non-Alignable” region (NotAligned), duplicated regions (DUP), or structural variants (SV, insertions or deletions greater than 50bp). Each datapoint corresponds to the percentage of peaks from an individual TF overlapping the indicated category, n=200. Whiskers in boxplots in panels b-d indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (25th percentile) and lower (75th percentile) quartiles with n = 200 TFs. c, Upper: scatterplots showing no difference in average reads per peak or percentage of peaks with motif for shared and specific datasets. Each dot corresponds to a single TF. Lower: distribution of average reads per peak and percentage of peaks with motif for shared and specific datasets. n = 66 d, Percentage of peaks within each category that overlapped SVs and transposable elements (TEs), n=200.
Extended Data Fig. 6.
Extended Data Fig. 6.. Shared and genotype-specific peaks.
a, Percentage of TF diversity panel peaks also called as peaks in Mo17 (shared, grey) and those called only in B73v5 but not Mo17 (B73-specific, blue). b, Percentage of TF diversity panel peaks also called as peaks in B73 (shared, grey) and those called only in Mo17 but not B73 (Mo17-specific, green). Mo17 datasets show a similar percentage of Mo17-specific and Mo17-B73 shared peaks relative to those seen for B73-specific and B73-Mo17 shared peaks (TF order is same as in a). c, JBrowse2 genome browser view showing similar binding intensity of MADS69 in the 5’UTR of CYCLIN13 (Zm00001eb193830) in both genotypes. d, scatterplot showing good correlation of MADS69 reads in B73v5 compared to Mo17 MADS69 reads converted to B73v5 coordinates. e, JBrowse2 view of the MATE18 locus showing B73-specific peaks residing near a GWAS hit. Note that the B73-specific sequence insertion is not annotated as a TE. f, B73-specific peaks more often contain more than four SNPs per peak (top plot) or more SNPs per peak center (20bp region surrounding the peak summit) (bottom plot). Shared peaks more often contain less than four SNPs per peak (top plot) or per the 20bp region surrounding the peak summit (bottom plot). Data points represent individual TFs, n=65. For f and g, whiskers indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (25th percentile) and lower (75th percentile) quartiles. g, Overlap of B73-specific peaks and B73-Mo17 shared peaks with orthogonal datasets or DAP-CRMs. Both B73-specific peaks and shared peaks show statistically significant enrichment (p value < 0.0001, two-tailed t-test) relative to randomly shuffled peaks, providing support that both B73-specific peaks and shared peaks could be functional. Data points represent individual TFs, n=200.
Extended Data Fig. 7.
Extended Data Fig. 7.. Peak positional variation among genotypes.
a, Comparative JBrowse2 genome browser view showing shared peaks of B73 and Mo17 at the MADS67 locus. b, JBrowse2 genome browser view showing TF binding site positional variants at the ZmATL34 locus. Lower heatmap shows ZmATL34 RNA expression levels (transcripts per million) from several tissues. A 13-fold increase in Mo17 RNA expression was observed in radicle tissue compared to B73. c, Top: percentage of shared TF diversity panel promoter peaks (top 20%) that correspond to positional variants (PosV) greater than 500bp or less than 500bp. Bottom: percentage of top (shared and specific) promoter peaks that corresponded to shared PosVs or B73-specific peaks. Each datapoint corresponds to an individual TF, n=65. Whiskers indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (25th percentile) and lower (75th percentile) quartiles. d, Plot showing percentage of B73-Mo17 shared promoter peaks (<−10kb) that were associated with differentially expressed genes in various tissues from. Genes associated with PosV<500 bp peaks are shown in dark pink, while those associated with PosV>500 peaks are shown in light pink. Fisher’s exact tests were performed to assess the enrichment of DEGs in PosV>500 bp genes relative to PosV<500 bp genes for each TF and tissue combination, and the resulting p-values were adjusted for multiple testing. e, Heatmap of -log10 FDR-adjusted p-values from one-sided Fisher’s exact tests, evaluating whether root cell type marker genes determined by single cell RNA-seq data are enriched for putative target genes associated with DAP-seq peaks shared between B73 and Mo17, compared to putative targets of B73-specific peaks.
Extended Data Fig. 8.
Extended Data Fig. 8.. DoubleDAP-seq analysis.
a, Schematic showing the doubleDAP-seq assay in which putative heterodimeric TF-DNA complexes can be pulled down and compared to results from a single protein DAP-seq assay to assess binding site specificity differences. b, Neighbor-joining phylogeny based on amino acid similarity of group VII (blue) and group VIII (green) bHLHs from maize and Arabidopsis. Members that were tested in DAP-seq are shown in bold. Three members that were tested in DAP-seq that did not yield any peaks are shown in italics. c, AlphaFold prediction of BHLH85 homodimer. d, Normalized RNA-seq expression (TPM; transcripts per million) of group VIII and VII bHLHs showing heterodimer pairs are co-expressed in many tissues. Normalized expression data from Walley et al.. e, Venn diagram showing low degree of overlap between Q-A-R (subclade VIII) homodimers and Q-A-R:ZmSPT (subclade VII) heterodimers as exemplified by HALO-BHLH113 (Q-A-R bHLH) in single DAP and HALO-BHLH113:SBPTag-BHLH125/ZmSPT1 in doubleDAP. Only 2% of HALO-BHLH113 peaks were captured in the doubleDAP dataset indicating that the heterodimer configuration is preferred relative to the homodimer.
Extended Data Fig. 9.
Extended Data Fig. 9.. CRISPR editing of maize cis-regulatory regions influences phenotype.
a, Genome browser screenshot of TSH1 locus showing binding in upstream promoter region by many TFs. Grey shaded areas upstream of and within CRM169681 indicate regions that were deleted in alleles shown in b. b, Schematic showing two independent CRISPR alleles with deletions and inversions that eliminate at least five TF binding sites (colored bars; colors match TFs shown in genome browser screenshot in a). Lightly shaded grey areas indicate regions deleted in both alleles. c, Images of mature tassels for WT, two tsh1 promoter CRISPR alleles, and tsh1-ref mutant (coding region mutation). Both CRISPR promoter alleles show outgrowth of the tassel sheath leaf (white arrows) that is not present in the WT (black arrow). Tassel branching is reduced in the tsh1-ref, and the CRISPR promoter alleles (white brackets) relative to WT (black bracket). d, SEMs of immature tassels of the CRISPR promoter alleles showing bract outgrowth (white arrowheads). e, qRT-PCR analysis of CRISPR promoter alleles showing reduced expression in immature tassels relative to WT immature tassels. Data are presented as mean values. Error bars represent standard deviation of three biological replicates for each edited allele (two biological replicates for wildtype). Data points correspond to individual biological replicates. Statistical significance determined by two-sided t-test. f, CRISPR editing of BIF2 3’UTR ARF binding sites. A cis-regulatory module (CRM18765) is situated downstream of the BIF2 gene. Within this CRM region, there is a strong ARF peak containing five ARF binding motifs (three TGTCs and two GACAs). Three single guide RNAs (gRNAs) were designed for CRISPR-Cas9 editing that specifically targeted the ARF motifs. Three deletion alleles were obtained. Homozygous plants of these alleles exhibited a weak bif2 phenotype with various degrees of severity (BIF2-crm1cr1 and BIF2-crm1cr3 are more severe than BIF2-crm1cr2) during the early ear development stage, characterized by partial barren patches (white arrows) on the ear primordia as seen by SEM.
Extended Data Fig. 10.
Extended Data Fig. 10.. Sequence alignment of Mo17 tandem-duplicated CRMs.
Nucleotide alignment of B73v3_DICE (single copy), Mo17_CRM119798 (tandem copy 1), and Mo17_CRM119799 (tandem copy 2) showing conservation of individual TF binding motifs.
Fig. 1.
Fig. 1.. Large scale DAP-seq profiling of maize TFs provides high quality genome-wide binding site data for B73v5 and Mo17.
a, Overview of TF binding site mapping and comparison approach b, Phylogenetic tree (based on bZIP amino acid sequences) showing subclade specificity of maize bZIP protein target sites. Loci shown include known targets of maize bZIPs or their Arabidopsis homologs. c, Heatmap of -log10 enrichment p-values computed by GARFIELD for overlaps between DAP-seq TF binding sites and trait-associated SNPs from the maize NAM GWAS panel (GWAS significance threshold 1e-6); only TFs that rank within the three most significant (enrichment p-value<=0.001) for at least one trait are shown.
Fig. 2.
Fig. 2.. Maize TF diversity panel reveals functional cis-regulatory modules.
a, Maize TF diversity panel is represented by mostly distinct motifs. b, Bar graph showing the number of DAP-CRMs and their prevalence in gene features. c, Overlap of DAP-CRMs with orthogonal functional datasets. ACR, accessible chromatin regions from leaf and ear ATAC-seq; MOA, MOA-seq; UMR, unmethylated regions; CNS, conserved non-coding sequences from sorghum. d, Genome browser screenshot of TF binding peaks within the GRASSY TILLERS1 locus which contains eight distinct DAP-CRMs, each with three or more TFs. The estimated region of the prolificacy QTL is shown. e, Heatmap of -log10 enrichment p-values computed by GARFIELD for overlaps between DAP-CRMs consisting of specific TF combinations and maize NAM GWAS trait-associated SNPs for selected traits (GWAS significance threshold of 1e-7).
Fig. 3.
Fig. 3.. Genotype-specific peaks are prevalent in B73 and Mo17 and explain genetically defined QTL.
a, Boxplots showing percentages of genotype-specific and shared peaks in B73 and Mo17. b, Percentages of B73-specific (dark blue) and shared (light blue) peaks that overlap with B73-Mo17 variants assigned as duplicated regions (DUPs), indels (INDELs), SNPs, and structural variants (SVs). For a and b, data points correspond to individual TFs, n=200 TFs; whiskers indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (25th percentile) and lower (75th percentile) quartiles. c, Comparative genomic view of Vgt1-RAP2.7 locus showing three MADS69 peaks (light blue rectangles) upstream of RAP2.7 in B73v5, one of which is in the genetically defined Vgt1 enhancer. Lower left panel shows a close-up of the region in Mo17 containing the MITE and the corresponding region in B73v5 showing the MADS69 peak. d, Alignment of the MADS69 CArG-box motif with B73 and Mo17 sequences. The MITE insertion disrupts high information content nucleotides within the motif. e, Heatmap of -log10 enrichment p-values computed by GARFIELD for association between B73-specific peaks (stratified by genomic variant categories) and trait-associated SNPs from the maize NAM GWAS panel (GWAS significance threshold 1e-6). f, Boxplots showing motif score differences for B73-specific peaks for diversity panel TFs (n = 68). Whiskers indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (25th percentile) and lower (75th percentile) quartiles. g, Comparative genomic view showing a 12bp indel at the ARF15 locus that causes ARF binding in B73 but not Mo17. h, Heatmap of -log10 FDR-adjusted p-values from one-sided Fisher’s exact tests, evaluating whether B73 vs. Mo17 differentially expressed genes across tissues are enriched among putative target genes associated with B73-specific DAP-seq peaks relative to putative targets near peaks shared between the two genotypes. Each column of the heatmap corresponds to target genes of one TF.
Fig. 4.
Fig. 4.. DoubleDAP-seq analysis of Q-A-R type bHLHs.
a, Clustal alignment of amino acids in bHLH DNA binding domain of Q-A-R bHLHs (light green; group VIII) and group VIIb bHLHs (light blue) with QAR and HER residues shown. b, Ribbon diagrams of AlphaFold predicted structures of BHLH85 (Q-A-R), BHLH125 (H-E-R), and the heterodimer. Empirically determined DAP-seq motifs are shown below each structure. c, Number of peaks and top motifs called for DAP-seq and doubleDAP-seq datasets where TF:TF indicates HALO-bHLHVIII:SBP-Tag-bHLHVIIb. d, Pearson correlation of genome-wide binding events of the homo- and heterodimers. e, Genome browser screenshots of binding by homo- and heterodimers. f, Summary of protein-protein interactions based on DAP-seq experiments.
Fig. 5.
Fig. 5.. CRISPR induced cis-regulatory variation drives expression and phenotypic differences.
a, JBrowse2 genome browser screenshot of ~10kb region surrounding the DICE enhancer in Mo17 and B73. Two conserved DAP-CRMs (pink highlighted area) and one Mo17-specific CRM (purple highlighted area) that appears to be a partial segmental duplication of the upstream CRM and binding sites (CRM119798) are evident. b, RNA-seq data from 11-day old seedlings from Zhou et al., 2019 showing expression levels of various BX genes located near the DICE enhancer. Gene order is same as on chromosome. Mo17 shows 51-fold greater levels of BX1 expression (yellow box) relative to B73. c, CRISPR editing of Mo17 sequences using multiplexed guides near the DICE enhancer revealed specific TF binding sites important for BX1 expression. Relative expression of BX1 determined by qRT-PCR for six independent alleles is shown on right. Data are presented as mean values of three biological replicates relative to wildtype siblings +/− SD; data points correspond to individual biological replicates. ** indicates significant difference relative to WT sibling in two-sided t-test. d, Schematic depicting individual enhancer components that contribute to enhanced expression of BX1 in Mo17.

Update of

References

    1. Kim S. & Wysocka J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol Cell 83, 373–392 (2023). - PMC - PubMed
    1. Rodriguez-Leal D, Lemmon ZH, Man J, Bartlett ME & Lippman ZB Engineering Quantitative Trait Variation for Crop Improvement by Genome Editing. Cell 171, 470–480 e8 (2017). - PubMed
    1. Liang Y, Liu HJ, Yan J. & Tian F. Natural Variation in Crops: Realized Understanding, Continuing Promise. Annu Rev Plant Biol 72, 357–385 (2021). - PubMed
    1. Wang X. et al. Dissecting cis-regulatory control of quantitative trait variation in a plant stem cell circuit. Nat Plants 7, 419–427 (2021). - PubMed
    1. Meyer RS & Purugganan MD Evolution of crop species: genetics of domestication and diversification. Nat Rev Genet 14, 840–52 (2013). - PubMed

Methods only references:

    1. Bartlett A. et al. Mapping genome-wide transcription-factor binding sites using DAP-seq. Nat Protoc 12, 1659–1672 (2017). - PMC - PubMed
    1. Burdo B. et al. The Maize TFome--development of a transcription factor open reading frame collection for functional genomics. Plant J 80, 356–66 (2014). - PMC - PubMed
    1. Yilmaz A. et al. GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiol 149, 171–80 (2009). - PMC - PubMed
    1. Mirdita M. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679–682 (2022). - PMC - PubMed
    1. Bolger AM, Lohse M. & Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–20 (2014). - PMC - PubMed

Publication types

Associated data

LinkOut - more resources