Nucleic Acids Res. 2024 Sep 23;52(17):10161-10179.
doi: 10.1093/nar/gkae585.

DNA sequence and chromatin differentiate sequence-specific transcription factor binding in the human malaria parasite Plasmodium falciparum

Victoria A Bonnell et al. Nucleic Acids Res.

Erratum in

Abstract

Development of the malaria parasite, Plasmodium falciparum, is regulated by a limited number of sequence-specific transcription factors (TFs). However, the mechanisms by which these TFs recognize genome-wide binding sites are largely unknown. To address TF specificity, we investigated the binding of two TF subsets that bind either CACACA or GTGCAC DNA sequence motifs and further characterized two additional ApiAP2 TFs, PfAP2-G and PfAP2-EXP, which bind unique DNA motifs (GTAC and TGCATGCA). We also interrogated the impact of DNA sequence and chromatin context on P. falciparum TF binding by integrating high-throughput in vitro and in vivo binding assays, DNA shape predictions, epigenetic post-translational modifications, and chromatin accessibility. We found that DNA sequence context minimally impacts binding site selection for paralogous CACACA-binding TFs, while chromatin accessibility, epigenetic patterns, co-factor recruitment, and dimerization correlate with differential binding. In contrast, GTGCAC-binding TFs prefer different DNA sequence contexts in addition to chromatin dynamics. Finally, we determined that TFs that preferentially bind divergent DNA motifs may bind overlapping genomic regions due to low-affinity binding to other sequence motifs. Our results demonstrate that TF binding site selection relies on a combination of DNA sequence and chromatin features, thereby contributing to the complexity of P. falciparum gene regulatory mechanisms.

Figures

Graphical Abstract
Figure 1.
Multiple P. falciparum TFs with overlapping sequence preferences and design of the P. falciparum genomic-context protein-binding microarray (gcPBM). (A) Graphical representation of each TF examined in this study categorized into ‘CACACA-binding’, ‘GTGCAC-binding’, and ‘Other’. Protein lengths (in number of amino acids) are drawn to scale. Predicted protein domains were determined using NCBI Conserved Domain Search or defined by previous literature. Position weight matrix (PWM) logos are from previously published universal protein-binding microarray (PBM) experiments. * Denotes the DNA-binding domains (DBDs) tested in this study; (B) Graphical representation of the P. falciparum gcPBM design. Position weight matrix (PWM) data was searched against intergenic regions of the P. falciparum genome (Pfalciparum3D7; version 3, release 38) and categorized into four motif types (CACACA, GTGCAC, GTAC and TGCATGCA). All sequences were replicated eight(*) or six(**) times. Microarray graphic was created with BioRender.com.
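The motif search in panel B can be pictured with a small sketch: each window of a sequence is scored against a position weight matrix on both strands, and windows above a cutoff are reported. The matrix values, background frequencies, cutoff, and example sequence below are invented placeholders for illustration; they are not the published PWMs, and this is not the authors' pipeline.

```python
import math

# Toy 4-position PWM (per-position base probabilities). Placeholder values
# that loosely favour GTAC; NOT the published matrix.
PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
# Assumed AT-rich background frequencies (illustrative only).
BACKGROUND = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def log_odds(site):
    """Sum of per-position log2(PWM probability / background probability)."""
    return sum(math.log2(col[base] / BACKGROUND[base]) for col, base in zip(PWM, site))


def scan(seq, cutoff):
    """Return (start, strand, score) for every window scoring >= cutoff."""
    hits = []
    width = len(PWM)
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            score = log_odds(site)
            if score >= cutoff:
                hits.append((start, strand, round(score, 2)))
    return hits


# Hypothetical intergenic fragment containing one GTAC site.
hits = scan("ATATGTACATATTT", cutoff=5.0)
```

Because GTAC is its own reverse complement, the single site in the toy sequence is reported once per strand.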
Figure 2.
CACACA-binding AP2 domains have moderate differences in sequence context preferences at medium-to-low affinities. (A) Binding intensity distributions for CACACA probes and respective negative control probes for AP2-LT (CACACA probes [green] and negative control probes [grey]). Significantly different binding was defined using a two-tailed Mann–Whitney test (P-value < 0.0001); (B) four-color plot of the top 100 bound probes by AP2-LT with enriched motif above and calculated E-value and number of occurrences below. Color representations: A (red), C (blue), G (yellow) and T (green); (C) comparison of the gcPBM binding intensities for AP2-LT technical replicates (Pearson correlation: R² = 0.95) (CACACA probes [green] and negative control probes [grey]); (D) comparison of the gcPBM results for PF3D7_0420300_D1 versus AP2-LT (Pearson: R² = 0.80); (E) comparison of the gcPBM results for PF3D7_0420300_D1 versus AP2-HC (Pearson: R² = 0.86); (F) comparison of the gcPBM results for AP2-LT versus AP2-HC (Pearson: R² = 0.85); (G) binding intensity distributions from AP2-LT for negative control probes (grey), all CACACA probes (green), the AP2-LT extended motif probes (blue) and all 8-mer CACACA probes represented in the gcPBM (green; right of the vertical line). Dotted lines in each violin plot are the calculated mean; (H) Left: four-color plot of all CACACA probes above the threshold (defined by the 90th percentile of negative control probes) sorted by fold change (log2[AP2-LT/AP2-HC]). Right: zoom in on the top 100 differentially bound sites by AP2-LT (top right) and AP2-HC (bottom right) with enriched motifs, calculated E-values, and motif occurrence counts within those top 100 sites.
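The filtering and ordering described for panel H (a 90th-percentile threshold from negative control probes, then sorting by log2 fold change) might look roughly like the sketch below. The probe names and intensities are invented, and two details are assumptions not stated in the caption: the percentile uses linear interpolation, and a probe is kept when it exceeds the threshold for at least one of the two proteins.

```python
import math


def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def rank_by_fold_change(probes, negatives_a, negatives_b, pct=90):
    """Sort above-background probes by log2(intensity_a / intensity_b).

    probes maps a probe id to (intensity_a, intensity_b); intensities must be
    positive. Assumption: a probe passes if it exceeds the threshold for
    either protein.
    """
    threshold_a = percentile(negatives_a, pct)
    threshold_b = percentile(negatives_b, pct)
    ranked = [
        (probe_id, math.log2(a / b))
        for probe_id, (a, b) in probes.items()
        if a > threshold_a or b > threshold_b
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Invented intensities for illustration only.
negative_controls = [100, 120, 90, 110, 130, 95, 105, 115, 125, 85]
probes = {"p1": (800, 200), "p2": (150, 600), "p3": (100, 110), "p4": (400, 400)}
ranked = rank_by_fold_change(probes, negative_controls, negative_controls)
```

Probes at the top of the resulting list are preferred by the first protein and those at the bottom by the second, which mirrors the two zoomed regions in the panel.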
Figure 3.
AP2-LT mostly binds to intergenic regions upstream of late-stage genes demarcated by chromatin accessibility and active epigenetic modifications in vivo. (A) Four-color plot of TGCAC-centered AP2-LT bound sites, enriched DNA motif above, calculated motif peak occurrence, and calculated motif E-value below. Color representations: A (red), C (blue), G (yellow) and T (green); (B) four-color plot of TGCACN5TGCAC-centered AP2-LT bound sites, enriched DNA motif above, calculated motif peak occurrence and calculated motif E-value below; (C) percent of MACS2-called peaks that overlap with 5′-upstream regions (purple), gene coding sequences (green), or 3′-downstream regions (peach); (D) mean-centered transcript abundance profile of 275 AP2-LT gene targets (mean [black] with one standard deviation [grey]) compared to the mean-centered transcript abundance profile of the AP2-LT transcript (bold green). Calculated R² and P-value from Pearson correlation between putative gene targets and AP2-LT profiles are shown bottom right; (E) chromatin accessibility across eight asexual stage timepoints (5 hpi, 10 hpi, 15 hpi, 20 hpi, 25 hpi, 30 hpi, 35 hpi and 40 hpi) for AP2-LT ChIP-bound (green) and ChIP-unbound (grey) sites. Central line plotted is the median normalized read count over gDNA control. Upper and lower lines are the 75th percentile and 25th percentile, respectively; (F) profile plot of the mean ChIP-seq fold enrichment (Log2[IP/Input]) of five active epigenetic marks (H2A.Z, H3K9ac, H3K4me3, H3K27ac and H3K18ac) for ChIP-bound (green) and ChIP-unbound (grey) sites; and (G) profile plot of the mean ChIP-seq fold enrichment (Log2[IP/Input]) of five repressive epigenetic marks (H3K9me3, H3K36me2/3, H4K20me3 and H3K4me1) for ChIP-bound (green) and ChIP-unbound (grey) sites.
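The comparison in panel D (mean-centered profiles and a Pearson R²) can be sketched as follows. The time-course values are invented, the function names are hypothetical, and the P-value calculation mentioned in the caption is left out.

```python
import math


def mean_center(profile):
    """Subtract the profile's own mean from every timepoint."""
    mean = sum(profile) / len(profile)
    return [value - mean for value in profile]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length profiles."""
    cx, cy = mean_center(xs), mean_center(ys)
    covariance = sum(a * b for a, b in zip(cx, cy))
    norm = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return covariance / norm


# Invented abundance values across four timepoints (illustration only).
tf_profile = [1.0, 2.0, 4.0, 8.0]
target_profiles = [
    [2.0, 3.0, 5.0, 9.0],
    [0.5, 1.5, 3.5, 7.5],
]

# Average the target genes at each timepoint, then compare with the TF.
mean_target = [sum(column) / len(column) for column in zip(*target_profiles)]
r = pearson_r(mean_center(tf_profile), mean_center(mean_target))
r_squared = r * r
```

Pearson correlation is unchanged by subtracting a constant from either profile, so mean-centering here affects how the profiles are plotted rather than the value of R².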
Figure 4.
Binding specificity is dependent on nucleotides proximal to the GTGCAC motif. (A) Binding intensity distributions for GTGCAC probes and the respective negative control probes for AP2-I_D3 (GTGCAC probes [blue] and negative control probes [grey]). Significantly different binding was defined using a two-tailed Mann–Whitney test (P-value < 0.0001); (B) four-color plot of top 100 bound probes by AP2-I_D3 with enriched motif above and calculated E-value and number of occurrences below. Color representations: A (red), C (blue), G (yellow) and T (green); (C) comparison of the binding intensities for AP2-I_D3 technical replicates (Pearson correlation: R² = 0.912) (negative control probes [grey], GTGCAC probes [blue], HDP1-preferred TGTGCACA probes [orange], SIP2_D1-preferred GGTGCAC probes [purple], and AP2-I_D3-preferred AGTGCATTA probes [green]); (D) comparison between SIP2_D1 and AP2-I_D3 (Pearson: R² = 0.442); (E) comparison between SIP2_D1 and HDP1 (Pearson: R² = 0.386); (F) comparison between AP2-I_D3 and HDP1 (Pearson: R² = 0.514); (G) binding intensity distributions from AP2-I_D3 for GTGCAC negative control probes (grey), all GTGCAC probes (blue), the extended motif probes by all three GTGCAC-binding TFs (AP2-I_D3 [green], SIP2_D1 [purple] and HDP1 [orange]), and 8-mer GTGCAC probes represented in the gcPBM (blue; right of the vertical line); (H) EMSA of AP2-I_D3 binding to an AGTGCATTA probe with increasing numbers of mutations to the extended motif. Protein–DNA interaction graphic was created with BioRender.com; (I) calculated minor groove width (MGW) predictions across all AGTGCATTA probes (green) and all GGTGCAC probes (purple). Solid line represents the mean and dotted lines encompassing the shaded area are one standard deviation. *Denotes statistically significant differences (P-value < 0.05) between means (two-sided Wilcoxon rank sum test). N = IUPAC for any nucleotide. Y = IUPAC for C or T nucleotides; (J) calculated electrostatic potential (EP) predictions across all AGTGCATTA probes (green) and all GGTGCAC probes (purple). Solid line represents the mean and dotted lines encompassing the shaded area are one standard deviation. *Denotes statistically significant differences (P-value < 0.05) between means (two-sided Wilcoxon rank sum test). N = IUPAC for any nucleotide. Y = IUPAC for C or T nucleotides.
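The per-position comparisons in panels I and J rely on a two-sided Wilcoxon rank sum test (equivalent to the Mann–Whitney test named in panel A). A minimal sketch of that test at a single position is given below. It uses a normal approximation with no tie or continuity correction, which is a simplification chosen for brevity and may differ from the software the authors used; the shape values are invented.

```python
import math


def rank_sum_test(xs, ys):
    """Two-sided rank sum test via a normal approximation.

    Returns (U statistic for xs, approximate two-sided P-value).
    Simplification: no tie or continuity correction is applied.
    """
    pooled = sorted((value, group) for group, values in ((0, xs), (1, ys)) for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        midrank = (i + 1 + j) / 2
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(xs), len(ys)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Invented minor groove width values at one position for two probe sets.
mgw_set_a = [4.1, 4.3, 4.0, 4.2, 4.4]
mgw_set_b = [5.0, 5.2, 4.9, 5.1, 5.3]
u_stat, p_value = rank_sum_test(mgw_set_a, mgw_set_b)
significant = p_value < 0.05
```

Repeating the test at every position along the aligned probes would give the per-position significance marks, although testing many positions may call for a multiple-testing adjustment that the caption does not describe.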
Figure 5.
Overlapping in vitro binding preferences across DNA motif types. (A) AP2-G binding intensity distributions for GTAC negative control probes (grey), all GTAC probes (yellow) and all GTGCAC probes (blue); (B) AP2-I_D3 binding intensity distributions for GTGCAC negative control probes (grey), all GTGCAC probes (blue) and all GTAC probes (yellow); (C) comparison of the binding intensities for AP2-G and AP2-I_D3. Negative control probes (grey), GTAC probes (yellow), GTGCAC probes (blue), AP2-G ChIP-bound sites (dark yellow), AP2-I ChIP-bound sites (dark blue) and AP2-G/AP2-I co-bound ChIP-bound sites (black triangles); (D) AP2-LT binding intensity distributions for CACACA negative control probes (grey), all CACACA probes (green), all GTGCAC probes (blue) and all TGCATGCA probes (purple); (E) AP2-I_D3 binding intensity distributions for GTGCAC negative control probes (grey), all GTGCAC probes (blue), all CACACA probes (green), and all TGCATGCA probes (purple); (F) AP2-EXP binding intensity distributions for TGCATGCA negative control probes (grey), all TGCATGCA probes (purple), all GTGCAC probes (blue), and all CACACA probes (green).
