Nucleic Acids Res. 2024 Sep 23;52(17):10161-10179.

doi: 10.1093/nar/gkae585.

DNA sequence and chromatin differentiate sequence-specific transcription factor binding in the human malaria parasite Plasmodium falciparum

Victoria A Bonnell¹ ² ³, Yuning Zhang⁴ ⁵ ⁶, Alan S Brown¹ ² ³, John Horton⁴ ⁵, Gabrielle A Josling¹ ² ³, Tsu-Pei Chiu⁷, Remo Rohs⁷ ⁸ ⁹ ¹⁰, Shaun Mahony¹ ², Raluca Gordân⁴ ⁵ ¹¹ ¹², Manuel Llinás¹ ² ³ ¹³

Affiliations

¹ Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA.
² Huck Institutes Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA 16802, USA.
³ Huck Institutes Center for Malaria Research, The Pennsylvania State University, University Park, PA 16802, USA.
⁴ Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA.
⁵ Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, USA.
⁶ Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA.
⁷ Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA.
⁸ Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA.
⁹ Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA.
¹⁰ Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA.
¹¹ Department of Computer Science, Duke University, Durham, NC 27708, USA.
¹² Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27708, USA.
¹³ Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA.

PMID: 38966997
PMCID: PMC11417369
DOI: 10.1093/nar/gkae585

Victoria A Bonnell et al. Nucleic Acids Res. 2024.

Erratum in

Correction to 'DNA sequence and chromatin differentiate sequence-specific transcription factor binding in the human malaria parasite Plasmodium falciparum'.
[No authors listed] Nucleic Acids Res. 2024 Sep 23;52(17):10730. doi: 10.1093/nar/gkae733. PMID: 39180398. Free PMC article. No abstract available.

Abstract

Development of the malaria parasite, Plasmodium falciparum, is regulated by a limited number of sequence-specific transcription factors (TFs). However, the mechanisms by which these TFs recognize genome-wide binding sites are largely unknown. To address TF specificity, we investigated the binding of two TF subsets that bind either CACACA or GTGCAC DNA sequence motifs and further characterized two additional ApiAP2 TFs, PfAP2-G and PfAP2-EXP, which bind unique DNA motifs (GTAC and TGCATGCA). We also interrogated the impact of DNA sequence and chromatin context on P. falciparum TF binding by integrating high-throughput in vitro and in vivo binding assays, DNA shape predictions, epigenetic post-translational modifications, and chromatin accessibility. We found that DNA sequence context minimally impacts binding site selection for paralogous CACACA-binding TFs, while chromatin accessibility, epigenetic patterns, co-factor recruitment, and dimerization correlate with differential binding. In contrast, GTGCAC-binding TFs prefer different DNA sequence context in addition to chromatin dynamics. Finally, we determined that TFs that preferentially bind divergent DNA motifs may bind overlapping genomic regions due to low-affinity binding to other sequence motifs. Our results demonstrate that TF binding site selection relies on a combination of DNA sequence and chromatin features, thereby contributing to the complexity of P. falciparum gene regulatory mechanisms.

Figures

**Figure 1.**
Multiple *P. falciparum* TFs with overlapping sequence preferences and design of the *P. falciparum* genomic-context protein-binding microarray (gcPBM). (A) Graphical representation of each TF examined in this study categorized into ‘CACACA-binding’, ‘GTGCAC-binding’, and ‘Other’. Protein lengths (in number of amino acids) are drawn to scale. Predicted protein domains were determined using NCBI Conserved Domain Search or defined by previous literature. Position weight matrix (PWM) logos are from previously published universal protein-binding microarray (PBM) experiments. * Denotes the DNA-binding domains (DBDs) tested in this study; (B) Graphical representation of the *P. falciparum* gcPBM design. Position weight matrix (PWM) data was searched against intergenic regions of the *P. falciparum* genome (Pfalciparum3D7; version 3, release 38) and categorized into four motif types (CACACA, GTGCAC, GTAC and TGCATGCA). All sequences were replicated eight(*) or six(**) times. Microarray graphic was created with BioRender.com.
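
As a rough illustration of the motif search described in panel B, the sketch below scans a sequence for exact matches to the four core motifs on both strands. This is a simplification and not the authors' procedure: the caption states that PWM data were searched against intergenic regions, whereas plain string matching is used here only to keep the example short. The function names and the toy sequence are hypothetical.

```python
# Core motifs as listed in the Figure 1 caption.
MOTIFS = ["CACACA", "GTGCAC", "GTAC", "TGCATGCA"]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif_sites(sequence, motif):
    """Return (0-based start, strand) for each exact motif match.

    Assumption: exact matching stands in for PWM scoring. Palindromic
    motifs (e.g. GTGCAC) are reported once, on the '+' strand.
    """
    sequence = sequence.upper()
    rc_motif = reverse_complement(motif)
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if window == motif:
            hits.append((start, "+"))
        elif window == rc_motif:
            hits.append((start, "-"))
    return hits


if __name__ == "__main__":
    toy_intergenic = "AAGTGCACTTTGTGTGAA"  # made-up sequence, not from the paper
    for motif in MOTIFS:
        print(motif, find_motif_sites(toy_intergenic, motif))
```

Reporting the strand matters for non-palindromic motifs such as CACACA, whose reverse complement (TGTGTG) marks the same site read from the opposite strand.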

**Figure 2.**
CACACA-binding AP2 domains have moderate differences in sequence context preferences at medium-to-low affinities. (A) Binding intensity distributions for CACACA probes and respective negative control probes for AP2-LT (CACACA probes [green] and negative control probes [grey]). Significantly different binding was defined using a two-tailed Mann–Whitney test (P-value < 0.0001); (B) four-color plot of the top 100 bound probes by AP2-LT with enriched motif above and calculated E-value and number of occurrences below. Color representations: A (red), C (blue), G (yellow) and T (green); (C) comparison of the gcPBM binding intensities for AP2-LT technical replicates (Pearson correlation: R² = 0.95) (CACACA probes [green] and negative control probes [grey]); (D) comparison of the gcPBM results for PF3D7_0420300_D1 versus AP2-LT (Pearson: R² = 0.80); (E) comparison of the gcPBM results for PF3D7_0420300_D1 versus AP2-HC (Pearson: R² = 0.86); (F) comparison of the gcPBM results for AP2-LT versus AP2-HC (Pearson: R² = 0.85); (G) binding intensity distributions from AP2-LT for negative control probes (grey), all CACACA probes (green), the AP2-LT extended motif probes (blue) and all 8-mer CACACA probes represented in the gcPBM (green; right of the vertical line). Dotted lines in each violin plot are the calculated mean; (H) *Left:* four-color plot of all CACACA probes above the threshold (defined by the 90th percentile of negative control probes) sorted by fold change (log₂[AP2-LT/AP2-HC]). *Right:* zoom in on the top 100 differentially bound sites by AP2-LT (*top right*) and AP2-HC (*bottom right*) with enriched motifs, calculated E-values, and motif occurrence counts within those top 100 sites.
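
The filtering and sorting described in panel H can be sketched as follows: take the 90th percentile of the negative control intensities as a threshold, keep probes above it, and sort them by log₂ fold change between two TFs. This is a minimal sketch under stated assumptions, not the authors' code. The caption does not say whether a probe must exceed the threshold for one TF or both, so the sketch assumes "either"; the linear-interpolation percentile and all names and numbers are likewise illustrative.

```python
import math


def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def rank_differential_probes(probes, negative_controls, q=90):
    """Sort probes by log2(intensity_a / intensity_b), highest first.

    probes: dict mapping probe name -> (intensity_a, intensity_b).
    Assumption: a probe is kept if either TF binds it above the
    q-th percentile of the negative control intensities.
    """
    threshold = percentile(negative_controls, q)
    fold_changes = {
        name: math.log2(a / b)
        for name, (a, b) in probes.items()
        if a > 0 and b > 0 and max(a, b) > threshold
    }
    return sorted(fold_changes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    controls = [100 * i for i in range(1, 11)]  # made-up intensities
    toy_probes = {
        "probe_1": (4000, 1000),  # favours TF a
        "probe_2": (500, 600),    # below threshold for both TFs
        "probe_3": (1000, 4000),  # favours TF b
    }
    for name, log2_fc in rank_differential_probes(toy_probes, controls):
        print(name, round(log2_fc, 2))
```

Sorting by a log ratio places probes preferred by the first TF at the top and those preferred by the second at the bottom, which is the ordering a plot like panel H reads from top to bottom.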

**Figure 3.**
AP2-LT mostly binds to intergenic regions upstream of late-stage genes demarcated by chromatin accessibility and active epigenetic modifications *in vivo*. (A) Four-color plot of TGCAC-centered AP2-LT bound sites, enriched DNA motif above, calculated motif peak occurrence, and calculated motif E-value below. Color representations: A (red), C (blue), G (yellow) and T (green); (B) four-color plot of TGCACN₅TGCAC-centered AP2-LT bound sites, enriched DNA motif above, calculated motif peak occurrence and calculated motif E-value below; (C) percent of MACS2-called peaks that overlap with 5′-upstream regions (purple), gene coding sequences (green), or 3′-downstream regions (peach); (D) mean-centered transcript abundance profile of 275 AP2-LT gene targets (mean [black] with one standard deviation [grey]) compared to the mean-centered transcript abundance profile of the AP2-LT transcript (bold green). Calculated R² and P-value from Pearson correlation between putative gene targets and AP2-LT profiles are shown at bottom right; (E) chromatin accessibility across eight asexual stage timepoints (5 hpi, 10 hpi, 15 hpi, 20 hpi, 25 hpi, 30 hpi, 35 hpi and 40 hpi) for AP2-LT ChIP-bound (green) and ChIP-unbound (grey) sites. Central line plotted is the median normalized read count over gDNA control. Upper and lower lines are the 75th percentile and 25th percentile, respectively; (F) profile plot of the mean ChIP-seq fold enrichment (Log₂[IP/Input]) of five active epigenetic marks (H2A.Z, H3K9ac, H3K4me3, H3K27ac and H3K18ac) for ChIP-bound (green) and ChIP-unbound (grey) sites; and (G) profile plot of the mean ChIP-seq fold enrichment (Log₂[IP/Input]) of five repressive epigenetic marks (H3K9me3, H3K36me2/3, H4K20me3 and H3K4me1) for ChIP-bound (green) and ChIP-unbound (grey) sites.
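
The comparison in panel D (mean-centered target profiles against the mean-centered TF transcript profile, summarized by a Pearson R²) can be sketched as below. This is an illustrative sketch only: the helper names and the toy time courses are hypothetical, and the caption does not specify how the authors averaged or normalized the profiles beyond mean-centering.

```python
import math


def mean_center(profile):
    """Subtract the profile's own mean from every timepoint."""
    mean = sum(profile) / len(profile)
    return [value - mean for value in profile]


def mean_profile(profiles):
    """Average several equal-length profiles timepoint by timepoint."""
    return [sum(column) / len(column) for column in zip(*profiles)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    cx, cy = mean_center(x), mean_center(y)
    covariance = sum(a * b for a, b in zip(cx, cy))
    scale = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    if scale == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / scale


if __name__ == "__main__":
    # Made-up abundance values over four timepoints, not data from the paper.
    target_profiles = [
        mean_center([1.0, 2.0, 4.0, 8.0]),
        mean_center([2.0, 2.5, 5.0, 9.0]),
    ]
    tf_profile = mean_center([0.5, 1.0, 3.0, 6.0])
    r = pearson_r(mean_profile(target_profiles), tf_profile)
    print("R^2 =", round(r * r, 3))
```

Mean-centering each profile first puts genes with different absolute abundance on a common baseline, so the averaged target profile reflects timing rather than expression level.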

**Figure 4.**
Binding specificity is dependent on nucleotides proximal to the GTGCAC motif. (A) Binding intensity distributions for GTGCAC probes and the respective negative control probes for AP2-I_D3 (GTGCAC probes [blue] and negative control probes [grey]). Significantly different binding was defined using a two-tailed Mann–Whitney test (P-value < 0.0001); (B) four-color plot of top 100 bound probes by AP2-I_D3 with enriched motif above and calculated E-value and number of occurrences below. Color representations: A (red), C (blue), G (yellow) and T (green); (C) comparison of the binding intensities for AP2-I_D3 technical replicates (Pearson correlation: R² = 0.912) (negative control probes [grey], GTGCAC probes [blue], HDP1-preferred TGTGCACA probes [orange], SIP2_D1-preferred GGTGCAC probes [purple], and AP2-I_D3-preferred AGTGCATTA probes [green]); (D) comparison between SIP2_D1 and AP2-I_D3 (Pearson: R² = 0.442); (E) comparison between SIP2_D1 and HDP1 (Pearson: R² = 0.386); (F) comparison between AP2-I_D3 and HDP1 (Pearson: R² = 0.514); (G) binding intensity distributions from AP2-I_D3 for GTGCAC negative control probes (grey), all GTGCAC probes (blue), the extended motif probes by all three GTGCAC-binding TFs (AP2-I_D3 [green], SIP2_D1 [purple] and HDP1 [orange]), and 8-mer GTGCAC probes represented in the gcPBM (blue; right of the vertical line); (H) EMSA of AP2-I_D3 binding to an AGTGCATTA probe with increasing numbers of mutations to the extended motif. Protein–DNA interaction graphic was created with BioRender.com; (I) calculated minor groove width (MGW) predictions across all AGTGCATTA probes (green) and all GGTGCAC probes (purple). Solid line represents the mean and the shaded area bounded by dotted lines is one standard deviation. *Denotes statistically significant differences (P-value < 0.05) between means (two-sided Wilcoxon rank sum test). N = IUPAC for any nucleotide. Y = IUPAC for C or T nucleotides; (J) calculated electrostatic potential (EP) predictions across all AGTGCATTA probes (green) and all GGTGCAC probes (purple). Solid line represents the mean and the shaded area bounded by dotted lines is one standard deviation. *Denotes statistically significant differences (P-value < 0.05) between means (two-sided Wilcoxon rank sum test). N = IUPAC for any nucleotide. Y = IUPAC for C or T nucleotides.

**Figure 5.**
Overlapping *in vitro* binding preferences across DNA motif types. (A) AP2-G binding intensity distributions for GTAC negative control probes (grey), all GTAC probes (yellow) and all GTGCAC probes (blue); (B) AP2-I_D3 binding intensity distributions for GTGCAC negative control probes (grey), all GTGCAC probes (blue) and all GTAC probes (yellow); (C) comparison of the binding intensities for AP2-G and AP2-I_D3. Negative control probes (grey), GTAC probes (yellow), GTGCAC probes (blue), AP2-G ChIP-bound sites (dark yellow), AP2-I ChIP-bound sites (dark blue) and AP2-G/AP2-I co-bound ChIP-bound sites (black triangles); (D) AP2-LT binding intensity distributions for CACACA negative control probes (grey), all CACACA probes (green), all GTGCAC probes (blue) and all TGCATGCA probes (purple); (E) AP2-I_D3 binding intensity distributions for GTGCAC negative control probes (grey), all GTGCAC probes (blue), all CACACA probes (green), and all TGCATGCA probes (purple); (F) AP2-EXP binding intensity distributions for TGCATGCA negative control probes (grey), all TGCATGCA probes (purple), all GTGCAC probes (blue), and all CACACA probes (green).
