EMBO J. 2018 Aug 15;37(16):e98896.

doi: 10.15252/embj.201798896. Epub 2018 Jul 13.

A high-throughput method to identify trans-activation domains within transcription factor sequences

Affiliations

¹ Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria.
² Institute of Molecular Biotechnology (IMBA), Vienna BioCenter (VBC), Vienna, Austria.
³ Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria stark@starklab.org.
⁴ Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria.

PMID: 30006452
PMCID: PMC6092621
DOI: 10.15252/embj.201798896

Cosmas D Arnold et al. EMBO J. 2018.

Abstract

Even though transcription factors (TFs) are central players in gene regulation and have been extensively studied, their regulatory trans-activation domains (tADs) often remain unknown, and a systematic functional characterization of tADs is lacking. Here, we present a novel high-throughput approach, tAD-seq, to functionally test thousands of candidate tADs from different TFs in parallel. The tADs we identify by pooled screening validate in individual luciferase assays, whereas neutral regions do not. Interestingly, the tADs are found at arbitrary positions within the TF sequences and can contain amino acid (e.g., glutamine) repeat regions or overlap structured domains, including helix-loop-helix domains that are typically annotated as DNA-binding. We also identified tADs in the non-native reading frames, confirming that random sequences can function as tADs, albeit weakly. The identification of tADs as short protein sequences sufficient for transcription activation will enable the systematic study of TF function, which, particularly for TFs of different transcription activating functionalities, is still poorly understood.

Keywords: glutamine‐rich regions; high‐throughput functional screen; transcription; transcription factor; trans‐activation domain.

Figures

**Figure 1. Trans‐activation domains (tADs) and their identification by tAD‐seq**
A
TFs are typically modular with two distinct functionalities: they bind to specific DNA sequences via their DNA‐binding domains (DBDs, blue), and trans‐activate transcription via trans‐activation domains (tADs, red). tADs but not DBDs are sufficient to activate transcription when recruited to the promoter of a reporter gene (e.g., luciferase) via a heterologous DBD, here the Gal4‐DBD.
B
Detailed schematic overview of the tAD‐seq workflow, including Gal4‐DBD‐candidate library cloning (tAD‐seq library), co‐transfection of library and 4xUAS‐GFP reporter plasmids, separation of GFP⁺ (tAD enriched) and GFP⁻ cells by FACS, and NGS‐based tAD identification by quantification of Gal4‐DBD‐candidate transcripts in GFP⁺ vs. GFP⁻ cells.
C
TF‐mRNA enrichment in GFP⁺ vs. GFP⁻ cells reflects the TFs' activating and repressing functionalities. Left: heat map depicting transcription activating and repressing functions (shades of red, see color legend) of the ten indicated TFs tested individually by recruitment to 4xUAS‐luciferase reporters (data from Stampfel *et al*, 2015). Right: bar plot indicating relative distribution of TF transcripts between GFP⁺ and GFP⁻ cells as measured by RT–qPCR (n = 3, error bars: s.d.) after transfecting a pool of ten TFs into S2 cells and separation of GFP⁺ and GFP⁻ cells by FACS.
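The comparison of candidate transcripts in GFP⁺ vs. GFP⁻ cells described in this legend can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the pseudo-count, and the example counts are hypothetical, and the only ideas taken from the legends are scaling to fragments per million (FPM) and contrasting the two sorted populations.

```python
import math

def fpm(counts):
    """Scale raw fragment counts to fragments per million (FPM)."""
    total = sum(counts.values())
    return {name: n * 1e6 / total for name, n in counts.items()}

def log2_enrichment(pos_counts, neg_counts, pseudo=1.0):
    """log2 ratio of FPM in GFP+ over GFP- cells for every candidate.

    The pseudo-count (an assumption, not from the source) avoids division
    by zero for candidates that are absent from one population.
    """
    pos, neg = fpm(pos_counts), fpm(neg_counts)
    names = set(pos) | set(neg)
    return {
        name: math.log2((pos.get(name, 0.0) + pseudo) / (neg.get(name, 0.0) + pseudo))
        for name in names
    }

# Hypothetical read counts per candidate fragment.
gfp_pos = {"cand_A": 900, "cand_B": 50, "cand_C": 50}
gfp_neg = {"cand_A": 100, "cand_B": 450, "cand_C": 450}
scores = log2_enrichment(gfp_pos, gfp_neg)
```

In this toy example `cand_A` gets a positive score (enriched in GFP⁺ cells), while `cand_B` and `cand_C` get negative scores.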

**Figure 2. tAD‐seq recovers the known tAD of MTF‐1 from a complex pool of candidates**
A
UCSC Genome Browser (GB) screenshots (dedicated genome containing only TF CDSs and flanking plasmid backbone sequence, see Materials and Methods) displaying candidate fragment coverage for GFP⁺ cells (left) and GFP⁻ cells (right) in a reading frame‐specific manner (+1 frame, purple and non‐native frames, orange). The black bar on top indicates the full‐length CDS of MTF‐1. The colored bars below indicate the regions individually tested in luciferase assays (red = tAD; blue = neutral control region). High coverage is only observed in the native (+1) frame at the position of the known tAD.
B
Relative luciferase activity of sequences that overlap the MTF‐1 tAD (red) or a neutral region (blue, see also A). Shown are the normalized luciferase activities for tAD candidates and neutral fragments (Gal4‐DBD‐candidate) normalized to a negative control (Gal4‐DBD‐GFP; n = 4, error bars: s.d., P‐value: two‐sided Student's t‐test vs. neutral region; FPM: fragments per million).
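The reading frame‐specific coverage shown here depends on assigning each candidate fragment to a frame. A minimal sketch follows, assuming that a fragment's frame is set by the offset of its first base from the first base of the CDS; the function name and coordinate convention are hypothetical and not taken from the paper.

```python
from collections import Counter

def reading_frame(fragment_start, cds_start=0):
    """Return 1, 2 or 3 for a fragment whose first base sits at the
    0-based position fragment_start; frame 1 is the native (+1) frame."""
    return (fragment_start - cds_start) % 3 + 1

# Hypothetical fragment start positions along one CDS (0-based).
starts = [0, 3, 4, 5, 9, 11]
frame_counts = Counter(reading_frame(s) for s in starts)
```

Fragments starting at positions 0, 3, and 9 fall into the native frame, so `frame_counts[1]` is 3 in this example.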

**Figure 3. tAD‐seq identifies novel tADs from a complex pool of candidates**
A–G
Candidate fragment coverage (+1 frame) from GFP⁺ (top) and GFP⁻ (bottom) cells for Bteb2 (A), HLH3B (B), CG14451 (C), slou (D), bin (E), CG17186 (F), and cas (G). The dark gray bar at the bottom indicates the called tAD region and the red and blue bars the positions of the tAD candidate and the neutral control region, respectively, tested in luciferase assays. Normalized luciferase activities (normalized to GFP control) of tAD candidate and neutral control are shown on the right (n = 4 for Bteb2, HLH3B, and CG14451; n = 3 for slou, bin, CG17186, and cas; error bars: s.d., P‐value: two‐sided Student's t‐test vs. neutral region).
H
Summary of individual tAD activity tests by luciferase assays for candidate tADs and neutral regions predicted by tAD‐seq (see Table EV3). Fourteen out of 21 predicted tADs (red) are active (enrichment > twofold above GFP; P < 0.05; two‐sided Student's t‐test vs. GFP control) vs. zero of eight predicted neutral regions (difference between candidate tADs and neutral regions: hypergeometric P‐value with a pseudo‐count of 1 for neutral regions).
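The hypergeometric comparison mentioned in panel H can be sketched as follows. This is one possible reading of the legend, not a confirmed reconstruction of the authors' calculation: it assumes a one-sided test and adds the pseudo-count of 1 to the number of active neutral regions while keeping the number of tested neutral regions at eight.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` items without replacement from a
    population that contains `successes` successes."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total

active_tads, tested_tads = 14, 21            # from the legend
active_neutral, tested_neutral = 0 + 1, 8    # pseudo-count of 1 (assumed placement)

p_value = hypergeom_upper_tail(
    k=active_tads,
    population=tested_tads + tested_neutral,
    successes=active_tads + active_neutral,
    draws=tested_tads,
)
```

Other conventions, such as also increasing the number of tested neutral regions, would give a somewhat different value.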

**Figure 4. tAD‐seq identifies tADs in non‐native reading frames**
A
UCSC GB screenshots displaying candidate fragment coverage from GFP⁺ cells (left) and GFP⁻ cells (right) for the repressive TF engrailed (en; black bar: full‐length CDS; purple: +1 frame; orange: non‐native frames). The red bar indicates the region individually tested in luciferase assays. High coverage is only observed in frame +3.
B
Normalized luciferase activities (firefly/Renilla) of tAD candidate and GFP control are shown (n = 3, error bars: s.d., P‐value: two‐sided Student's t‐test vs. GFP control).
C, D
Candidate fragment coverage from GFP⁺ and GFP⁻ cells (orange) and the tAD calls (dark gray bar at bottom) are shown for bbx in frame +2 and seq in frame +3, respectively.
E
Normalized luciferase activities (firefly/Renilla) of bbx and seq “out‐of‐frame” tAD candidates and GFP control (n = 3, error bars: s.d., P‐value: two‐sided Student's t‐test vs. GFP).
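The luciferase readouts in these panels are firefly/Renilla ratios expressed relative to a GFP control and compared with a two‐sided Student's t‐test. The sketch below shows that normalization and the pooled‐variance t statistic using only the standard library. All measurements are invented for illustration, and turning the statistic into a P‐value would additionally need the t distribution (for example from a statistics package), which is omitted here.

```python
from math import sqrt
from statistics import mean, stdev

def ratios(firefly, renilla):
    """Per-replicate firefly/Renilla ratio."""
    return [f / r for f, r in zip(firefly, renilla)]

def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical raw readings, n = 3 replicates each.
control = ratios([100, 110, 90], [1000, 1000, 1000])     # Gal4-DBD-GFP
candidate = ratios([500, 550, 450], [1000, 1000, 1000])  # Gal4-DBD-candidate

# Express both groups relative to the mean of the GFP control.
control_mean = mean(control)
control_norm = [x / control_mean for x in control]
candidate_norm = [x / control_mean for x in candidate]

t_stat, dof = student_t(candidate_norm, control_norm)
```

With these invented numbers the candidate averages about fivefold above the GFP control.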

**Figure 5. Protein sequence annotation of TFs and the tADs identified by tAD‐seq**
A
The tADs of HLH3B, HLH54F, and sage overlap with basic helix–loop–helix (HLH) domains. Shown are annotated schematic views of the full‐length TF CDSs (gray bars). tADs are shown in red, and the HLH domains in blue.
B
Normalized luciferase activities (firefly/Renilla) of the sage tAD_2 and HLH54F tAD candidates compared to the respective GFP control (n = 3 and n = 4, respectively, error bars: s.d., P‐value: two‐sided Student's t‐test vs. GFP control; for HLH3B, see Fig 3B).
C
The tADs of E2f, Clk, dar1, bin, slou, and Hnf4 contain glutamine‐rich (Q‐rich) regions. Schematic view as in (A), and Q‐rich regions in turquoise.
D
Out‐of‐frame tADs of Doc1 (frame +3) and bbx (frame +2) contain Q‐rich regions. Displayed are the Q‐rich regions (turquoise) within the tADs (red).
Data information: The scale bars in each panel apply to all elements (aa: amino acid).
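As an illustration of how glutamine‐rich stretches like those annotated here might be located in a protein sequence, the following sketch scans a sliding window and merges overlapping hits. The window size and the minimum glutamine fraction are arbitrary assumptions; the legend does not state which criteria were used to define Q‐rich regions.

```python
def q_rich_regions(protein, window=20, min_fraction=0.3):
    """Return merged (start, end) intervals (0-based, end exclusive) in which
    at least min_fraction of the residues in a sliding window are glutamine."""
    regions = []
    for start in range(len(protein) - window + 1):
        fraction = protein[start:start + window].count("Q") / window
        if fraction >= min_fraction:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous hit: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions

# Hypothetical sequence: a run of ten glutamines flanked by alanines.
toy_protein = "A" * 30 + "Q" * 10 + "A" * 30
hits = q_rich_regions(toy_protein, window=10, min_fraction=0.8)
```

Because each window only needs to reach the threshold fraction, a reported interval can extend a few residues beyond the glutamine run itself.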

**Figure EV1. tADs can have minimal sizes below which they do not function**
A, B
UCSC GB screenshots displaying candidate fragment coverage for taxi (tx; frame +1) (A) or CG32105 (B) of short (top)‐ and long (bottom)‐fragment candidate libraries from GFP⁺ cells (dark) and GFP⁻ cells (light). The black bar on top indicates the full‐length CDS, the dark gray bars below the coverage tracks the tADs called in the long‐fragment tAD‐seq screen (no tAD was found in the short‐fragment screen). The red bar and the light red bars indicate tAD candidates tested in luciferase assays, the normalized luciferase activities (normalized to GFP control) of long and short tAD candidates are shown on the right (n = 3 long tAD candidate; n = 4 short tAD candidates, error bars: s.d., P‐value: two‐sided Student's t‐test vs. short tAD candidates, reported is the highest P‐value).

**Figure EV2. Fine mapping of tAD locations**
A–C
Candidate fragment coverage and tAD calls (dark gray bars below coverage tracks) for HLH3B (frame +1) (A), MTF‐1 (B) or Clk (C) of short (top)‐ and long (bottom)‐fragment candidate libraries from GFP⁺ cells (dark) and GFP⁻ cells (light) are shown left. Normalized luciferase activities (firefly/Renilla) of the tAD candidates and the respective GFP control are shown right (n = 4 for HLH3B and MTF‐1; n = 3 for Clk, error bars: s.d., P‐value: two‐sided Student's t‐test vs. GFP control; for HLH3B, see Fig 3B). Short‐fragment coverage tracks and luciferase measurements for HLH3B and MTF‐1 are repeated from Figs 3B and 2, respectively, for comparison.

**Figure EV3. Mapping of tAD boundaries**
A–D
Short‐fragment library and candidate fragment coverage for the native (+1) frames for GFP⁺ and GFP⁻ cells (dark and light purple tracks), called tADs (dark gray bars), and two luciferase assay tested tAD candidates (red) and neutral (blue) fragments are shown. Normalized luciferase activities (normalized to GFP control) for tAD candidates and neutral regions are also shown (n = 4, error bars: s.d., P‐value: two‐sided Student's t‐test vs. neutral region; n = 3 CG14451‐tAD_short). Data for coverage tracks and luciferase assays of tAD candidates (labeled: tAD) and neutral regions are repeated from Figs 2 and 3 for comparison.

Grants and funding

P 29613/FWF_/Austrian Science Fund FWF/Austria
