[Preprint]. 2024 Apr 10:2024.03.12.584681.
doi: 10.1101/2024.03.12.584681.

Widespread variation in molecular interactions and regulatory properties among transcription factor isoforms


Luke Lambourne et al. bioRxiv. 2024.

Update in

  • Widespread variation in molecular interactions and regulatory properties among transcription factor isoforms.
    Lambourne L, Mattioli K, Santoso C, Sheynkman G, Inukai S, Kaundal B, Berenson A, Spirohn-Fitzgerald K, Bhattacharjee A, Rothman E, Shrestha S, Laval F, Carroll BS, Plassmeyer SP, Emenecker RJ, Yang Z, Bisht D, Sewell JA, Li G, Prasad A, Phanor S, Lane R, Moyer DC, Hunt T, Balcha D, Gebbia M, Twizere JC, Hao T, Holehouse AS, Frankish A, Riback JA, Salomonis N, Calderwood MA, Hill DE, Sahni N, Vidal M, Bulyk ML, Fuxman Bass JI. Mol Cell. 2025 Apr 3;85(7):1445-1466.e13. doi: 10.1016/j.molcel.2025.03.004. Epub 2025 Mar 26. PMID: 40147441.

Abstract

Most human transcription factor (TF) genes encode multiple protein isoforms differing in DNA binding domains, effector domains, or other protein regions. The global extent to which this results in functional differences between isoforms remains unknown. Here, we systematically compared 693 isoforms of 246 TF genes, assessing DNA binding, protein binding, transcriptional activation, subcellular localization, and condensate formation. Relative to reference isoforms, two-thirds of alternative TF isoforms exhibit differences in one or more molecular activities, which often could not be predicted from sequence. We observed two primary categories of alternative TF isoforms: "rewirers" and "negative regulators", both of which were associated with differentiation and cancer. Our results support a model wherein the relative expression levels of, and interactions involving, TF isoforms add an understudied layer of complexity to gene regulatory networks, demonstrating the importance of isoform-aware characterization of TF functions and providing a rich resource for further studies.


Figures

Figure 1. Sequence and expression diversity of annotated TF isoforms
A. Study schematic. B. Histogram showing the number of unique annotated protein isoforms for each TF gene. C. Boxplot showing the total percent of amino acids altered via deletions, insertions, or frameshifts in alternative isoforms compared to their cognate reference isoforms. D. Barplot showing the observed fraction of alternative isoforms with ≥ 10% removal of various protein domains (green bars) compared to the expected fraction (black error bars, 99% CI) as defined by a null model assuming the domain is randomly positioned along the protein. DBD = DNA-binding domain; NLS/NES = nuclear localization/export signal. E. Heatmap showing the maximum expression value of alternative TF isoforms (y-axis) compared to their cognate reference isoforms (x-axis) across GTEx (left) and developmental (right) RNA-seq datasets. GTEx dataset has been re-sampled to compare to the developmental dataset. F. Heatmap showing the maximum isoform fraction (y-axis) compared to the minimum isoform fraction (x-axis) of alternative TF isoforms in developmental RNA-seq data, where isoform fraction is defined as the expression level of an isoform normalized to the total expression level of its host gene. Dashed lines show the definitions used for isoforms that exhibit “switching” events and isoforms that remain lowly expressed. Only isoforms whose host genes are expressed at ≥ 1 TPM in ≥ 1 sample are shown. G. Stacked barplot summarizing the number of alternative isoforms defined to exhibit a switch event, a shift event, or be lowly expressed in re-sampled GTEx and developmental RNA-seq data. H. Example of an alternative TF isoform (HEY2–202) that exhibits a switch event. Top: log2 TPM values for each HEY2 isoform; bottom: isoform expression as a percentage of total gene expression for each HEY2 isoform. All heart and ovary samples in both GTEx and developmental RNA-seq are shown. Right: exon diagram of HEY2 isoforms with annotated protein domains. 
RD = repression domain; HLH = helix loop helix; DBD = DNA-binding domain.
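The isoform fraction used in panels F to H (an isoform's expression normalized to the total expression of its host gene) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input layout (one list of TPM values per isoform, one value per sample), and the example numbers are all assumptions. The thresholds that define "switching" or lowly expressed isoforms are not given in the caption and are deliberately left out.

```python
from typing import Dict, List


def isoform_fractions(tpm_by_isoform: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return, per sample, each isoform's TPM divided by the gene's total TPM.

    tpm_by_isoform maps a (hypothetical) isoform ID to its TPM values,
    one value per sample, with all lists in the same sample order.
    """
    n_samples = len(next(iter(tpm_by_isoform.values())))
    # Total gene expression per sample = sum over all isoforms of that gene.
    totals = [sum(values[i] for values in tpm_by_isoform.values())
              for i in range(n_samples)]
    fractions: Dict[str, List[float]] = {}
    for isoform, values in tpm_by_isoform.items():
        # The fraction is undefined when the gene is not expressed at all;
        # NaN marks those samples (an assumption of this sketch).
        fractions[isoform] = [v / t if t > 0 else float("nan")
                              for v, t in zip(values, totals)]
    return fractions


# Made-up TPM values for a gene with two isoforms across two samples.
example = {"iso_ref": [8.0, 1.0], "iso_alt": [2.0, 9.0]}
fractions = isoform_fractions(example)
max_alt_fraction = max(fractions["iso_alt"])
min_alt_fraction = min(fractions["iso_alt"])
```

The maximum and minimum fractions computed at the end correspond to the y-axis and x-axis quantities described for panel F.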
Figure 2. Overview of TFIso1.0 clone collection and TF molecular function assays
A. Schematic showing the PCR-based approach used to generate TFIso1.0. B. Barplot showing the percentage of alternative isoforms in GENCODE, all of TFIso1.0, and only the novel isoforms in TFIso1.0 exhibiting various sequence differences compared to their cognate reference isoforms. C. Boxplot showing the median and maximum expression levels (in TPM) in developmental RNA-seq data of reference, annotated alternative, and novel alternative isoforms in TFIso1.0. D. Schematic showing the three primary assays used in this study. eY1H = enhanced yeast one-hybrid; Y2H = yeast two-hybrid; M1H = mammalian one-hybrid; Gal4-AD = Gal4 activation domain; Gal4-DBD = Gal4 DNA-binding domain; Gal4-UAS = Gal4 upstream activation sequence. E. Stacked barplot showing the percent of TF isoforms belonging to various TF families in GENCODE, the entire TFIso1.0 collection, and those that have been successfully tested in each assay. F. Barplot showing the proportion of isoforms exhibiting ≥ 1 PPI, ≥ 1 PDI, ≥ 2-fold activation/repression in M1H, or any one of the three across reference, annotated alternative, and novel alternative isoforms, normalized to the number of isoforms that were successfully tested in each assay. Error bars are 68.3% Bayesian CI. G. The sub-networks of PPIs and PDIs from profiling different TF isoforms.
Figure 3. DNA binding preferences of TF isoforms
A. Change in the number of PDIs in the alternative isoform compared to the reference isoform for alternative isoforms with full loss of DBD, partial loss of DBD, insertions within the DBD, or that contain the full DBD. Each point is colored by the percentage of the sequence difference in the alternative isoform stemming from a predicted disordered protein region. B. Top: exon diagrams of cloned HEY1 isoforms with annotated Pfam domains. 12 nt = location of the 4 amino acid insertion in the alternative isoform, at the end of exon 3. Bottom left: AlphaFold model of the alternative isoform of HEY1 aligned to an experimental structure of a homologous protein in a dimer bound to DNA (PDB ID 4H10), with DNA in green and dimerization partner in gray. Bottom right: PDI results from Y1H assay for the 3 baits successfully assayed for both isoforms of HEY1; black box = binding and white box = no binding. C. Top: exon diagrams of CREB1 isoforms with annotated Pfam domains. pKID = phosphorylated kinase-inducible domain. Bottom left: AlphaFold model of the alternative isoform of CREB1 aligned to an experimental structure of a CREB1 homodimer bound to DNA (PDB ID 1DH3), with DNA in green and dimerization partner in gray. Bottom right: PDI results from the Y1H assay for the 4 baits successfully assayed for both isoforms of CREB1. D. Schematic showing protein-binding microarray (PBM) experiments. Scores for all possible 8-mers are calculated from universal “all 10-mer” PBMs. E. Top: exon diagrams of TBX5 isoforms with annotated DBD. 3 nt = TBX5–2 is missing 1 amino acid at the start of its first exon compared to the reference. Bottom: AlphaFold model of the reference isoform of TBX5 aligned to an experimental DNA-bound structure (PDB ID 5FLV), with DNA in green. F. Left: PDI results from the Y1H assay for the 3 isoforms of TBX5. Showing 8 baits that were successfully tested against all 3 isoforms. 
Right: Sequence logo derived from the top 50 8-mers as determined via PBMs for each of the 3 TBX5 isoforms. G. Scatter plots showing the PBM affinity scores for the alternative isoform (y-axis) compared to the reference isoform (x-axis) of TBX5 for every 8-mer, for either TBX5–2 (left plot) or TBX5–3 (right plot), each compared to TBX5–1. Points are colored by the differential affinity q-value calculated by the upbm package. Open circles correspond to 8-mers containing the canonical TBX5 6-mer AGGTGT (or its reverse complement); filled circles correspond to 8-mers containing the altered 6-mer ACGTGT (or its reverse complement). H. Expression of TBX5 isoforms in developmental RNA-seq (left) and GTEx (right). Top: log2 TPM values for each TBX5 isoform; bottom: isoform expression as a percentage of total gene expression for each TBX5 isoform. All heart samples are shown. I. Barplot showing the enrichment of the canonical TBX5 6-mer AGGTGT, the altered TBX5 6-mer ACGTGT, or a negative control Homeodomain 6-mer TAATTA (or each of their reverse complements) in TBX5 ChIP-seq peaks (foreground) compared to matched genomic negative control regions (background). P-values shown are from a Fisher’s exact test.
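Panels G and I group 8-mers and ChIP-seq peaks by whether they contain a given 6-mer or its reverse complement. A minimal sketch of that membership test is shown here; the function names are hypothetical and this is not the code used in the study or in the upbm package.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def contains_site(kmer: str, site: str) -> bool:
    """True if the k-mer contains the site on either strand."""
    return site in kmer or reverse_complement(site) in kmer


CANONICAL = "AGGTGT"  # canonical TBX5 6-mer named in the caption
ALTERED = "ACGTGT"    # altered 6-mer named in the caption

# An 8-mer carrying the canonical site on the opposite strand.
print(contains_site("GACACCTT", CANONICAL))
```

Checking both strands matters because a binding site read from the opposite strand appears as its reverse complement in the probe or peak sequence.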
Figure 4. Transcriptional activity and protein binding preferences of TF isoforms
A. Summary plot showing the change in the transcriptional activity (log2 fold-change, as determined via M1H assays) of the alternative isoform compared to the reference isoform for alternative isoforms with full or partial loss of annotated activation or repression effector domains, no loss of annotated effector domains, or containing no annotated effector domains. Each point is colored by the total number of amino acids in annotated effector domains for a given isoform. B. Pie chart showing the categories of PPI partners (as determined via Y2H assays) found to interact with ≥1 TF isoform. C. Box plots showing the absolute change in transcriptional activity associated with no change in PPIs (equal) or a change in PPIs (change) for various categories of PPI partners. P-values shown are from a one-sided Mann Whitney U test. D. Left: exon diagrams of CREB5 isoforms. RD = repression domain; 18 nt = CREB5–1 reference isoform clone is missing 6 amino acids at the N-terminus compared to annotated CREB5–204. Middle: PPI results from the Y2H assay for the 2 isoforms of CREB5; black box = binding and white box = no binding. Right: transcriptional activity from the M1H assay for the 2 isoforms of CREB5. E. Schematic showing how to calculate the fraction of isoforms interacting, using the PPI results for the 6 isoforms of ATF2. Showing PPI partners that were successfully tested against all 6 isoforms. F. Heatmap showing the rewiring score for combinations of families of TF isoforms (y-axis) and families of TF PPI partners (x-axis). Within-family dimerizations are therefore denoted on the diagonal of the heatmap. TF families that bind DNA as obligate dimers are marked with outlined black circles on the diagonal. The size of the circle denotes the number of PPIs, whereas the color denotes the mean fraction of isoforms interacting. Only TF isoform families with ≥3 TF partner interactions are shown; for the full heatmap see Figure S4I. G. Violin plot showing the fraction of TF isoforms (categorized before the slash in bolded blue text) that retain interactions with various TF PPI partner types (categorized after the slash). P-values shown are from a two-sided permutation test.
Figure 5. Functional differences between TF isoforms and TF paralogs
A. Schematic showing the definition of TF paralogs (blue vs. green) compared to TF isoforms (blue series or green series). B–C. Violin plots showing the Jaccard distance in PDIs (B) and PPIs (C) across reference/alternative isoform pairs, reference paralog pairs, or non-paralog reference pairs as a negative control. A Jaccard distance of 0 corresponds to entirely similar binding profiles, whereas a Jaccard distance of 1 corresponds to entirely dissimilar binding profiles. D. Violin plot showing the absolute log2 fold-change in M1H activation between isoforms, paralogs, and non-paralog controls. E. Violin plot showing the amino acid sequence identity (note that 100% identity is at the bottom of the y-axis, to remain consistent with the other plots) between isoforms, paralogs, and non-paralog controls. F–H. Analogous to B–C, but with isoform and paralog pairs broken up into bins based on their amino acid sequence identity. The number of pairs in each bin is denoted below the violins. I. Middle: pairwise sequence alignment of the reference isoforms of paralogs THRA and THRB, with darker green denoting perfectly matched amino acids and lighter green denoting mismatched amino acids. White regions indicate a gap in the alignment, and the gray schematics above and below the colored alignment denote which sequence is considered (thick gray block) or gapped (thin gray line). DBD and hormone receptor domains are denoted in each of the two paralogs. C4 ZF = C4 zinc finger. Right: AlphaFold2 predicted structures for isoforms of THRA and THRB. J. Exon diagrams of THRA isoforms (top) and THRB isoforms (bottom). AD = activation domain; RD = repression domain. K. Left: PDI results from the Y1H assay for the isoforms of paralogous TFs THRA and THRB. Right: Transcriptional activity from the M1H assay.
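The Jaccard distance in panels B and C compares the sets of interaction partners of two proteins: 0 when the sets are identical, 1 when they share nothing. A minimal sketch follows. The function name and the partner labels are hypothetical, and returning None for two empty sets is an assumption of this sketch, since the caption does not say how pairs without any detected interaction were handled.

```python
from typing import Optional, Set


def jaccard_distance(partners_a: Set[str], partners_b: Set[str]) -> Optional[float]:
    """Jaccard distance between two sets of interaction partners.

    Defined as 1 - |A ∩ B| / |A ∪ B|. Returns None when both sets are
    empty, because the ratio is undefined in that case (an assumption).
    """
    union = partners_a | partners_b
    if not union:
        return None
    shared = partners_a & partners_b
    return 1.0 - len(shared) / len(union)


# Made-up partner sets for a reference and an alternative isoform.
reference_partners = {"P1", "P2", "P3"}
alternative_partners = {"P2", "P3", "P4"}
distance = jaccard_distance(reference_partners, alternative_partners)
```

In this example two of the four distinct partners are shared, so the distance is 0.5.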
Figure 6. Condensate formation and subcellular localization differences between TF isoforms
A. Schematic showing the assessment of condensate formation and subcellular localization using high-throughput imaging across 2 cell lines, HEK293T and U2OS. B. Description of the TF isoforms that were selected for profiling in the high-throughput imaging assays. C. Stacked bar plot showing the percent of alternative isoforms that show differences in localization, condensate formation, both, or neither as compared to their reference isoform in either HEK293T or U2OS cells. D–F. Violin plots showing the differences in TF molecular functions (PDIs, D; PPIs, E; transcriptional activity, F) between alternative-reference TF isoform pairs that either show no difference in condensate formation or localization or those that do. For these analyses, only TF isoform pairs with consistent results across the two imaging cell lines were considered. P-values calculated using a two-sided permutation test. G. Exon diagram showing the two cloned isoforms of PBX1 in TFIso1.0, with Pfam domains annotated. H. Y2H PPI results for the two isoforms of PBX1. I. M1H transcriptional activation results for the two isoforms of PBX1. J. Representative images of PBX1 isoform expression in HEK293T cells (63x magnification). K. Saturation (Csat) curve analysis of PBX1 isoforms. Dots represent individual cells, x-axis shows total protein concentration from fluorescence (Ctot), y-axis shows concentration in the dilute phase (Cdil). Arbitrary units (AU) are at reference settings. Csat = saturation concentration; D = dominance. L. Exon diagram showing the six cloned isoforms of FOXP2 in TFIso1.0, with Pfam domains and nuclear localization sequence (NLS) annotated. M. Representative images of FOXP2 isoform expression in HEK293T cells (63x magnification). N. Csat analysis of FOXP2 isoforms.
Figure 7. Alternative TF isoforms can function as negative regulators
A. Schematic showing examples of TF isoforms classified as either rewirers or negative regulators. B. Cartoon heatmap showing how molecular function assay results were used to classify alternative isoforms as either similar to the reference, rewirers, or negative regulators. C. Nested pie chart showing the number of alternative isoforms categorized as either similar to reference, rewirers, or negative regulators (outer circle) and the number of annotated (solid colors) and novel (hatched colors) isoforms that comprise each category. D. The percent prevalence of various changes in molecular function among rewirers and negative regulators (left graph) compared to the percent prevalence of each assay among all alternative isoforms. Note that because most TFs have been assessed in ≥1 assay, these categories are not mutually exclusive with each other. D. The percent of alternative isoforms that either show loss of function in a particular assay (if categorized as a negative regulator, left) or change in function in a particular assay (if categorized as a rewirer, right) as compared to their reference isoforms. Note that because most TFs have been assessed in at least one assay, these categories are not mutually exclusive with each other; for a full plot of negative regulator classification reasons, see Figure S7D. E. Example of a negative regulator TF isoform (CREB1-alt). Left: exon diagram showing domain annotations. AD = activation domain; RD = repression domain; pKID = phosphorylated kinase-inducible domain. Middle: M1H results. Right: PDIs. F. Boxplot showing the gene-level tissue specificities (tau metric), calculated from the developmental RNA-seq data, among TF genes with either only rewirer alternative isoforms, only negative regulator alternative isoforms, only alternative isoforms that are similar to reference, some combination of the above, or only alternative isoforms that were unable to be classified (NA). P-values shown are from a two-sided Mann Whitney test. G. Volcano plot showing the effect of TF over-expression on differentiation. The over-expression effect size (x-axis) is the Diffusion difference and the p-value (y-axis) is the −log10 of the Diffusion P-value, both as calculated in the TF mORF Atlas. H. Volcano plot showing differential abundance of TF isoforms in breast cancer. The median isoform difference (x-axis) is the median difference of fractional isoform expression among paired tumor/normal breast cancer samples and the p-value (y-axis) is the −log10 of the adjusted, paired Wilcoxon p-value. I. Paired swarm plot showing the relative expression of the alternative isoform of CREB1 (as a fraction of total CREB1 gene expression) in matched breast cancer tumor and normal samples (from the same patient). P-value shown is from a two-sided Mann Whitney test, adjusted for multiple hypothesis testing. J. Expression levels of CREB1 isoforms in GTEx. Top: log2 TPM values for each CREB1 isoform; bottom: isoform expression as a percentage of total gene expression for each CREB1 isoform. K. Schematic model showing one example mechanism: whereas rewirer isoforms lead to altered GRNs, negative regulator isoforms can lead to misregulation of canonical GRNs in either the absence or presence of the reference isoform. Negative regulator TF isoforms that outcompete their reference isoforms in the same cell can be thought of as naturally-occurring dominant negatives.
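Panel F summarizes gene-level tissue specificity with the tau metric. The sketch below uses the commonly cited form of the index, tau = Σ(1 − x_i / x_max) / (n − 1), where x_i is a gene's expression in tissue i and n is the number of tissues. The caption does not say whether expression values were log-transformed or otherwise preprocessed before computing tau, so the input here is simply assumed to be one non-negative value per tissue. The function name and example values are hypothetical.

```python
from typing import Optional, Sequence


def tau_specificity(expression: Sequence[float]) -> Optional[float]:
    """Tissue-specificity index tau for one gene.

    expression holds one non-negative expression value per tissue.
    tau is 0 for uniform expression and 1 for expression confined to
    a single tissue. Returns None if the gene is not expressed
    anywhere, since the index is undefined there (an assumption).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values from at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        return None
    # Sum each tissue's shortfall relative to the most expressing tissue.
    return sum(1.0 - x / x_max for x in expression) / (n - 1)


# Made-up values: one broadly expressed gene and one tissue-restricted gene.
broad = tau_specificity([5.0, 5.0, 5.0])
restricted = tau_specificity([0.0, 0.0, 10.0])
```

With these inputs the broadly expressed gene scores 0 and the tissue-restricted gene scores 1, the two extremes of the scale.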

