Commun Biol. 2021 Jun 2;4(1):661.

doi: 10.1038/s42003-021-02153-7.

Transcription factor enrichment analysis (TFEA) quantifies the activity of multiple transcription factors from a single experiment

Jonathan D Rubin¹, Jacob T Stanley², Rutendo F Sigauke³, Cecilia B Levandowski¹, Zachary L Maas², Jessica Westfall⁴, Dylan J Taatjes¹, Robin D Dowell⁵ ⁶ ⁷

Affiliations

¹ Department of Biochemistry, University of Colorado, Boulder, CO, USA.
² BioFrontiers Institute, University of Colorado, Boulder, CO, USA.
³ Computational Bioscience Program, Anschutz Medical Campus, University of Colorado, Aurora, CO, USA.
⁴ Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO, USA.
⁵ BioFrontiers Institute, University of Colorado, Boulder, CO, USA. robin.dowell@colorado.edu.
⁶ Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, CO, USA. robin.dowell@colorado.edu.
⁷ Department of Computer Science, University of Colorado, Boulder, CO, USA. robin.dowell@colorado.edu.

PMID: 34079046
PMCID: PMC8172830
DOI: 10.1038/s42003-021-02153-7

Jonathan D Rubin et al. Commun Biol. 2021.

Abstract

Detecting changes in the activity of a transcription factor (TF) in response to a perturbation provides insights into the underlying cellular process. Transcription Factor Enrichment Analysis (TFEA) is a robust and reliable computational method that detects positional motif enrichment associated with changes in transcription observed in response to a perturbation. TFEA detects positional motif enrichment within a list of ranked regions of interest (ROIs), typically sites of RNA polymerase initiation inferred from regulatory data such as nascent transcription. Therefore, we also introduce muMerge, a statistically principled method of generating a consensus list of ROIs from multiple replicates and conditions. TFEA is broadly applicable to data that inform on transcriptional regulation, including nascent transcription (e.g., PRO-Seq), CAGE, histone ChIP-Seq, and accessibility data (e.g., ATAC-Seq). TFEA not only identifies the key regulators responding to a perturbation, but also temporally unravels regulatory networks with time series data. Consequently, TFEA serves as a hypothesis-generating tool that provides an easy, rigorous, and cost-effective means to broadly assess TF activity, yielding new biological insights.

Conflict of interest statement

Dr. Dowell is a founder of Arpeggio Biosciences. The remaining authors declare no competing interests.

Figures

**Fig. 1. TFEA calculates motif enrichment using differential and positional information.**
The TFEA pipeline requires, minimally, a ranked list of ROIs (control in blue, treatment in orange). Optionally, a user may provide raw read coverage and regions (ROIs, colored boxes labeled a–d), in which case TFEA will perform the ranking using DESeq analysis. With a set of ranked ROIs (orange up, blue down), TFEA analyzes motif enrichment for each motif provided (red circles). For each motif, positions are determined by FIMO scans, and an enrichment curve is calculated by weighting each motif instance (with weight w_i, using an exponential decay as a function of the motif distance d_i from the region center) and adding this value to a running sum. An E-score is calculated as 2 * AUC, where AUC is the area between the running sum and a uniform background (dashed line), scaled by the number of motif instances N. For statistical significance, the ROI rank is randomly shuffled 1000 times, and E-scores are recalculated for each shuffle. The true E-score is then compared to the distribution of E-scores obtained from the shuffling events. For example output of TFEA, see Supplementary Fig. 1 and Supplementary Fig. 2.
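
The enrichment calculation described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TFEA implementation: the decay scale (`scale=150.0`), the normalization of the running sum to end at 1, the treatment of ROIs without a motif as `nan`, and the empirical two-sided p-value are all choices made here for illustration. The function names are hypothetical, and the caption's scaling by N is folded into the normalization rather than reproduced exactly.

```python
import numpy as np

def enrichment_curve(distances, scale=150.0):
    """Normalized running sum of motif weights along the ranked ROIs.

    distances[i] is the motif distance d_i from the center of the i-th
    ranked ROI, or nan when that ROI has no motif instance (assumption).
    """
    d = np.asarray(distances, dtype=float)
    # Exponential decay weight w_i = exp(-|d_i| / scale); zero if no motif.
    weights = np.where(np.isnan(d), 0.0, np.exp(-np.abs(d) / scale))
    n = len(weights)
    total = weights.sum()
    if total == 0:
        # No motif instances: return the uniform background itself.
        return np.arange(1, n + 1) / n
    return np.cumsum(weights) / total

def e_score(distances, scale=150.0):
    """E-score as 2 * (signed area between curve and uniform background)."""
    curve = enrichment_curve(distances, scale)
    n = len(curve)
    background = np.arange(1, n + 1) / n
    auc = np.mean(curve - background)  # rank axis rescaled to [0, 1]
    return 2.0 * auc

def permutation_p_value(distances, n_shuffles=1000, scale=150.0, seed=0):
    """Compare the observed E-score with E-scores from shuffled ROI ranks."""
    rng = np.random.default_rng(seed)
    d = np.asarray(distances, dtype=float)
    observed = e_score(d, scale)
    null = np.array([e_score(rng.permutation(d), scale)
                     for _ in range(n_shuffles)])
    # Empirical two-sided p-value (an assumption of this sketch).
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_shuffles + 1)
    return observed, p

# Toy input: motifs sit at the center of the 50 top-ranked ROIs only.
top_heavy = [0.0] * 50 + [float("nan")] * 50
score, p_value = permutation_p_value(top_heavy, n_shuffles=200)
```

With this normalization the score lies between -1 and 1: positive when motif instances concentrate near the top of the ranking, negative when they concentrate near the bottom.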

**Fig. 2. *muMerge* precisely combines multiple samples into consensus ROIs.**
a A schematic for the *muMerge* method. Each sample region (light blue box) is represented by a probability distribution (green, Eq. (1), with centers μ_i and stdev ρσ_i), which are combined into a joint probability distribution (dark blue peak, Eq. (2)) from which the final ROI estimates are inferred (dark blue bar). b Test 1 demonstrates the position and width accuracy of a calculated ROI for a single locus, μ, as the number of sample replicates is increased (from one to ten). The three methods, *bedtools merge* (orange), *bedtools intersect* (red), and *muMerge* (dark blue), for generating ROIs from multiple samples are compared. With *muMerge* the uncertainty on $\hat{\mu}$ (i.e., the standard deviation of the distance between the ground truth position, μ, and its estimate, $\hat{\mu} \in \{\mu_{\text{muMerge}}, \mu_{\text{merge}}, \mu_{\text{intersect}}\}$) decreases quickly while the estimated ROI width remains essentially constant. The standard error, indicated by colored shading, is less than the line width in most cases. c Test 2 demonstrates the precision of the calculated ROI for two closely spaced loci, μ₁ and μ₂, as the spacing between them is increased. In this case, *muMerge* transitions from a single locus to two distinct loci more gradually (violin plots, ROI position) and the estimated ROI widths do not deviate from the expected value (violin plots, ROI width), unlike *merge* and *intersect*. In all cases, the expected value and variance used for the simulations are indicated by dashed grey lines and shading, respectively. For further detail on the results of Tests 1 and 2 and how the simulations were performed, see Supplementary Fig. 4 and Methods section "*muMerge*: Simulating replicates for calculation of ROIs".
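
To make the comparison in this caption concrete, the sketch below contrasts three ways of combining overlapping replicate regions at a single locus: a union (in the spirit of *bedtools merge*), an intersection (in the spirit of *bedtools intersect*), and a probabilistic combination. The probabilistic version is not the *muMerge* algorithm. Eqs. (1) and (2) are not reproduced on this page, so the sketch assumes each region is a normal distribution centered on its midpoint with a standard deviation proportional to its half-width, combines replicates by multiplying densities (which gives a precision-weighted mean), and reports the mean half-width. All function names and the `rho` parameter are hypothetical.

```python
import numpy as np

def merge_union(regions):
    """Union of overlapping regions: width grows as replicates are added."""
    starts, stops = zip(*regions)
    return min(starts), max(stops)

def merge_intersection(regions):
    """Intersection of regions: width shrinks as replicates are added."""
    starts, stops = zip(*regions)
    start, stop = max(starts), min(stops)
    return (start, stop) if start < stop else None

def merge_probabilistic(regions, rho=1.0):
    """Illustrative probabilistic consensus (assumptions in the lead-in).

    Each region (start, stop) must have positive width.
    """
    starts = np.array([s for s, _ in regions], dtype=float)
    stops = np.array([e for _, e in regions], dtype=float)
    centers = (starts + stops) / 2.0
    half_widths = (stops - starts) / 2.0
    sigma = rho * half_widths
    # A product of normal densities peaks at the precision-weighted mean.
    precision = 1.0 / sigma**2
    mu_hat = float(np.sum(precision * centers) / np.sum(precision))
    # Width estimate: mean half-width, so it does not drift with replicates.
    half_width_hat = float(np.mean(half_widths))
    return mu_hat - half_width_hat, mu_hat + half_width_hat

replicates = [(90, 110), (100, 120), (95, 115)]
union = merge_union(replicates)                 # (90, 120)
intersection = merge_intersection(replicates)   # (100, 110)
consensus = merge_probabilistic(replicates)     # (95.0, 115.0)
```

In this toy case the union is wider and the intersection narrower than any single replicate, while the probabilistic estimate keeps the replicate width and centers on the weighted midpoint.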

**Fig. 3. TFEA improves the detection of p53 following Nutlin-3a treatment.**
a Application of the MD-Score, MDD-Score, and TFEA to GRO-Seq data in HCT116 cells with 1 h Nutlin-3a or DMSO treatment. MA plots contrast the number of regions with a motif (x-axis) with the change in each score (y-axis). Each dot is a distinct position-specific scoring matrix (e.g., TF) with significant changes highlighted in red. Cutoffs were determined by comparing untreated replicates (see Supplementary Fig. 12). b Application of the MD-Score, MDD-Score, and TFEA to PRO-Seq data in MCF10A cells with 1 h Nutlin-3a or DMSO treatment. c Motif displacement distribution plot of TP53 motif instances within 1.5 kb of all ROIs in either DMSO (blue) or Nutlin-3a (red) (as a heatmap, darker indicates more motif instances). d Percentage overlap of TP53 motif instances within 150 bp of DMSO and Nutlin-3a ROIs. e Similar to (c) but in MCF10A cells. See Supplementary Data 1 for a complete list of accession numbers for data utilized.

**Fig. 4. TFEA balances TF positional and differential signal.**
a Optimal cutoffs are determined using the mean true positive rate (TPR; green) and mean false positive rate (FPR; orange) across the different signal and background levels as a function of the threshold cutoff. b F1 score of AME and TFEA for varied signal and background, using the optimal AME cutoff of 1e−30 and TFEA cutoff of 0.1. c Difference in F1 score between TFEA and AME across all simulations (n = 121; value = F1_TFEA − F1_AME). TFEA (red) outperforms AME (blue) in 26% of cases (value > 0), whereas AME outperforms TFEA in 21% of cases (value < 0). d F1 scores and e difference in scores for the highest signal tested (10% signal), now varying the standard deviation of the signal and background. See Supplementary Fig. 17 for more details on simulations.
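
For readers less familiar with the metrics in this caption, the sketch below computes TPR, FPR, and F1 from confusion-matrix counts, and the per-simulation difference F1_TFEA − F1_AME used in panel c. The definitions are the standard ones. The counts in the usage lines are made-up placeholders, not values from the paper, and the function names are hypothetical.

```python
def rates(tp, fp, tn, fn):
    """True positive rate and false positive rate from raw counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_difference(counts_a, counts_b):
    """F1 of method A minus F1 of method B; positive means A performs better."""
    return f1_score(*counts_a) - f1_score(*counts_b)

# Placeholder counts (tp, fp, fn) for two methods on one simulated dataset.
method_a = (8, 2, 2)
method_b = (6, 1, 4)
delta = f1_difference(method_a, method_b)
```

A positive `delta` corresponds to a simulation counted on the value > 0 side of panel c.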

**Fig. 5. TFEA dissects the temporal dynamics of infection.**
a Analysis of lipopolysaccharide (LPS) time-series cap analysis of gene expression (CAGE) data using AME and TFEA. Trajectories of activity profiles show that LPS triggers immediate activation of the NF-κB complex (TF65/RelB/NFKB1; yellow), observable at 15 min (blue arrow). TFEA detects a concomitant downregulation of a set of transcription factors, exemplified here by TYY1 (purple). TFEA also resolves subsequent dynamics (green bracket) of ISGF3 activation (containing IRF9/STAT1/STAT2; red lines). b Schematic depicting the molecular insights gained from TFEA analysis. See Supplementary Fig. 19 for more analysis. See Supplementary Data 1 for a complete list of accession numbers for data utilized.

**Fig. 6. TFEA captures rapid dynamics of the glucocorticoid receptor (GR) following treatment with dexamethasone.**
a TFEA correctly identifies GR (red line) from time-series ChIP data on the histone acetyltransferase p300, H3K27ac, and DNase I. No signal is observed in the negative control H3K9me3. TFEA shows a temporal lag in the H3K27ac signal (orange arrows). b Known cellular dynamics of GR induced by dexamethasone (Dex). c Mechanistic and temporal insights gained by performing TFEA analysis; question marks indicate datasets where earlier time points were not available to resolve temporal information. See Supplementary Data 1 for a complete list of accession numbers for data utilized.

Grants and funding

R01 GM125871/GM/NIGMS NIH HHS/United States
