Commun Biol. 2021 Jun 2;4(1):661.
doi: 10.1038/s42003-021-02153-7.

Transcription factor enrichment analysis (TFEA) quantifies the activity of multiple transcription factors from a single experiment

Jonathan D Rubin et al. Commun Biol. 2021.

Abstract

Detecting changes in the activity of a transcription factor (TF) in response to a perturbation provides insights into the underlying cellular process. Transcription Factor Enrichment Analysis (TFEA) is a robust and reliable computational method that detects positional motif enrichment associated with changes in transcription observed in response to a perturbation. TFEA detects positional motif enrichment within a list of ranked regions of interest (ROIs), typically sites of RNA polymerase initiation inferred from regulatory data such as nascent transcription. To build this list from multiple replicates and conditions, we also introduce muMerge, a statistically principled method for generating a consensus set of ROIs. TFEA is broadly applicable to data that inform on transcriptional regulation, including nascent transcription (e.g., PRO-Seq), CAGE, histone ChIP-Seq, and accessibility data (e.g., ATAC-Seq). TFEA not only identifies the key regulators responding to a perturbation but also, with time-series data, temporally unravels regulatory networks. Consequently, TFEA serves as a hypothesis-generating tool that provides an easy, rigorous, and cost-effective means to broadly assess TF activity and yield new biological insights.

Conflict of interest statement

Dr. Dowell is a founder of Arpeggio Biosciences. The remaining authors declare no competing interests.

Figures

Fig. 1. TFEA calculates motif enrichment using differential and positional information.
The TFEA pipeline requires, minimally, a ranked list of ROIs (control in blue, treatment in orange). Optionally, a user may provide raw read coverage and regions (ROIs, colored boxes labeled a–d), in which case TFEA performs the ranking using DESeq analysis. With a set of ranked ROIs (orange up, blue down), TFEA analyzes motif enrichment for each motif provided (red circles). For each motif, positions are determined by FIMO scans, and an enrichment curve is calculated by weighting each motif instance (with weight $w_i$, using an exponential decay as a function of the motif distance $d_i$ from the region center) and adding this value to a running sum. An E-score is calculated as 2 × AUC, i.e., the area under the enrichment curve between the running sum and a uniform background (dashed line), and is scaled by the number of motif instances $N$. For statistical significance, the ROI ranks are randomly shuffled 1000 times, and E-scores are recalculated for each shuffle. The true E-score is then compared to the distribution of E-scores obtained from the shuffling events. For example output of TFEA, see Supplementary Figs. 1 and 2.
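The E-score procedure described in the Fig. 1 caption can be sketched in Python as a minimal, hypothetical illustration; it is not TFEA's own code. Assumptions: a motif instance's weight is $w_i = \exp(-|d_i|/150)$ (the 150 bp decay scale and the parameter name `decay_bp` are illustrative), each ROI's weight is the sum over its motif instances, the AUC is a simple Riemann sum over rank fraction, and the empirical p-value is two-sided with a +1 correction. The caption's additional scaling by $N$ is not specified in enough detail here, so it is left out.

```python
import math
import random
from typing import List, Sequence, Tuple


def roi_weight(motif_distances_bp: Sequence[float], decay_bp: float = 150.0) -> float:
    """Sum of motif weights in one ROI, w_i = exp(-|d_i| / decay_bp).

    The exponential form and the 150 bp scale are assumptions for illustration.
    """
    return sum(math.exp(-abs(d) / decay_bp) for d in motif_distances_bp)


def e_score(ranked_weights: Sequence[float]) -> float:
    """2 * signed area between the normalized running sum and the uniform diagonal.

    ranked_weights[k] is the total motif weight of the ROI at rank k
    (rank 0 = most up-regulated). Positive values mean motif weight is
    concentrated toward the top of the ranking. No scaling by N is applied.
    """
    n = len(ranked_weights)
    total = sum(ranked_weights)
    if n == 0 or total == 0:
        return 0.0
    running = 0.0
    area = 0.0
    for k, w in enumerate(ranked_weights, start=1):
        running += w / total            # enrichment curve, ends at 1.0
        area += (running - k / n) / n   # minus the uniform background
    return 2.0 * area


def permutation_test(
    ranked_weights: Sequence[float], n_shuffles: int = 1000, seed: int = 0
) -> Tuple[float, float]:
    """Observed E-score and an empirical two-sided p-value from rank shuffles."""
    rng = random.Random(seed)
    observed = e_score(ranked_weights)
    shuffled: List[float] = list(ranked_weights)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)           # randomize the ROI ranking
        if abs(e_score(shuffled)) >= abs(observed):
            as_extreme += 1
    # +1 correction so the p-value is never exactly zero
    return observed, (as_extreme + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # 100 ranked ROIs; only the top 10 carry a motif, centered at d = 0
    motifs_per_roi = [[0.0]] * 10 + [[]] * 90
    weights = [roi_weight(ds) for ds in motifs_per_roi]
    score, p = permutation_test(weights)
    print(f"E-score = {score:.3f}, p = {p:.4f}")
```

Because only the ten top-ranked ROIs carry a motif in the usage example, the running sum rises well above the diagonal and the E-score is strongly positive; reversing the ranking flips its sign.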
Fig. 2. muMerge precisely combines multiple samples into consensus ROIs.
a A schematic of the muMerge method. Each sample region (light blue box) is represented by a probability distribution (green, Eq. (1), with center $\mu_i$ and standard deviation $\rho\sigma_i$); these are combined into a joint probability distribution (dark blue peak, Eq. (2)) from which the final ROI estimates are inferred (dark blue bar). b Test 1 demonstrates the position and width accuracy of a calculated ROI for a single locus, $\mu$, as the number of sample replicates is increased (from one to ten). Three methods for generating ROIs from multiple samples are compared: bedtools merge (orange), bedtools intersect (red), and muMerge (dark blue). With muMerge, the uncertainty on $\hat{\mu}$ (i.e., the standard deviation of the distance between the ground-truth position, $\mu$, and its estimate, $\hat{\mu} \in \{\hat{\mu}_{\text{muMerge}}, \hat{\mu}_{\text{merge}}, \hat{\mu}_{\text{intersect}}\}$) decreases quickly while the estimated ROI width remains essentially constant. The standard error, indicated by colored shading, is smaller than the line width in most cases. c Test 2 demonstrates the precision of the calculated ROI for two closely spaced loci, $\mu_1$ and $\mu_2$, as the spacing between them is increased. In this case, muMerge transitions from a single locus to two distinct loci more gradually (violin plots, ROI position), and the estimated ROI widths do not deviate from the expected value (violin plots, ROI width), unlike merge and intersect. In all cases, the expected value and variance used for the simulations are indicated by dashed grey lines and shading, respectively. For further detail on the results of Tests 1 and 2 and how the simulations were performed, see Supplementary Fig. 4 and the Methods section "muMerge: Simulating replicates for calculation of ROIs".
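To make the idea of combining per-sample distributions concrete, here is a simplified Python sketch. It is not muMerge's published inference (Eqs. (1) and (2) are not reproduced in this section). The Gaussian form, taking $\sigma_i$ as the region half-width, the default $\rho = 0.5$, averaging the distributions into a mixture, and the rule for assigning each consensus width are all assumptions made for illustration, and `consensus_rois` is a hypothetical name.

```python
import math
from typing import List, Tuple

Region = Tuple[int, int]  # [start, end) in bp on one chromosome, end > start


def consensus_rois(regions: List[Region], rho: float = 0.5) -> List[Region]:
    """Simplified muMerge-like consensus; NOT the published muMerge inference.

    Each input region i becomes a Gaussian with mean mu_i (its midpoint) and
    standard deviation rho * sigma_i (sigma_i = half-width). The Gaussians are
    averaged into one density on a 1 bp grid; every local maximum of that
    density becomes a consensus ROI whose half-width is the mean sigma_i of
    the input regions covering the peak.
    """
    params = []
    for start, end in regions:
        mu = (start + end) / 2.0
        sigma = (end - start) / 2.0
        params.append((mu, sigma, rho * sigma))

    def density(x: float) -> float:
        total = 0.0
        for mu, _, sd in params:
            z = (x - mu) / sd
            total += math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
        return total / len(params)

    grid = list(range(min(s for s, _ in regions), max(e for _, e in regions) + 1))
    dens = [density(x) for x in grid]

    rois: List[Region] = []
    for i in range(1, len(grid) - 1):
        # ">=" on the right keeps only the first point of a two-point plateau
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]:
            peak = grid[i]
            covering = [sigma for mu, sigma, _ in params if abs(peak - mu) <= sigma]
            if not covering:  # fall back to the nearest input region's half-width
                covering = [min(params, key=lambda p: abs(peak - p[0]))[1]]
            half = sum(covering) / len(covering)
            rois.append((round(peak - half), round(peak + half)))
    return rois


if __name__ == "__main__":
    # One locus seen in three replicates with slightly shifted boundaries
    print(consensus_rois([(100, 200), (102, 202), (98, 198)]))
    # Two well-separated loci stay separate
    print(consensus_rois([(100, 200), (1000, 1100)]))
```

The three jittered replicates collapse to one consensus region centered on their mean position with an unchanged width, which mirrors the behavior described for Test 1; the sketch does not attempt the more gradual two-locus transition shown in Test 2.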
Fig. 3. TFEA improves the detection of p53 following Nutlin-3a treatment.
a Application of the MD-Score, MDD-Score, and TFEA to GRO-Seq data in HCT116 cells with 1 h Nutlin-3a or DMSO treatment. MA plots contrast the number of regions containing a motif (x-axis) with the change in each score (y-axis). Each dot is a distinct position-specific scoring matrix (i.e., TF), with significant changes highlighted in red. Cutoffs were determined by comparing untreated replicates (see Supplementary Fig. 12). b Application of the MD-Score, MDD-Score, and TFEA to PRO-Seq data in MCF10A cells with 1 h Nutlin-3a or DMSO treatment. c Motif displacement distribution plot of TP53 motif instances within 1.5 kb of all ROIs in either DMSO (blue) or Nutlin-3a (red), shown as a heatmap (darker indicates more motif instances). d Percentage overlap of TP53 motif instances within 150 bp of DMSO and Nutlin-3a ROIs. e Similar to (c) but in MCF10A cells. See Supplementary Data 1 for a complete list of accession numbers for the data used.
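Panels c and d rest on measuring where motif instances fall relative to ROI centers. The minimal Python sketch below assumes all positions lie on one chromosome as integer bp coordinates. It collects signed motif displacements within 1.5 kb of each ROI center, bins them into one heatmap row, and computes the fraction within 150 bp. Treating that fraction as the MD-Score is an assumption here, since the caption does not define the score, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence


def motif_displacements(
    roi_centers: Sequence[int], motif_positions: Sequence[int], window_bp: int = 1500
) -> List[int]:
    """Signed distances (motif - ROI center) for every motif within +/- window_bp.

    A motif near two ROI centers is counted once per ROI.
    """
    motifs = sorted(motif_positions)
    out: List[int] = []
    for center in roi_centers:
        lo = bisect_left(motifs, center - window_bp)
        hi = bisect_right(motifs, center + window_bp)
        out.extend(m - center for m in motifs[lo:hi])
    return out


def md_score(displacements: Sequence[int], inner_bp: int = 150) -> float:
    """Assumed MD-Score: fraction of displacements within +/- inner_bp."""
    if not displacements:
        return 0.0
    near = sum(1 for d in displacements if abs(d) <= inner_bp)
    return near / len(displacements)


def displacement_histogram(
    displacements: Sequence[int], window_bp: int = 1500, bin_bp: int = 100
) -> List[int]:
    """Counts per bin across [-window_bp, +window_bp]; one row of the heatmap."""
    n_bins = 2 * window_bp // bin_bp
    counts = [0] * n_bins
    for d in displacements:
        idx = min((d + window_bp) // bin_bp, n_bins - 1)  # +window_bp joins last bin
        counts[idx] += 1
    return counts
```

Computing these separately for the DMSO and Nutlin-3a ROI sets gives the two heatmap rows being compared; a higher fraction near zero indicates motifs concentrated at ROI centers.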
Fig. 4. TFEA balances TF positional and differential signal.
a Optimal cutoffs are determined from the mean true positive rate (TPR; green) and mean false positive rate (FPR; orange) across the different signal and background levels, as a function of the threshold cutoff. b F1 score of AME and TFEA for varied signal and background, using the optimal AME cutoff of 1e−30 and TFEA cutoff of 0.1. c Difference in F1 score between TFEA and AME across all simulations (n = 121; value = $\mathrm{F1}_{\mathrm{TFEA}} - \mathrm{F1}_{\mathrm{AME}}$). TFEA (red) outperforms AME (blue) in 26% of cases (value > 0), whereas AME outperforms TFEA in 21% of cases (value < 0). d F1 scores and e difference in scores for the highest signal tested (10% signal), now varying the standard deviation of the signal and background. See Supplementary Fig. 17 for more details on simulations.
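The benchmark quantities in Fig. 4 (TPR, FPR, F1, and the share of simulations where one method wins) can be sketched in Python as follows. The caption does not state the exact rule for choosing the optimal cutoff, so `pick_cutoff` assumes the cutoff that maximizes mean TPR minus mean FPR (Youden's J); thresholding per-motif p-values is likewise an assumption, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def rates_at_cutoff(
    pvalues: Sequence[float], is_true: Sequence[bool], cutoff: float
) -> Tuple[float, float, float]:
    """Return (TPR, FPR, F1) when every motif with p <= cutoff is called."""
    tp = fp = fn = tn = 0
    for p, truth in zip(pvalues, is_true):
        called = p <= cutoff
        if called and truth:
            tp += 1
        elif called:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, fpr, f1


def pick_cutoff(
    simulations: List[Tuple[Sequence[float], Sequence[bool]]], cutoffs: Sequence[float]
) -> float:
    """Cutoff maximizing mean TPR - mean FPR across simulations (assumed criterion)."""
    best_cutoff, best_gap = cutoffs[0], float("-inf")
    for c in cutoffs:
        rates = [rates_at_cutoff(p, t, c) for p, t in simulations]
        mean_tpr = sum(r[0] for r in rates) / len(rates)
        mean_fpr = sum(r[1] for r in rates) / len(rates)
        if mean_tpr - mean_fpr > best_gap:
            best_cutoff, best_gap = c, mean_tpr - mean_fpr
    return best_cutoff


def share_outperforming(
    f1_a: Sequence[float], f1_b: Sequence[float]
) -> Tuple[float, float]:
    """Fractions of simulations where F1_a - F1_b > 0 and where it is < 0."""
    diffs = [a - b for a, b in zip(f1_a, f1_b)]
    n = len(diffs)
    return sum(d > 0 for d in diffs) / n, sum(d < 0 for d in diffs) / n
```

Ties (a difference of exactly zero) count toward neither fraction, which is why the two percentages reported in panel c need not sum to 100%.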
Fig. 5. TFEA dissects the temporal dynamics of infection.
a Analysis of lipopolysaccharide (LPS) time-series cap analysis gene expression (CAGE) data, using AME and TFEA. Trajectories of activity profiles show that LPS triggers immediate activation of the NF-κB complex (TF65/RelB/NFKB1; yellow), observable at 15 min (blue arrow). TFEA detects a concomitant downregulation of a set of transcription factors, exemplified here by TYY1 (purple). TFEA also resolves the subsequent dynamics (green bracket) of ISGF3 activation (containing IRF9/STAT1/STAT2; red lines). b Schematic depicting the molecular insights gained from TFEA analysis. See Supplementary Fig. 19 for further analysis. See Supplementary Data 1 for a complete list of accession numbers for the data used.
Fig. 6. TFEA captures rapid dynamics of the glucocorticoid receptor (GR) following treatment with dexamethasone.
a TFEA correctly identifies GR (red line) from time-series ChIP data on the histone acetyltransferase p300, H3K27ac, and DNase I. No signal is observed in the negative control H3K9me3. TFEA shows a temporal lag in the H3K27ac signal (orange arrows). b Known cellular dynamics of GR induced by dexamethasone (Dex). c Mechanistic and temporal insights gained by performing TFEA; question marks indicate datasets for which earlier time points were not available to resolve temporal information. See Supplementary Data 1 for a complete list of accession numbers for the data used.
