Genome Res. 2019 Aug;29(8):1363-1375.
doi: 10.1101/gr.240663.118. Epub 2019 Jul 24.

Benchmark and integration of resources for the estimation of human transcription factor activities

Luz Garcia-Alonso et al. Genome Res. 2019 Aug.

Abstract

The prediction of transcription factor (TF) activities from the gene expression of their targets (i.e., the TF regulon) is becoming a widely used approach to characterize the functional status of transcriptional regulatory circuits. Several strategies and data sets have been proposed to link a TF to the target genes it likely regulates, each one providing a different level of evidence. The most established ones are (1) manually curated repositories, (2) interactions derived from ChIP-seq binding data, (3) in silico prediction of TF binding on gene promoters, and (4) reverse-engineered regulons from large gene expression data sets. However, it is not known how these different sources of regulons affect the TF activity estimations and, thereby, downstream analysis and interpretation. Here we compared the accuracy and biases of these strategies to define human TF regulons by means of their ability to predict changes in TF activities in three reference benchmark data sets. We assembled a collection of TF-target interactions for 1541 human TFs and evaluated how different molecular and regulatory properties of the TFs, such as the DNA-binding domain, specificities, or mode of interaction with the chromatin, affect the predictions of TF activity. We assessed their coverage and found little overlap among the regulons derived from each strategy, with the best performance from literature-curated information, followed by ChIP-seq data. We provide an integrated collection of all TF-target interactions derived through these strategies, with confidence scores, as a resource for enhanced prediction of TF activities.
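To make the idea of estimating TF activity from a regulon concrete, the following is a minimal sketch. It is not the statistic used in the paper (the abstract does not specify one); it assumes a simple stand-in, the signed mean of target z-scores scaled by the square root of the regulon size. The function name `tf_activity`, the gene names, and the input layout are all hypothetical.

```python
import math

def tf_activity(expression, regulon):
    """Score one TF in one sample from the expression of its targets.

    expression: dict mapping gene -> z-scored expression (hypothetical input)
    regulon:    dict mapping target gene -> mode of regulation
                (+1 for activation, -1 for repression)
    """
    # Flip the sign of repressed targets so that low expression of a
    # repressed gene counts as evidence for high TF activity.
    signed = [sign * expression[gene]
              for gene, sign in regulon.items()
              if gene in expression]
    if not signed:
        return None  # no measured targets, so the activity is undefined
    # Dividing by sqrt(n) keeps scores comparable across regulon sizes.
    return sum(signed) / math.sqrt(len(signed))

# Toy data: two activated targets and one repressed target.
expression = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": -1.5, "GENE_D": 0.2}
regulon = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1}
score = tf_activity(expression, regulon)
```

In this toy case all three targets agree with an active TF, so the score is positive. Because the score depends entirely on which genes are in `regulon`, different regulon sources can give different activity estimates for the same expression data.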

Figures

Figure 1.
TF–target resources overview. (A) Summary of the resources and strategies used to derive human TF–target interactions, classified according to evidence level: manually curated resources (yellow), ChIP-seq binding experimental data (orange), prediction of TF binding motifs based on gene promoter sequences (green), or inference from GTEx data (blue). All the resources were used in the benchmark except NFIRegulomeDB, whose coverage is too low. (B) TF coverage and overlaps across the different evidence classes, represented via UpSet plots (Lex and Gehlenborg 2014). The left bar plot represents the total number of TFs per evidence class. The top bar plot represents the number of overlapping TFs in each intersection. Dark circles in the matrix indicate the evidence classes that are part of the intersection. (C) TF classes (from TFClass) enriched in the TFs covered by more than two lines of evidence. Dots indicate the log odds ratio; error bars, the confidence interval. Colors indicate the FDR. (D) UpSet plot representing TF–target coverage and overlaps across the different evidence classes (as in B). Note that for regulons inferred from GTEx, only TF–target interactions found in three or more tissues are shown. For TFBSs and ChIP-seq, only the top 500 unique hits are shown; P < 0.0001.
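The overlap counts behind an UpSet-style plot such as panels B and D can be sketched as follows. This assumes exclusive intersections (each TF is counted once, under the exact combination of evidence classes that cover it), which is a common UpSet convention but is not stated in the caption. The function name and the toy TF identifiers are hypothetical.

```python
from collections import Counter

def exclusive_intersections(sets_by_class):
    """Count items per exact combination of evidence classes.

    sets_by_class: dict mapping evidence class name -> set of TF identifiers
    Returns a Counter keyed by frozenset of class names.
    """
    membership = {}
    for name, items in sets_by_class.items():
        for item in items:
            # Record every evidence class in which this item appears.
            membership.setdefault(item, set()).add(name)
    # Each item contributes to exactly one combination of classes.
    return Counter(frozenset(classes) for classes in membership.values())

# Toy data only; these are not the TFs or counts from the paper.
coverage = {
    "curated": {"TF1", "TF2", "TF3"},
    "chipseq": {"TF2", "TF3", "TF4"},
    "tfbs": {"TF3", "TF5"},
}
counts = exclusive_intersections(coverage)
```

Here `counts[frozenset({"curated", "chipseq"})]` is the height of the bar for TFs supported by exactly those two classes, and the per-class totals in the left bar plot are simply the sizes of the input sets.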
Figure 2.
Benchmark data sets. (A) Description of the three benchmark data sets. (B) Benchmark analysis scheme.
Figure 3.
Comparison of TF activity prediction performances by TF–target resource for each benchmark data set. (A) Performance comparison of the regulon data sets, in terms of TF activity prediction, against the three benchmark data sets. Confidence versus coverage plots in which the x-axis represents the average AUPRC from the activity rank's position of the perturbed/essential TF with respect to the negative controls; y-axis represents the number of TFs (with five or more targets) in the benchmark covered by each regulon data set. Dot colors indicate the evidence type (single data sets/evidence). Linked dots represent different filtering strategies in the generation of transcriptionally inferred and TFBS-derived regulons. (B) Performance comparison of the regulon data sets on the overlapping TFs. The x-axis indicates the AUPRC from the activity rank's position of the perturbed/essential TF against the same number of randomly selected negatives; y-axis represents the regulon data set. The number of overlapping TFs is indicated at the top right corner. (C) Similar to A but comparing GTEx-inferred (green) versus TCGA-inferred (red) regulons. Results for both tissue/cancer-specific (dark color) and the respective normal and pancancer consensus regulons (light) are shown. (D) Similar to A but here the regulons are built as a combination of the initial regulon data sets (i.e., TF–target supported by an agreement of two (or three) of any of the four mentioned strategies). Dot colors indicate the nature of the combination (combined evidence). The label accompanying the “consensus within curated resources” dots indicates the number of resources supporting the TF–target interaction.
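The AUPRC used as the performance measure in this figure can be illustrated with a short sketch. It assumes that a higher activity score should rank the perturbed/essential TF (a positive) above the negative controls, and it uses average precision, a common estimator of the area under the precision-recall curve; the caption does not say which estimator the authors used. The function name and the toy scores are hypothetical.

```python
def average_precision(scores, labels):
    """Estimate AUPRC as average precision over a ranked list.

    scores: predicted activity scores, higher meaning more likely positive
    labels: 1 for positives (e.g., the perturbed TF), 0 for negative controls
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive is required")
    # Rank all cases from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos

# Toy data: positives are ranked first and third out of four.
ap = average_precision([0.9, 0.8, 0.4, 0.1], [1, 0, 1, 0])
```

In this toy case the precisions at the two positives are 1/1 and 2/3, so the average precision is 5/6. A regulon source that consistently ranks the perturbed TF near the top yields values close to 1.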
Figure 4.
TF properties biasing the inference of TF activities across the TF regulon data sets. (A) Overview of the TF properties annotated for the 1541 human TFs under study. (B) TF properties enriched (FDR < 0.01) in the benchmark results B1, B2, and B3. Bar length is proportional to the enrichment score (ES), whereas color represents the significance strength (P-value). Properties enriched in more than one benchmark data set are labeled with an asterisk.
Figure 5.
Scoring TF–target interactions from different evidence. (A) Scoring scheme. (B) TF and TF–target interaction coverage per score cutoff for the normal (dark green) and pancancer (red) collections. (C) Performance of scored regulons in the B1, B2, and B3 benchmark data sets.
