Sci Rep. 2015 Jun 12:5:11432.
doi: 10.1038/srep11432.

TRRUST: a reference database of human transcriptional regulatory interactions

Heonjong Han et al. Sci Rep. 2015.

Abstract

The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcription factors (TFs) and target genes from high-throughput data, and evaluating their performance requires gold-standard interactions. Here we present a database of literature-curated human TF-target interactions, TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), which currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed for efficient manual curation of regulatory interactions from approximately 20 million Medline abstracts. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also has several useful features: i) information about the mode of regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inferences about cooperating TFs of a query TF; and v) prioritization of pathways and diseases associated with a query TF. We observed high enrichment of TF-target pairs in TRRUST for top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.

Figures

Figure 1
(a) The overall process of constructing the TRRUST database via the manual curation of Medline abstracts using a sentence-based text-mining approach is outlined. GS stands for gold-standard. (b) A Venn diagram illustrates the overlap of TF-target regulatory interactions from four literature-curated databases: TRRUST, TRED-LC (literature-curated interactions of TRED), HTRIdb-LC (literature-curated interactions of HTRIdb), and TFactS.
Figure 2
(a) A network of TF (red nodes) and non-TF genes (green nodes) based on the regulatory interactions from TRRUST is shown. (b) Bar graphs show the number of TFs for two classes based on the different modularity of their targets. Only TFs with more than five target genes were considered for this analysis, resulting in 213 TFs with modular targets and 62 TFs with non-modular targets. (c) Bar graphs show the number of target genes for two classes based on the different cooperativity of their TFs. Only target genes regulated by more than five TFs were considered for this analysis, resulting in 344 target genes regulated by cooperative TFs and 53 target genes regulated by disjoint TFs.
Figure 3. Selective screenshots from TRRUST search results for an example query gene, BRCA1, are shown.
(a) A functional network of BRCA1 target genes based on HumanNet links is shown. (b) The physical interaction network of TFs that regulate BRCA1 based on literature-curated protein-protein interactions derived from major databases is shown. (c) A network of TFs that are predicted to cooperate with BRCA1 based on literature-curated protein-protein interactions derived from major databases is shown. (d) Disease Ontology terms prioritized for BRCA1 are listed. The top three associated diseases, breast carcinoma, prostate carcinoma, and malignant neoplasm of pancreas, are all validated by the literature.
Figure 4
Scatter plots show the relationship between algorithm scores (x-axis) and the enrichment fold (y-axis) for TRRUST (a,b), TFactS (c,d), TRED-LC (e,f) and HTRIdb-LC (g,h) gene pairs in inferred human TRNs. TF-target interactions inferred from ChIP-chip/seq data in the hmChIP database were scored by the ChIPXpress algorithm (a), and those inferred from a series of microarray samples from the Gene Expression Omnibus database (GSE14764) were scored by the GENIE3 algorithm (b). The enrichment fold was measured for each successive bin of 1,000 links, sorted by algorithm score. For all tested databases, a sigmoidal curve fit gave the best regression between algorithm scores and the enrichment of benchmarking TF-target interactions. TRRUST exhibits substantially better correlation for the hmChIP-ChIPXpress (Fig. 4a, r² = 0.74) and GSE14764-GENIE3 (Fig. 4b, r² = 0.48) TRNs than the other databases (Fig. 4c–h). We used the 100,000 most significant TF-target interactions for all benchmarking analyses and took the logarithm of the original ChIPXpress score because the score distribution is highly biased toward the low score range.
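The binning step described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `enrichment_per_bin`, the input layout (tuples of TF, target, score), and in particular the definition of enrichment fold used here (the hit rate within a bin divided by the hit rate over all scored links) are assumptions, since the abstract does not state how the background rate was computed.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]                 # (TF, target gene)
ScoredLink = Tuple[str, str, float]    # (TF, target gene, algorithm score)


def enrichment_per_bin(
    scored_links: Iterable[ScoredLink],
    gold_standard: Set[Pair],
    bin_size: int = 1000,
) -> List[Tuple[float, float]]:
    """Return (mean score, enrichment fold) for successive bins of ranked links.

    Assumption: enrichment fold = (fraction of gold-standard pairs in the bin)
    / (fraction of gold-standard pairs among all scored links).
    """
    # Rank links from highest to lowest algorithm score.
    ranked = sorted(scored_links, key=lambda link: link[2], reverse=True)
    total_hits = sum((tf, target) in gold_standard for tf, target, _ in ranked)
    if not ranked or total_hits == 0:
        return []  # no background rate can be defined

    background = total_hits / len(ranked)
    result = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        hits = sum((tf, target) in gold_standard for tf, target, _ in chunk)
        mean_score = sum(score for _, _, score in chunk) / len(chunk)
        result.append((mean_score, (hits / len(chunk)) / background))
    return result


if __name__ == "__main__":
    # Toy data with hypothetical gene names; real use would pass ~100,000 links.
    links = [("TF1", "G1", 0.9), ("TF1", "G2", 0.8),
             ("TF2", "G3", 0.2), ("TF2", "G4", 0.1)]
    gold = {("TF1", "G1"), ("TF1", "G2")}
    for mean_score, fold in enrichment_per_bin(links, gold, bin_size=2):
        print(f"mean score {mean_score:.2f}: enrichment fold {fold:.2f}")
```

The resulting (score, fold) pairs are the kind of points that could then be passed to a curve-fitting routine for the sigmoidal regression mentioned in the caption; the fit itself is left out of this sketch.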
