BMC Genomics. 2021 Jun 26;22(1):482.
doi: 10.1186/s12864-021-07760-6.

UniBind: maps of high-confidence direct TF-DNA interactions across nine species

Rafael Riudavets Puig et al. BMC Genomics.

Abstract

Background: Transcription factors (TFs) bind specifically to TF binding sites (TFBSs) at cis-regulatory regions to control transcription. It is critical to locate these TF-DNA interactions to understand transcriptional regulation. Efforts to predict bona fide TFBSs benefit from the availability of experimental data mapping DNA binding regions of TFs (chromatin immunoprecipitation followed by sequencing - ChIP-seq).

Results: In this study, we processed ~10,000 public ChIP-seq datasets from nine species to provide high-quality TFBS predictions. After quality control, this yielded ~56 million predicted TFBSs with experimental and computational support for direct TF-DNA interactions for 644 TFs in >1000 cell lines and tissues. These TFBSs were used to predict >197,000 cis-regulatory modules representing clusters of binding events in the corresponding genomes. The high quality of the TFBSs was reinforced by their evolutionary conservation, enrichment at active cis-regulatory regions, and capacity to predict combinatorial binding of TFs. Further, we confirmed that the cell type and tissue specificity of enhancer activity was correlated with the number of TFs with binding sites predicted in these regions. All the data are provided to the community through the UniBind database, which can be accessed through its web interface (https://unibind.uio.no/), a dedicated RESTful API, and genomic tracks. Finally, we provide an enrichment tool, available as a web service and an R package, for users to find TFs with enriched TFBSs in a set of provided genomic regions.

Conclusions: UniBind is the first resource of its kind, providing the largest collection of high-confidence direct TF-DNA interactions in nine species.

Keywords: ChIP-seq; Cis-regulatory modules; Evolutionary conservation; TF-DNA interactions; Transcription factor binding sites; Transcription regulation; UniBind.

Conflict of interest statement

None declared.

Figures

Fig. 1
Overview of the UniBind robust collection. A Barplots showing the number of TFs (dark orange), TFBSs (green), datasets (blue), and cell and tissue types (light orange) stored in the robust collection of UniBind for each analyzed species. All values are log10-transformed. B Distribution of the percentages of the genomes covered by robust TFBSs in each species (one color per species, see legend)
Fig. 2
Evolutionary conservation of human and mouse TFBSs in the robust collection. Distributions of the average base-pair evolutionary conservation scores (phyloP and phastCons scores using multi-species genome alignments, see legends) at regions centered around human (A) and mouse (B) TFBSs from the robust collection. Random expectation (grey lines) was obtained by shuffling the original TFBS locations and computing the conservation scores of the resulting regions. C Fraction of mouse lifted archetype TFBSs in the UniBind robust collection (y-axis) with respect to increasing relative distances (x-axis) from human archetype TFBSs from the same archetype, computed using the bedtools reldist command. The figure provides, for each value of relative distance, the median (blue line) together with the 10th to 90th percentiles (grey area) of the observed frequencies. When two genomic tracks are not spatially related, one expects the distribution of relative distances to be uniform. D Distributions of average base-pair evolutionary conservation scores (phastCons100way) at 1,000,000 randomly selected and shuffled TFBSs from JASPAR 2020 and UniBind 2021
Fig. 3
Genomic distribution of TFBSs. Distribution of the proportion of TFBSs from the robust collection overlapping with different types of genomic regions (columns; see legend) across species (rows). For each species, we provide the observed (first lines, denoted Obs) and expected (second lines, denoted Exp) proportions of TFBSs in each type of genomic region. Expected proportions were estimated by randomly positioning the TFBSs in the corresponding genomes (see Methods)
Fig. 4
Analysis of the overlap of TFBSs with respect to active cis-regulatory regions in human and mouse. A-B Fraction of TFBSs in the UniBind robust collection (y-axis) with respect to increasing relative distances (x-axis) from ENCODE candidate cis-regulatory regions (cCREs) computed using the bedtools reldist command for human (A) and mouse (B). When two genomic tracks are not spatially related, one expects the fraction of relative distance distribution to be uniform. C Genomic tracks from the UCSC Genome Browser at the human LDLR gene locus (from start to first coding exon) providing information about PhyloP and PhastCons evolutionary conservation scores and the locations of ENCODE cCREs, UniBind CRMs, UniBind TFBSs from the robust collection (using the dense display mode to maximally condense the track) and the non-redundant collection of archetype TFBSs. Colors in the ENCODE cCREs track indicate: promoter-like signature (red), proximal enhancer-like signature (orange), and distal enhancer-like signature (yellow)
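The relative distance used in panels A and B can be illustrated with a minimal sketch. It assumes the usual definition behind bedtools reldist: for a query position lying between two consecutive reference positions, the relative distance is the distance to the nearer one divided by the distance between the two, giving values in [0, 0.5]. The function name `relative_distances` and the reduction of each interval to a single point (for example its midpoint) are simplifications made here for illustration; they are not taken from the paper or from bedtools itself.

```python
from bisect import bisect_right

def relative_distances(query_points, reference_points):
    """Relative distance of each query point to its flanking reference points.

    Hypothetical helper: intervals are assumed to be reduced to single
    positions on one chromosome. Queries without a reference point on
    both sides are skipped.
    """
    ref = sorted(reference_points)
    distances = []
    for q in query_points:
        i = bisect_right(ref, q)          # index of first reference point > q
        if i == 0 or i == len(ref):
            continue                      # no flanking pair on one side
        upstream, downstream = ref[i - 1], ref[i]
        nearest = min(q - upstream, downstream - q)
        distances.append(nearest / (downstream - upstream))
    return distances

# Toy example: reference features at 0 and 100, four query sites.
toy = relative_distances([10, 50, 90, 150], [0, 100])
```

If the two tracks were unrelated, a histogram of such values over many sites would be expected to be roughly flat across [0, 0.5]; an excess near 0 indicates that query sites tend to sit close to reference features.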
Fig. 5
Correlation between enhancer activity and TF binding. For each enhancer predicted using Cap Analysis of Gene Expression (CAGE) by the FANTOM5 consortium, we computed the number of TFs with overlapping TFBSs in the robust collection of UniBind (x-axis). The figure provides, for each value of the number of TFs found to bind in enhancers, the median (blue line) together with the 10th to 90th percentiles (grey area) of cell type specific activity of these enhancers. The expression measures were derived from CAGE (capturing enhancer RNA expression). The specificity of activity (y-axis) is provided within the [0; 1] range with 0 representing ubiquitous enhancer activity and 1 exclusive expression activity
Fig. 6
TF combinatorial binding in invasive breast ductal carcinoma. Hierarchical clustering of the pairwise Pearson correlation coefficient between all TFBSs from untreated MCF7 cells from the robust collection of UniBind. Different clusters and their respective TFs are coloured in red, blue, green, and purple. In the heatmap, blue colors indicate a higher positive correlation coefficient between datasets, while red colors indicate an anticorrelation (see legend)
Fig. 7
The UniBind TFBS set enrichment tool. A The UniBind enrichment web-application allows users to select the enrichment analysis type, set a title, provide an email address for notification upon completion of the analysis, upload the required input files based on the enrichment analysis type, and select the species and collection to compute the enrichment. B Enrichment results shown as swarm plots of the -log10(p-values) (Fisher exact tests; see Methods). Each point corresponds to a TFBS set for a given TF in a given ChIP-seq experiment. Distinct colors are assigned to the top 10 TFs with at least one TFBS set enriched (see legend). C The enrichment results can be further explored by restricting the output to TFBS sets obtained in specific cell lines and tissues, which can be searched by keywords and selected. D Swarm plot similar to (B) but restricted to TFBS sets obtained from breast-related tissues and cell lines
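As a rough illustration of the statistic plotted in panels B and D, the sketch below computes a one-sided Fisher exact test p-value from counts of overlapping regions and converts it to -log10(p). How the UniBind tool builds the contingency table (in particular the choice of background universe) is described in the paper's Methods and is not reproduced here; the function names and the example counts are hypothetical.

```python
from math import comb, log10

def fisher_enrichment_pvalue(overlap, set_size, query_size, universe_size):
    """Right-tailed Fisher exact test: P(X >= overlap) for a hypergeometric X.

    Assumed setup (illustrative only): `universe_size` background regions,
    of which `set_size` overlap a given TFBS set; the user supplies
    `query_size` of the background regions, and `overlap` of those
    overlap the TFBS set.
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

def neg_log10(p):
    """Score on the swarm-plot scale; larger means stronger enrichment."""
    return -log10(p)

# Hypothetical counts: 6 background regions, 3 overlap the TFBS set,
# and all 3 query regions are among them.
p_toy = fisher_enrichment_pvalue(3, 3, 3, 6)
score_toy = neg_log10(p_toy)
```

In practice one such p-value would be computed per TFBS set (one TF in one ChIP-seq experiment), which is why each point in the swarm plots corresponds to a single dataset.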
