Comparative Study
Genome Res. 2002 May;12(5):832-9.
doi: 10.1101/gr.225502.

rVista for comparative sequence-based discovery of functional transcription factor binding sites

Gabriela G Loots et al. Genome Res. 2002 May.

Abstract

Identifying transcriptional regulatory elements represents a significant challenge in annotating the genomes of higher vertebrates. We have developed a computational tool, rVista, for high-throughput discovery of cis-regulatory elements that combines clustering of predicted transcription factor binding sites (TFBSs) and the analysis of interspecies sequence conservation to maximize the identification of functional sites. To assess the ability of rVista to discover true positive TFBSs while minimizing the prediction of false positives, we analyzed the distribution of several TFBSs across 1 Mb of the well-annotated cytokine gene cluster (Hs5q31; Mm11). Because a large number of AP-1, NFAT, and GATA-3 sites have been experimentally identified in this interval, we focused our analysis on the distribution of all binding sites specific for these transcription factors. The exploitation of the orthologous human-mouse dataset resulted in the elimination of >95% of the approximately 58,000 binding sites predicted on analysis of the human sequence alone, whereas it identified 88% of the experimentally verified binding sites in this region.

Figures

Figure 1
rVISTA data flow. The user submits a global alignment file (generated by the AVID program) and optional annotation files for the two orthologous sequences. The imported TRANSFAC matrix library and the MATCH program are then used to identify all transcription factor binding site (TFBS) matches in each individual sequence and to generate a file with all TFBS matches in the reference sequence (used as the baseline for visualization). Next, the global alignment and the sequence annotations provided are used to identify all aligned TFBSs present in the noncoding DNA (in the absence of annotation, the program will identify all aligned sites across the entire alignment). A second file is generated containing aligned noncoding TFBSs. DNA sequence conservation is determined by the hula-hoop module, which identifies TFBSs surrounded by conserved sequences and generates a data table with detailed statistics. The final data processing step includes a user-interactive visualization module. The user customizes the data by choosing which TF sites to visualize (the example shown selects GATA-3 sites), what TRANSFAC parameters to use for all TF matches (rVISTA default 0.75/0.8), and by selectively clustering individual or combinatorial sites. The user can customize the clustering of any of the three data sets (all matches in the reference sequence are depicted as blue tick marks, aligned TFBS matches are in red, and conserved TFBS matches are in green).
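
The two filtering stages in this data flow (aligned sites, then conserved sites) can be illustrated with a minimal Python sketch. It is not the rVISTA implementation: the `Hit` record, the function names, the 21-column window, and the 0.8 identity threshold are all assumptions made for illustration, and the sketch assumes that TFBS matches have already been mapped to alignment columns.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    factor: str   # transcription factor name, e.g. "GATA-3"
    col: int      # alignment column where the match starts (hypothetical convention)
    length: int   # match length in alignment columns

def aligned_hits(ref_hits, other_hits):
    """Keep reference hits that have a same-factor hit at the same alignment column."""
    other = {(h.factor, h.col) for h in other_hits}
    return [h for h in ref_hits if (h.factor, h.col) in other]

def window_identity(ref_row, other_row, center, window):
    """Fraction of identical, non-gap columns in a window centered on `center`."""
    half = window // 2
    start = max(0, center - half)
    end = min(len(ref_row), center + half + 1)
    matches = sum(
        1
        for a, b in zip(ref_row[start:end], other_row[start:end])
        if a == b and a != "-"
    )
    return matches / (end - start)

def conserved_hits(hits, ref_row, other_row, window=21, min_identity=0.8):
    """Keep hits whose surrounding window meets the identity threshold.

    The window size and threshold are illustrative assumptions, not values
    taken from the figure caption.
    """
    return [
        h
        for h in hits
        if window_identity(ref_row, other_row, h.col + h.length // 2, window)
        >= min_identity
    ]
```

Here `ref_row` and `other_row` stand for the two rows of the pairwise global alignment as equal-length strings with `-` for gaps. Applying `aligned_hits` and then `conserved_hits` yields the analogues of the second and third data sets described in the caption.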
Figure 2
Visualizing rVISTA cluster analysis for a 25-kb region across the GM-CSF and IL-3 genomic interval. (A) Ikaros-2 TFBS clusters (two sites over a 60-bp region). Ikaros-2 matches fitting the clustering criteria for the human sequence alone are depicted in blue, aligned clusters in red, and conserved clusters in green. (B) Multiclustering of individual sites can be performed by independently choosing the clustering criteria for each TF. AP-1 (blue), NFAT (red), and GATA-3 (green) clusters (two sites over 100 bp) of conserved TFBSs are illustrated. (C) Combinatorial clustering of TFBSs. Using the clustering criteria of 1 NFAT and 1 AP-1 site across a 60-bp DNA fragment, the rVISTA program identifies all AP-1 (blue) and NFAT (red) pairs and displays them as tick marks. This clustering module can be applied to the three data sets, allowing the visualization of clusters in the reference sequence, among the aligned sites, and among the conserved sites.
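
The two clustering criteria described in this caption (N sites of one factor within a window, and one site each of two factors within a window) can be sketched as follows. This is an illustrative sketch, not the rVISTA code: the function names are hypothetical, and it assumes each site is represented by its start coordinate and that distance is measured between start coordinates.

```python
from bisect import bisect_right

def clustered_sites(positions, min_sites=2, window=60):
    """Return sites that fall in at least one `window`-bp span holding >= min_sites sites.

    Assumption: a span starting at site p covers start coordinates p .. p + window - 1.
    """
    pos = sorted(positions)
    keep = set()
    for i, start in enumerate(pos):
        # j is one past the last site whose start lies inside the span
        j = bisect_right(pos, start + window - 1)
        if j - i >= min_sites:
            keep.update(pos[i:j])
    return sorted(keep)

def paired_sites(pos_a, pos_b, window=60):
    """Return (a, b) pairs of sites from two factors lying within `window` bp."""
    pairs = []
    for a in sorted(pos_a):
        for b in sorted(pos_b):
            if abs(a - b) < window:
                pairs.append((a, b))
    return pairs
```

For example, `clustered_sites(ikaros_positions, min_sites=2, window=60)` corresponds to the criterion in panel A, and `paired_sites(nfat_positions, ap1_positions, window=60)` to the combinatorial criterion in panel C, where the position lists are hypothetical inputs. Running either function on the reference, aligned, or conserved site lists gives the three cluster sets that the caption describes.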
Figure 3
The rVISTA analysis algorithm identifies experimentally characterized TFBSs. (A) Two functionally characterized NFAT/AP-1 clusters indicated by black vertical arrows ([two sites/60 bp] [Table 1: E7 and E8]) are identified by rVISTA and are the only two clusters of conserved TFBSs present in the IL-5 promoter. The VISTA alignment highlights exons in blue, UTRs in yellow, and conserved noncoding sequence in red. (B) A GATA-3 pair in the IL-5 promoter, indicated by a black vertical arrow, is highly conserved and represents the only functional GATA-3 cluster ([2 GATA/60 bp] [Table 1]) in the proximal promoter (500 bp upstream of the 5′UTR) of this cytokine.
Figure 4
Distribution of conserved GATA-3 binding sites across the 22 promoter regions (2 kb upstream of the 5′UTR) of all annotated genes from the 1-Mb cytokine gene cluster (Hs5q31; Mm11). Cytokine genes are labeled by arrows, gray bars indicate observed GATA-3 sites, and open bars represent the number of GATA-3 sites predicted under a random distribution. The random distribution was estimated on the basis of the frequency of GATA-3 sites across the 1-Mb human sequence and the DNA conservation of each promoter. (A) Conserved individual GATA-3 sites. (B) Conserved GATA-3 sites present in clusters (two or more conserved sites enclosed in a 60-bp DNA fragment).
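
The caption does not give the formula behind the random-distribution estimate. One plausible reading, shown here purely as an assumption, scales the region-wide site density by the promoter length and by the fraction of the promoter that is conserved. The function name and parameters are hypothetical.

```python
def expected_conserved_sites(total_sites, region_length_bp,
                             promoter_length_bp, conserved_fraction):
    """Expected number of conserved sites in a promoter under a uniform random model.

    Assumed model (not confirmed by the source):
        expected = (total_sites / region_length_bp)
                   * promoter_length_bp
                   * conserved_fraction
    """
    density = total_sites / region_length_bp  # sites per bp across the whole region
    return density * promoter_length_bp * conserved_fraction
```

With made-up inputs that are not taken from the paper, `expected_conserved_sites(10000, 1_000_000, 2000, 0.5)` gives 10 expected sites for a 2-kb promoter that is 50% conserved. Comparing such a value with the observed count per promoter mirrors the open-bar versus gray-bar comparison in the figure.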
