Comparative Study
Genome Biol. 2006;7(8):R78.
doi: 10.1186/gb-2006-7-8-R78. Epub 2006 Aug 17.

Transcriptional and structural impact of TATA-initiation site spacing in mammalian core promoters

Jasmina Ponjavic et al. Genome Biol. 2006.

Abstract

Background: The TATA box, one of the best-studied core promoter elements, is associated with induced, context-specific expression. The lack of precise transcription start site (TSS) locations linked with expression information has impeded genome-wide characterization of the interaction between TATA and the pre-initiation complex.

Results: Using a comprehensive set of 5.66 × 10^6 sequenced 5' cDNA ends from diverse tissues mapped to the mouse genome, we found that the TATA-TSS distance is correlated with the tissue specificity of the downstream transcript. To achieve tissue-specific regulation, the TATA box position relative to the TSS is constrained to a narrow window (-32 to -29), where positions -31 and -30 are the optimal positions for achieving high tissue specificity. Slightly larger spacings can be accommodated only when there is no optimally spaced initiation signal; in contrast, TATA box-like motifs found downstream of position -28 are generally nonfunctional. The strength of the TATA binding protein-DNA interaction plays a subordinate role to spacing in terms of tissue specificity. Furthermore, promoters with different TATA-TSS spacings have distinct features in terms of consensus sequence around the initiation site and distribution of alternative TSSs. Unexpectedly, promoters that have two dominant, consecutive TSSs are TATA depleted and have a novel GGG initiation site consensus.

Conclusion: In this report we present the most comprehensive characterization of TATA-TSS spacing and functionality to date. The coupling of spacing to tissue specificity at the transcriptome level provides important clues as to the function of core promoters and the choice of TSS by the pre-initiation complex.


Figures

Figure 1
Representative examples of subclasses of SP promoters. Histograms show the fraction of tags that map into the 120 bp region centered on the TC. TC identifiers are shown above each histogram. Three subclasses of the SP TCs defined by Carninci and coworkers [8] were analyzed: (a) single-TSS promoters, which have a single well defined TSS; (b) shallow-TSS promoters, the subset of single-TSS promoters that have one sharp peak surrounded by multiple weakly defined TSSs; and (c) twin-TSS promoters, which are characterized by two closely located, well defined TSSs and can in turn be classified by the number of base pairs between them (0-3 bp spacing). bp, base pair; SP, single peak; TC, tag cluster; TSS, transcription start site.
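The twin-TSS subclass in panel (c) is defined by the number of base pairs between two dominant peaks. The sketch below illustrates that bookkeeping on a per-position CAGE tag count vector. The input layout, the function name `twin_tss_gap`, and the `min_share` threshold for a "well defined" peak are hypothetical choices for illustration, not the criteria used by Carninci and coworkers [8].

```python
from typing import Optional, Sequence


def twin_tss_gap(tag_counts: Sequence[int], max_gap: int = 3,
                 min_share: float = 0.25) -> Optional[int]:
    """Return the bp gap between the two dominant peaks, or None.

    tag_counts[i] is the number of CAGE tags whose 5' end maps to
    position i of a tag cluster (hypothetical input layout).
    min_share is a hypothetical threshold for a "well defined" peak.
    """
    total = sum(tag_counts)
    if len(tag_counts) < 2 or total == 0:
        return None
    # The two positions with the most tags; ties go to the upstream position.
    top_two = sorted(range(len(tag_counts)),
                     key=lambda i: (-tag_counts[i], i))[:2]
    # Each of the two peaks must hold a sizeable share of the cluster's tags.
    if any(tag_counts[i] < min_share * total for i in top_two):
        return None
    first, second = sorted(top_two)
    gap = second - first - 1  # adjacent peaks have 0 bp between them
    return gap if gap <= max_gap else None
```

For example, a cluster with counts `[0, 1, 40, 45, 2, 0]` has two adjacent dominant peaks, so it would fall into the 0 bp twin-TSS class.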
Figure 2
Tissue specificity measured by relative entropy. (a) Tissue specificity correlation between EST and CAGE data sources, measured as the mean relative entropy in each of the nine gene sets. Standard error bars for CAGE (red) and EST (blue) are shown. The plots of the six tissue-specific sets are distinct from those of the three ubiquitously expressed sets. (b) Tissue specificity correlation between EST and CAGE data sources, using the tissue specificity (relative entropy) of individual genes in each set. Spearman correlation coefficients and the associated P values (null hypothesis: no correlation) are shown in Table 1. CAGE, cap analysis of gene expression; EST, expressed sequence tag.
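Relative entropy scores how far a gene's distribution of tags over tissues departs from a background distribution; a gene expressed in proportion to the background scores 0, and a gene confined to one tissue scores highest. The sketch below assumes one common formulation, D = Σ_t p_t log2(p_t / q_t), where p_t is the gene's fraction of tags in tissue t and q_t is the fraction of all tags sequenced from tissue t. The choice of background and the base-2 logarithm are assumptions here; the exact normalization is given in the paper's Materials and methods.

```python
import math
from typing import Mapping


def relative_entropy(gene_counts: Mapping[str, int],
                     background_counts: Mapping[str, int]) -> float:
    """Relative entropy (bits) of a gene's tissue profile versus a background.

    Assumption: the background q_t is the fraction of all tags from tissue t.
    """
    gene_total = sum(gene_counts.values())
    bg_total = sum(background_counts.values())
    d = 0.0
    for tissue, count in gene_counts.items():
        if count == 0:
            continue  # p * log(p / q) -> 0 as p -> 0
        p = count / gene_total
        q = background_counts[tissue] / bg_total
        d += p * math.log2(p / q)
    return d
```

With four equally sampled tissues, a gene seen only in liver gets log2(1 / 0.25) = 2 bits, while a gene spread evenly across all four gets 0 bits.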
Figure 3
The spacing between the TATA box and the dominant TSS is associated with transcriptional specificity. (a) Tissue specificity (measured as median relative entropy) for promoters with different TATA-TSS spacings. Positions with 20 counts or more are shown as red dots with standard error bars. (b) Histogram showing the number of promoters with the TATA box located at a given position. In both plots, only the most prominent TATA box is considered in each promoter. Both representations indicate that most functional TATA boxes reside in a narrow 4 bp window from positions -32 to -29, dominated by positions -31 and -30. The rapid decrease in site counts and transcriptional specificity downstream of -29 suggests that 28 bp is the minimal TATA-TSS distance for TATA-driven initiation; TATA boxes at this minimal spacing might also have functional properties distinct from those at more favorable spacings (see main text). bp, base pair; TSS, transcription start site.
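The positions in this figure follow the promoter convention in which the dominant TSS is +1 and there is no position 0. The sketch below shows how a TATA start position relative to the TSS can be computed and checked against the -32 to -29 window. As an assumption for illustration, it uses the simple TATAWAWR consensus as a regular expression in place of the score-based TATA predictor described in the paper, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical TATA-like consensus TATAWAWR (W = A/T, R = A/G). The
# zero-width lookahead lets overlapping matches be reported.
TATA_LIKE = re.compile(r"(?=TATA[AT]A[AT][AG])")


def tata_positions(seq: str, tss_index: int,
                   window: Tuple[int, int] = (-40, -19)) -> List[int]:
    """Start positions of TATA-like motifs relative to the TSS.

    seq[tss_index] is the +1 nucleotide. There is no position 0, so the
    base immediately upstream, seq[tss_index - 1], is position -1.
    """
    hits = []
    for m in TATA_LIKE.finditer(seq.upper()):
        pos = m.start() - tss_index  # correct for upstream bases only
        if window[0] <= pos <= window[1]:
            hits.append(pos)
    return hits


def in_functional_window(pos: int) -> bool:
    """True for the -32 to -29 window where most functional TATA boxes lie."""
    return -32 <= pos <= -29
```

Because the search window (-40 to -19) lies entirely upstream of the TSS, the simple subtraction never has to skip over the missing position 0.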
Figure 4
TATA-TSS spacing influences initiation site usage. Histogram showing the distribution of the four possible dinucleotides (PyPu, PyPy, PuPy, and PuPu) at the initiation site [-1, +1] for promoters with the TATA box located at each position in the -34 to -28 range. As described previously [8], initiation sites composed of PyPu dinucleotides are the most prominent, regardless of spacing. The dinucleotide distribution is significantly different for promoters where the TATA box starts at -28. Pu, purine; Py, pyrimidine.
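The four dinucleotide classes in this histogram come from reducing the [-1, +1] bases to purine (A, G) or pyrimidine (C, T). A minimal tallying sketch, assuming each promoter is given as a sequence plus the index of its +1 base (a hypothetical input layout), could look like this:

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def base_class(base: str) -> str:
    """Map a nucleotide to 'Pu' (A/G) or 'Py' (C/T)."""
    b = base.upper()
    if b in PURINES:
        return "Pu"
    if b in PYRIMIDINES:
        return "Py"
    raise ValueError(f"not an unambiguous nucleotide: {base!r}")


def initiation_class(seq: str, tss_index: int) -> str:
    """Class of the [-1, +1] dinucleotide; seq[tss_index] is the +1 base."""
    return base_class(seq[tss_index - 1]) + base_class(seq[tss_index])


def tally(promoters: Iterable[Tuple[str, int]]) -> Counter:
    """Count dinucleotide classes over (sequence, tss_index) pairs."""
    return Counter(initiation_class(s, i) for s, i in promoters)
```

Running `tally` separately on the promoters of each TATA spacing class (-34 to -28) gives the per-class counts that a histogram like this one summarizes.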
Figure 5
Extended TATA-TSS distances require unambiguous PyPu initiation sites. The fraction of PyPu dinucleotides in a sliding 2 bp wide window was calculated for each TATA spacing class in the [-5, +5] promoter region. Promoters with extended TATA-TSS distances (32-34 bp) are depleted of PyPu dinucleotides immediately upstream of the dominant TSS [-1,+1] (namely, [-2,-1] and [-3,-2]; fraction of PyPu dinucleotides shown as grey bars) and have a PyPu consensus at the dominant TSS itself. Introduction of PyPu dinucleotides in this upstream region would probably create new TSSs with a more favored distance to the TATA box. The PyPu distribution is largely symmetrical in promoters where the TATA box is located at positions -31 to -29, indicating a possible intrinsic stretching mechanism within the PIC for selecting strong initiation sites located further away than the most favored distance (30 or 31 bp). bp, base pairs; PIC, pre-initiation complex; Pu, purine; Py, pyrimidine; TSS, transcription start site.
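The sliding-window profile described here can be sketched as follows, assuming each promoter of one spacing class is supplied as the 10 nt string covering -5 to +5 around its dominant TSS (a hypothetical input layout). Because there is no position 0, the nine 2 bp windows run from [-5,-4] through [-1,+1] to [+4,+5].

```python
from typing import Dict, List, Tuple

# Promoter coordinates in [-5, +5]; by convention there is no position 0.
POSITIONS = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def pypu_profile(windows: List[str]) -> Dict[Tuple[int, int], float]:
    """Fraction of sequences with a PyPu step in each 2 bp sliding window.

    Each string in `windows` is assumed to hold exactly the 10 nt from
    -5 to +5 around the dominant TSS.
    """
    profile = {}
    for k in range(len(POSITIONS) - 1):
        hits = sum(
            1 for w in windows
            if w[k].upper() in PYRIMIDINES and w[k + 1].upper() in PURINES
        )
        profile[(POSITIONS[k], POSITIONS[k + 1])] = hits / len(windows)
    return profile
```

Calling `pypu_profile` once per spacing class and comparing the values at `(-3, -2)` and `(-2, -1)` against `(-1, 1)` reproduces the kind of comparison drawn in this figure.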
Figure 6
TATA-TSS spacing is correlated with promoter and initiation site characteristics. (a-g) Sequence logos [43] for promoters divided into spacing subclasses based on the location of the most prominent TATA box. CAGE tag distribution trends in each spacing subclass are shown below each logo; specifically, the median fraction of CAGE tags within each promoter for each spacing class is plotted using a log-scaled y-axis (see Materials and methods). The locations of the dominant TSS and the TATA box start are indicated with black arrows. Both the initiation site (positions -3 to +1) consensus and the CAGE tag distributions differ between the classes. Of particular interest are the extended initiation site motif in promoters with TATA boxes located at -33 and -34, and the different consensus in promoters with TATA boxes located at -28. The CAGE tag distribution is skewed in a direction consistent with alternative start sites at a more favorable spacing (closer to position -30 or -31). CAGE, cap analysis of gene expression; TSS, transcription start site.
Figure 7
Non-optimal TATA-TSS spacing is compensated for by increased signal strength in the TSS region. The signal strength around the initiation site [-5,+5] (measured as information content in bits [45]) is lowest in promoters that have the most favored TATA-TSS spacings (30 and 31 bp). The signal strength is increased in promoters with a TATA-TSS spacing ranging from 32 to 34 bp. This increase is due to an extended initiation site motif, as shown in corresponding sequence logos in Figure 6. bp, base pairs; TSS, transcription start site.
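Information content here is the per-position quantity shown in sequence logos: for DNA, log2(4) = 2 bits minus the Shannon entropy of the base frequencies in an alignment column, summed over the [-5,+5] region. The sketch below is a minimal version of that calculation on equal-length, aligned sequences. It is an assumption that no small-sample correction is needed; the correction used for the published values, if any, is not reproduced here.

```python
import math
from collections import Counter
from typing import List


def column_information(column: str) -> float:
    """Information content (bits) of one column: 2 - Shannon entropy."""
    n = len(column)
    counts = Counter(column.upper())
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return 2.0 - entropy


def region_information(aligned: List[str]) -> float:
    """Summed information content over all columns of equal-length sequences."""
    length = len(aligned[0])
    return sum(
        column_information("".join(seq[i] for seq in aligned))
        for i in range(length)
    )
```

A fully conserved column contributes 2 bits and a column with all four bases equally frequent contributes 0, so an extended initiation motif raises the regional sum.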
Figure 8
HMM simulations demonstrate increased signal strength as a result of PyPu depletion. Sequence logos of sequences generated by an HMM incorporating rules that describe PyPu usage (see Materials and methods). Specifically, PyPu dinucleotides are not allowed in positions where they would introduce new initiation sites with more favorable TATA-TSS distances (-31, -32, and so on up to the observed spacing). This results in an increase of Py nucleotides upstream of the TSS. bp, base pairs; HMM, Hidden Markov Model; Pu, purine; Py, pyrimidine; TSS, transcription start site.
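The PyPu rule can be illustrated with a much simpler generator than the paper's HMM. The sketch below is a forward-sampled, position-dependent first-order Markov chain: bases are drawn uniformly left to right over [-5,+5], except that a purine is disallowed after a pyrimidine wherever that step would create an initiation site k bp upstream of the dominant TSS, giving a TATA-TSS distance of spacing - k that is still 31 bp or more. Uniform base composition, the absence of a forced initiator at [-1,+1], and all names are assumptions; this is not the Hidden Markov model of Figure 11, whose pseudo-code is not reproduced here. Forward sampling also does not give exactly uniform sampling over all rule-compliant sequences.

```python
import random
from typing import Set, Tuple

# Promoter coordinates in [-5, +5]; by convention there is no position 0.
POSITIONS = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
PYRIMIDINES = "CT"


def forbidden_pairs(spacing: int, best: int = 31) -> Set[Tuple[int, int]]:
    """Dinucleotide steps where PyPu would create a better-spaced TSS.

    A new TSS at position -k moves the TATA box k bp closer (spacing - k);
    that is at least as favorable as `best` for k = 1 .. spacing - best.
    """
    return {(-(k + 1), -k) for k in range(1, spacing - best + 1)}


def simulate_window(spacing: int, rng: random.Random) -> str:
    """Draw one [-5, +5] sequence obeying the PyPu exclusion rule."""
    banned = forbidden_pairs(spacing)
    seq = []
    for i, pos in enumerate(POSITIONS):
        choices = "ACGT"
        if i > 0 and (POSITIONS[i - 1], pos) in banned and seq[-1] in PYRIMIDINES:
            choices = PYRIMIDINES  # a purine here would form a banned PyPu step
        seq.append(rng.choice(choices))
    return "".join(seq)
```

Sampling many windows with `simulate_window(34, random.Random(0))` and counting base classes per position should show pyrimidines enriched just upstream of the TSS, the qualitative effect this figure describes.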
Figure 9
Exploration of the effects of TATA-TBP interaction strength on tissue specificity. We investigated possible dependencies between tissue specificity measured by relative entropy and three aspects of TATA-TBP interaction potential in the -40 to -19 region of each promoter: (a) the predicted TATA box with the highest score fulfilling the score threshold criteria defined in Materials and methods; (b) the sum of all predicted TATA boxes each fulfilling the specified score criteria; and (c) the predicted TATA box with the highest score fulfilling certain score threshold criteria, given TATA box location. For clarity, each plot in panel c corresponds to one type of TATA-TSS spacing, and can be considered a subset of the data points in panel a. The subdivision of the TATA-containing promoters into the different TATA-TSS spacing classes confers no additional support for a significant relation between TBP-TATA interaction strength and transcriptional specificity. In combination with panel a, this strongly suggests that TATA-TSS distance is more strongly linked to tissue specificity than the TATA-TBP interaction strength within TATA-driven core promoters. TBP, TATA box binding protein; TSS, transcription start site.
Figure 10
Exploration of SP-class promoters with twin TSSs. (a-d) Sequence logo representations of promoters with two close, dominant peaks separated by 0-3 bp. In contrast to the previous sequence logos, we applied no constraint on TATA presence for promoter inclusion. Black arrows denote the location of the two dominant TSSs. The +1 position is arbitrarily defined as the position of the most upstream TSS. When there is no spacing between the peaks, promoters are depleted of TATA boxes. This type of promoter has an atypical initiation site consensus closely resembling that of transcripts from promoters located in 3' untranslated regions [8]. Promoters whose peaks are further apart have more TATA-like motifs around position -30 with respect to the most upstream peak. bp, base pairs; TSS, transcription start site.
Figure 11
Pseudo-code corresponding to the Hidden Markov model simulation.

References

1. Kadonaga JT. Regulation of RNA polymerase II transcription by sequence-specific DNA binding factors. Cell. 2004;116:247–257. doi: 10.1016/S0092-8674(03)01078-X.
2. Butler JE, Kadonaga JT. The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 2002;16:2583–2592. doi: 10.1101/gad.1026202.
3. Smale ST, Kadonaga JT. The RNA polymerase II core promoter. Annu Rev Biochem. 2003;72:449–479. doi: 10.1146/annurev.biochem.72.121801.161520.
4. Hampsey M. Molecular genetics of the RNA polymerase II general transcriptional machinery. Microbiol Mol Biol Rev. 1998;62:465–503.
5. Hahn S. Structure and mechanism of the RNA polymerase II transcription machinery. Nat Struct Mol Biol. 2004;11:394–403. doi: 10.1038/nsmb763.
