Gene. 2007 Mar 1;389(1):52-65.
doi: 10.1016/j.gene.2006.09.029. Epub 2006 Oct 10.

Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters

Chuhu Yang et al. Gene.

Abstract

The core promoter of eukaryotic genes is the minimal DNA region that recruits the basal transcription machinery to direct efficient and accurate transcription initiation. The fraction of human and yeast genes that contain specific core promoter elements such as the TATA box and the initiator (INR) remains unclear and core promoter motifs specific for TATA-less genes remain to be identified. Here, we present genome-scale computational analyses indicating that approximately 76% of human core promoters lack TATA-like elements, have a high GC content, and are enriched in Sp1-binding sites. We further identify two motifs - M3 (SCGGAAGY) and M22 (TGCGCANK) - that occur preferentially in human TATA-less core promoters. About 24% of human genes have a TATA-like element and their promoters are generally AT-rich; however, only approximately 10% of these TATA-containing promoters have the canonical TATA box (TATAWAWR). In contrast, approximately 46% of human core promoters contain the consensus INR (YYANWYY) and approximately 30% are INR-containing TATA-less genes. Significantly, approximately 46% of human promoters lack both TATA-like and consensus INR elements. Surprisingly, mammalian-type INR sequences are present - and tend to cluster - in the transcription start site (TSS) region of approximately 40% of yeast core promoters and the frequency of specific core promoter types appears to be conserved in yeast and human genomes. Gene Ontology analyses reveal that TATA-less genes in humans, as in yeast, are frequently involved in basic "housekeeping" processes, while TATA-containing genes are more often highly regulated, such as by biotic or stress stimuli. These results reveal unexpected similarities in the occurrence of specific core promoter types and in their associated biological processes in yeast and humans and point to novel vertebrate-specific DNA motifs that might play a selective role in TATA-independent transcription.

Figures

Fig. 1
Structure of the core promoter in eukaryotic genes and sequence of core promoter elements. (A) The positions in nucleotides (nt) relative to the transcription start site (TSS, +1) are given for core promoter elements: BRE, TFIIB response element; TATA, TATA box; INR, initiator element; DPE, downstream promoter element. (B) Consensus sequences of the core promoter elements used in this study. Key, IUPAC nomenclature.
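The consensus sequences quoted in the abstract (TATAWAWR, YYANWYY, SCGGAAGY, TGCGCANK) are written in IUPAC nucleotide codes. As a minimal sketch of how such a consensus could be matched against a promoter sequence, the Python below expands each code into a regular-expression character class and reports match positions. The function names are hypothetical, only the given strand is scanned, and this is not claimed to be the search procedure the authors used.

```python
import re

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[code] for code in consensus.upper())

def find_motif(seq, consensus):
    """Return 0-based start offsets of all matches, including overlapping ones.

    Assumption for illustration: only the strand given in `seq` is scanned.
    """
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]

if __name__ == "__main__":
    print(find_motif("GGTATAAAAGCC", "TATAWAWR"))  # canonical TATA box consensus
    print(find_motif("AACCACTCCGG", "YYANWYY"))    # consensus INR
```

Offsets are relative to the start of the supplied sequence; converting them to positions relative to the TSS (+1) depends on how the promoter window was extracted.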
Fig. 2
GC/AT content of human promoters. (A) Average GC/AT content profile of 15,685 non-redundant human genes in the UCSC database between -2000 and +2000 relative to the TSS (+1). (B) GC/AT content profile as in (A) for the narrower region -250 to +250. (C) GC/AT content profiles as in (B) but for only those human genes in the UCSC database that contain at least one TATA-532 element between -150 and +50 (5,077 genes, ∼32%) as defined in Fig. 1 and the text. (D) As in (C) but for the remaining TATA-less genes (10,608 genes, ∼68%).
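An average GC/AT content profile of this kind can be approximated by aligning equal-length promoter windows on the TSS and computing, at each position, the fraction of sequences carrying G or C. The sketch below assumes that simple per-position definition; the function name is hypothetical, and the smoothing or windowing actually used for the figure is not stated here.

```python
def gc_profile(seqs):
    """Per-position GC fraction across TSS-aligned sequences of equal length.

    Assumption for illustration: each sequence covers the same window
    (for example -250 to +250), so index i is the same position in all.
    """
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")

    profile = []
    for i in range(length):
        # Count sequences with G or C at this aligned position.
        gc = sum(1 for s in seqs if s[i].upper() in "GC")
        profile.append(gc / len(seqs))
    return profile

if __name__ == "__main__":
    print(gc_profile(["ACGT", "GCGA"]))
```

The AT profile is the complement (1 minus the GC fraction) when the sequences contain only A, C, G and T.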
Fig. 3
Distribution of the TATA box in human and yeast promoters. (A) Frequency profile of TATA-532 elements (top) and the canonical TATA-8 consensus (bottom) within the region -250 to +150 relative to the TSS (+1) of 10,271 non-redundant human genes from the DBTSS database. (B) Frequency profile of the canonical TATA-8 consensus in the -250 to +150 region of all genes in S. pombe (5,095 genes) from the NCBI database. (C) Frequency profile of the canonical TATA-8 consensus in the -250 to +150 region of all genes in S. cerevisiae from the NCBI database (6,165 genes, top), as well as those S. cerevisiae genes that do not contain an ATG at +1 (3,195 genes, middle) and those that do (2,970 genes, bottom) (see text for details). Only the TATA-8 sequences were used for the profile analysis; the TATA-532 search yielded no peak. Note the difference in frequency scale on the y-axis between S. cerevisiae and S. pombe. Bin size is 5 nt.
Fig. 4
Distribution of INR elements in human, Drosophila and yeast promoters. Frequency profiles of the mammalian INR (see Fig. 1) in the -100 to +50 region relative to the TSS (+1) of (A) 10,271 human genes from the DBTSS database, (B) 13,923 Drosophila genes from the NCBI database, (C) 5,095 S. pombe genes from the NCBI database and (D) S. cerevisiae genes as defined in Fig. 3C. Bin size is 3 nt.
Fig. 5
Frequencies of the different categories of core promoters in human and yeast genes. (A) Human promoters (10,271 total) from DBTSS were searched by scanning a 110 nt window within the -80 to +80 region relative to the TSS (+1) for the existence of TATA-532 and INR elements (see Fig. 1). TATA only, genes with a TATA box but no INR in the -80 to +80 region; TATA+INR, genes having both TATA and INR at a fixed orientation and distance from each other (TATA box 15 to 30 nt upstream of the INR, 211 genes, 2.1%) as well as genes with a TATA and INR in any orientation and spacing within the -80 to +80 region (1,358 promoters, 13.2%); INR only, genes with an INR element but no TATA box; None, genes with neither a TATA box nor an INR element in the -80 to +80 region. The number (No.) and percent (%) of genes of each category are given. (B) S. cerevisiae promoters (6,165 genes) were searched for the presence of at least one TATA-8 element in the -150 to +1 region and/or one INR element in the -25 to +25 region and grouped into different promoter categories as in (A). The number (No.) and percent (%) of genes in each category are given.
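The four promoter categories in this caption follow from two yes/no questions: does the window contain a TATA element, and does it contain an INR? The sketch below illustrates that grouping under loud assumptions: the canonical TATA-8 consensus (TATAWAWR) stands in for the TATA-532 set, which is not defined in this text; only the given strand is scanned; and no TATA-to-INR spacing constraint is applied. The function names are hypothetical.

```python
import re

# IUPAC codes needed for the two consensus sequences used below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}

def compile_consensus(consensus):
    """Compile an IUPAC consensus string into a regex."""
    return re.compile("".join(IUPAC[code] for code in consensus))

# Assumption: TATA-8 is used here as a stand-in for the TATA-532 set.
TATA = compile_consensus("TATAWAWR")
INR = compile_consensus("YYANWYY")

def classify_promoter(window):
    """Assign a promoter window (e.g. -80 to +80) to one of four categories."""
    seq = window.upper()
    has_tata = TATA.search(seq) is not None
    has_inr = INR.search(seq) is not None
    if has_tata and has_inr:
        return "TATA+INR"
    if has_tata:
        return "TATA only"
    if has_inr:
        return "INR only"
    return "None"

if __name__ == "__main__":
    for window in ["GGTATAAAAGCC", "GGGGCCACTCCGGGG",
                   "TATAAAAGGGGGCCACTCC", "GGGGGGGGGG"]:
        print(window, classify_promoter(window))
```

Counting the labels over a set of promoter windows gives a category table of the same shape as the one described in the caption, though the numbers would differ from the published ones because of the stand-in TATA definition.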
Fig. 6
Conserved promoter motifs selectively enriched in human TATA-less promoters. (A) The four categories of human promoters described in Fig. 5 (TATA only, TATA+INR, INR only, and None) were searched in the -250 to +150 region with the indicated consensus motifs from Xie et al. (2005): M6 (an Sp1 binding motif), M3 (an ELK-1 binding motif), and M22 (binding factor unknown). The percent (%) and number (in parentheses) of genes with the indicated motifs that belong to each core promoter category are given and compared to all the genes (All). Frequency profiles of (B) M6 motifs, (C) M3 motifs, and (D) M22 motifs, within the -250 to +150 region of TATA-containing genes (2,434 total) and TATA-less genes (7,837 total) as defined in Fig. 5 are shown with a bin size of 10 nt.
Fig. 7
Human genes in distinct core promoter categories are associated with different biological processes. (A) Shown are the most overrepresented Biological Processes from Gene Ontology (GO, http://www.geneontology.org/) for different core promoter categories: TATA only, INR only and None as defined in Fig. 5. TATA plus INR (module) is a subset of TATA+INR genes in which a TATA box is 15 to 30 nt upstream of an INR within the -80 to +80 window (see text for more details). Given is the percent (and number) of genes in a given promoter category that fall within a given Biological Process (the total number of genes annotated for Biological Processes in a given promoter category is also indicated), with the corresponding EASE (Expression Analysis Systematics Explorer) score (http://apps1.niaid.nih.gov/david/; Hosack et al., 2003), which uses the upper bound of the distribution of jackknife Fisher exact probabilities to distinguish enriched gene categories with respect to the entire DBTSS database. Shown are the largest non-overlapping gene categories with EASE scores of 0.0001 or lower that are specific to a given promoter category. (B) Histogram comparing the EASE scores of GO Biological Processes of the “TATA only” and “None” categories. Shown are all categories for which the EASE score is <0.001 for either category. The Biological Process “energy derivation by oxidation of organic compounds” is truncated. Bold, Biological Processes shown in (A). The “INR only” category was not as specifically enriched compared to the “None” category, except for the Biological Processes protein biosynthesis, mRNA metabolism, and oxidative phosphorylation (see supplementary Fig. S8). See supplementary Fig. S9 for a comparison of the Cellular Component of the “TATA only” vs. the “None” genes.

References

    Aerts S, Thijs G, Dabrowski M, Moreau Y, De Moor B. Comprehensive analysis of the base composition around the transcription start site in Metazoa. BMC Genomics. 2004;5:34–44.
    Aso T, Conaway JW, Conaway RC. Role of core promoter structure in assembly of the RNA polymerase II preinitiation complex. A common pathway for formation of preinitiation intermediates at many TATA and TATA-less promoters. J. Biol. Chem. 1994;269:26575–26583.
    Bajic VB, Choudhary V, Hock CK. Content analysis of the core promoter region of human genes. In Silico Biol. 2004;4:109–125.
    Basehoar AD, Zanton SJ, Pugh BF. Identification and distinct regulation of yeast TATA box-containing genes. Cell. 2004;116:699–709.
    Bazykin GA, Kondrashov AS. Rate of promoter class turn-over in yeast evolution. BMC Evol. Biol. 2006;6:14.