Genome Res. 2007 Feb;17(2):145-55.
doi: 10.1101/gr.5872707. Epub 2007 Jan 8.

Characterization and predictive discovery of evolutionarily conserved mammalian alternative promoters

Daehyun Baek et al. Genome Res. 2007 Feb.

Abstract

Recent studies suggest that surprisingly many mammalian genes have alternative promoters (APs); however, their biological roles, and the characteristics that distinguish them from single promoters (SPs), remain poorly understood. We constructed a large data set of evolutionarily conserved promoters and used it to identify sequence features, functional associations, and expression patterns that differ by promoter type. The four promoter categories (CpG-rich APs, CpG-poor APs, CpG-rich SPs, and CpG-poor SPs) each show characteristic strengths and patterns of sequence conservation, frequencies of putative transcription-related motifs, and tissue and developmental stage expression preferences. APs display substantially higher sequence conservation than SPs, and CpG-poor promoters display higher conservation than CpG-rich promoters. Among CpG-poor promoters, APs and SPs show sharply contrasting developmental stage preferences and TATA box frequencies. We developed a discriminator to computationally predict promoter type, verified its accuracy through experimental tests that incorporate a novel method for deconvolving mixed sequence traces, and used it to find several new APs. The discriminator predicts that almost half of all mammalian genes have evolutionarily conserved APs. This high frequency of APs, together with the strong purifying selection maintaining them, implies a crucial role in expanding the expression diversity of the mammalian genome.

Figures

Figure 1.
Average sequence conservation score (using UCSC 17-vertebrate alignment [Siepel et al. 2005]) at each nucleotide position in promoter, first exon, and first intron regions for each promoter type. Exon scores were computed for 50 exonic bases (or half the exon size, for exons <100 bp) from the 5′ or 3′ exon end. APs show on average higher conservation than SPs, and CpG-poor promoters higher conservation than CpG-rich promoters, over several hundred bases in the promoter and first intron, and in the 5′ half of first exon. Conservation patterns in the 3′ half of first exon largely reflect protein-coding constraints. The difference in sequence conservation becomes negligible further upstream of the TSS (which effectively eliminates the possibility that sequence-conservation differences near the TSS reflect large-scale variation in mutational rate rather than purifying selection). Uncertain TSS placement due to variable start sites, common in CpG-rich promoters, may cause some smearing of the conservation pattern for such promoters, but cannot by itself cause the overall weaker pattern. The boxed core promoter is bases −35 to +35 relative to the TSS.
Figure 2.
Sequence conservation in bases 16–100 upstream of TSS (A) and bases 16–100 in first intron (B) as a function of promoter cluster size (estimated number of promoters in gene, averaged between human and mouse) in 12,025 conserved promoters. Cluster size <2.0 includes SPs.
Figure 3.
Donor site score (A), CpG island score (A), and promoter usage rate (B) as a function of relative promoter position, and donor-site score as a function of intron size (C). In A and B, AP cases with the promoter cluster size of ≥2 in both human and mouse were analyzed.
Figure 4.
Positional distribution of TATA box motif, TATA (Butler and Kadonaga 2002) (A), initiator motif, YYANWYY (Butler and Kadonaga 2002) (B), downstream promoter element motif, RGWYV (Butler and Kadonaga 2002) (C), and CTCF binding site (combined count of CCCTCC [Filippova et al. 1996] and its complement) (D). For each nucleotide position, the number of promoters of a given type having a motif copy spanning that position is divided by the total number of promoters of that type. We used known promoters filtered as described in the Methods (Motif discovery and TRANSFAC search).
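The per-position frequency described in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes the promoter sequences are already aligned on the TSS and have equal length, and the function names `motif_regex` and `positional_frequency` are hypothetical. Each promoter contributes at most once to a position, even when overlapping motif copies span it.

```python
import re

# IUPAC nucleotide codes, needed for motifs such as YYANWYY and RGWYV.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(motif):
    """Compile an IUPAC motif into a regex that also finds overlapping copies."""
    body = "".join(IUPAC[ch] for ch in motif.upper())
    # A zero-width lookahead lets finditer report every start position.
    return re.compile("(?=(" + body + "))")


def positional_frequency(promoters, motif):
    """Fraction of promoters with a motif copy spanning each position.

    promoters: list of equal-length sequences aligned on the TSS (assumption).
    """
    length = len(promoters[0])
    pattern = motif_regex(motif)
    counts = [0] * length
    for seq in promoters:
        covered = set()
        for match in pattern.finditer(seq.upper()):
            covered.update(range(match.start(), match.start() + len(motif)))
        # Count this promoter once per covered position.
        for pos in covered:
            counts[pos] += 1
    return [c / len(promoters) for c in counts]


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the paper.
    toy = ["GGTATAGG", "GGGGGGGG"]
    print(positional_frequency(toy, "TATA"))
```

Running the same function separately on each promoter type would give one curve per type, which is how the four panels compare categories.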
Figure 5.
Relative expression in early vs. late developmental stages (A) and in cancer vs. non-cancer cells (B) by promoter type. Expression level E_c in category c for each promoter type was measured by counting the number of aligned ESTs for that promoter type, and dividing by the sum over all four promoter types. Relative expression was computed as (E_Prenatal - E_Postnatal)/E_Postnatal in A and (E_Cancer - E_Non-cancer)/E_Non-cancer in B. We used all conserved promoters that were strongly predicted by our discriminator to be AP or SP (having aLLRs in the top and bottom quartiles of the aLLR distribution, respectively).
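The calculation in this caption can be illustrated with a short Python sketch. It follows one reading of the caption: within each category (for example prenatal), the EST count for a promoter type is divided by the sum over all four types, and the relative expression is the difference between two categories divided by the baseline. The function names and the EST counts below are hypothetical and are not taken from the paper.

```python
def expression_shares(est_counts):
    """Normalize EST counts within each category.

    est_counts: {category: {promoter_type: aligned EST count}}
    Returns the same structure with each count divided by the
    category total over all promoter types.
    """
    shares = {}
    for category, by_type in est_counts.items():
        total = sum(by_type.values())
        shares[category] = {ptype: n / total for ptype, n in by_type.items()}
    return shares


def relative_expression(shares, numerator, baseline):
    """(E_numerator - E_baseline) / E_baseline for each promoter type."""
    return {
        ptype: (shares[numerator][ptype] - shares[baseline][ptype])
        / shares[baseline][ptype]
        for ptype in shares[baseline]
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = {
        "prenatal": {"CpG-rich AP": 30, "CpG-poor AP": 30,
                     "CpG-rich SP": 30, "CpG-poor SP": 10},
        "postnatal": {"CpG-rich AP": 30, "CpG-poor AP": 20,
                      "CpG-rich SP": 30, "CpG-poor SP": 20},
    }
    shares = expression_shares(counts)
    for ptype, value in relative_expression(shares, "prenatal", "postnatal").items():
        print(f"{ptype}: {value:+.2f}")
```

A positive value means the promoter type takes a larger share of ESTs in the first category than in the baseline; the same two functions cover panel B by passing cancer and non-cancer categories.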
Figure 6.
Relative prevalence of putative housekeeping promoters (A), and the number of expressed tissue types and expression level (in noncancer cells) per promoter (B) by promoter type. CpG-rich SPs are enriched for putative housekeeping promoters (A) and tend to be expressed more broadly (B). We used all conserved promoters that were strongly predicted by our discriminator to be AP or SP (having aLLRs in the top and bottom quartiles of the aLLR distribution, respectively).
Figure 7.
Summary of functions and expression specificity of different mammalian promoter types.
Figure 8.
A novel AP on human chromosome 16 discovered by our computational prediction and experimental verification. B is an expanded view of the red rectangle region in A (images captured from the UCSC Genome Browser). A conserved promoter with a representative isoform NM_138383 (shown as LOC92154 above) was predicted to be an AP with an approximate log-likelihood ratio of +21.2, although aligned cDNAs and ESTs only supported a single promoter (BC110072 lacks any exonic overlap with this gene, and thus, is unlikely to represent an upstream AP). The top black box denoted by “YourSeq” represents the genomic alignment of a first exonic sequence found in our oligo-capped RACE reads. In B, the bottom rows indicate highly conserved blocks in the region immediately upstream of the detected first exon. The direction of transcription is from right to left.

References

    1. Baek D., Green P. Sequence conservation, relative isoform frequencies, and nonsense-mediated decay in evolutionarily conserved alternative splicing. Proc. Natl. Acad. Sci. 2005;102:12813–12818.
    2. Bird A.P. DNA methylation–how important in gene control? Nature. 1984;307:503–504.
    3. Butler J.E., Kadonaga J.T. The RNA polymerase II core promoter: A key component in the regulation of gene expression. Genes & Dev. 2002;16:2583–2592.
    4. Carninci P., Sandelin A., Lenhard B., Katayama S., Shimokawa K., Ponjavic J., Semple C.A., Taylor M.S., Engstrom P.G., Frith M.C., et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 2006;38:626–635.
    5. Cooper S.J., Trinklein N.D., Anton E.D., Nguyen L., Myers R.M. Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res. 2006;16:1–10.
