Genome Res. 1998 Mar;8(3):319-26.
doi: 10.1101/gr.8.3.319.

Identification of human gene core promoters in silico

M Q Zhang. Genome Res. 1998 Mar.

Abstract

Identification of the 5'-end of human genes requires identification of functional promoter elements. In silico identification of those elements is difficult because of the hierarchical and modular nature of promoter architecture. To address this problem, I propose a new stepwise strategy: first, localization of a functional promoter to a 1- to 2-kb (extended promoter) region within a large genomic DNA sequence of 100 kb or larger; then, further localization of a transcriptional start site (TSS) to a 50- to 100-bp (core promoter) region. Using position-dependent 5-tuple measures, a quadratic discriminant analysis (QDA) method has been implemented in a new program, CorePromoter. Our experiments indicate that when given a 1- to 2-kb extended promoter, CorePromoter will correctly localize the TSS to a 100-bp interval approximately 60% of the time. [Figure 3 can be found in its entirety as an online supplement at http://www.genome.org.]
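
The QDA scoring step can be illustrated with a minimal sketch. This is not the CorePromoter implementation: the paper's features are position-dependent 5-tuple measures, which are not reproduced here. The sketch assumes each candidate position has already been reduced to a hypothetical two-dimensional feature vector, and all training points and the profile are made-up toy values. It fits one Gaussian per class (promoter and background), scores each position by the difference of the class log densities, and reports the highest-scoring position as the predicted TSS.

```python
import math

def fit_gaussian_2d(points):
    """Maximum-likelihood mean and 2x2 covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def log_gaussian_2d(x, mean, cov):
    """Log density of a 2-D Gaussian, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance d^T * inverse(cov) * d
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (maha + math.log(det)) - math.log(2 * math.pi)

def qda_score(x, promoter, background):
    """Quadratic discriminant: positive values favor the promoter class."""
    return log_gaussian_2d(x, *promoter) - log_gaussian_2d(x, *background)

def predict_tss(profile, promoter, background):
    """Index of the position whose feature vector scores highest."""
    scores = [qda_score(x, promoter, background) for x in profile]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy training data (hypothetical feature values, not from the paper).
promoter_pts = [(2.0, 1.0), (2.5, 1.5), (1.8, 1.2), (2.2, 0.8), (2.4, 1.1)]
background_pts = [(0.0, 0.0), (0.5, -0.3), (-0.4, 0.2), (0.1, 0.6), (-0.2, -0.5)]
promoter = fit_gaussian_2d(promoter_pts)
background = fit_gaussian_2d(background_pts)

# Toy profile: one feature vector per candidate position along the sequence.
profile = [(0.1, 0.0), (0.3, 0.2), (2.1, 1.0), (0.4, -0.1)]
best = predict_tss(profile, promoter, background)
print("predicted TSS index:", best)
```

Because each class keeps its own covariance matrix, the decision boundary is quadratic in the features; forcing both classes to share one covariance would reduce the score to a linear discriminant (LDA).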

Figures

Figure 1
Core promoter organization: (UPE and DPE) Upstream and downstream promoter elements; (X) a UPE-binding TF; (CIF) a Co–Inr TF; (A,B,D,F,E,H) TFIIA, TFIIB, etc.; (TBP) the TATA box-binding protein; (TF150 and TF250) TBP-associated factors (TAFs) 150 and 250, respectively.
Figure 2
Top panels are LDA profiles for 10 extended EPD48 human promoter sequences. Bottom panels are QDA profiles for the same 10 sequences. The EPD entry-ID is indicated for each sequence. The vertical lines are the true TSS positions. The sequence range is (−600, +600). A peak in the profile indicates a high likelihood for a TSS.
Figure 3
QDA profiles (in log10 scale) for a newly constructed nonredundant human promoter database, LEDB (673 sequences, 55 of them identical to EPD sequences), are summarized by up to their three highest peaks (see text for details). The GenBank accession no. and the TSS position are indicated at left. Each peak is labeled with its rank number: (1) the highest peak in the whole profile, (2) the second highest, etc. The sequence range is (−600, +600), and the true TSS position is indicated by a vertical dotted line.
Figure 4
Similar QDA profiles to those in Fig. 3 for the 122 extended EPD promoters that were not used as the training set. (1) A strong TATA promoter.
Figure 5
Scatter plot of QDA scores (in log10 scale) for the peaks in (−50, +50) in Fig. 4 vs. the TATA scores (see text for details).
Figure 6
Feature variables in discriminant analyses.
