Genome Res. 2002 Mar;12(3):458-61.
doi: 10.1101/gr.216102.

Computational detection and location of transcription start sites in mammalian genomic DNA

Thomas A Down et al. Genome Res. 2002 Mar.

Abstract

Transcription, the process whereby RNA copies are made from sections of the DNA genome, is directed by promoter regions. These define the transcription start site, and also the set of cellular conditions under which the promoter is active. At least in more complex species, it appears to be common for genes to have several different transcription start sites, which may be active under different conditions. Eukaryotic promoters are complex and fairly diffuse structures, which have proven hard to detect in silico. We show that a novel hybrid machine-learning method is able to build useful models of promoters for >50% of human transcription start sites. We estimate specificity to be >70%, and demonstrate good positional accuracy. Based on the structure of our learned models, we conclude that a signal resembling the well-known TATA box and flanking regions of C-G enrichment are the most important sequence-based signals marking sites of transcriptional initiation at a large class of typical promoters.
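As a rough illustration of the kind of signal described in the abstract, the Python sketch below scores each position of a sequence against a TATA-like weight matrix and also requires C-G enrichment in the flanking regions. It is not the Eponine model: the matrix values, the thresholds (`min_score`, `min_gc`), the flank width, and all function names are assumptions chosen for illustration only.

```python
import math

# Illustrative TATA-like position weight matrix (one dict per column).
# These probabilities are made up for the example; they are NOT the
# weights learned by Eponine or any published matrix.
TATA_PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},  # T
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},  # A
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},  # T
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},  # A
    {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45},  # A or T
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},  # A
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_log_odds(window):
    """Sum of per-column log2 odds of the window against the background."""
    return sum(math.log2(col[base] / BACKGROUND)
               for col, base in zip(TATA_PWM, window))


def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def scan(seq, flank=20, min_score=6.0, min_gc=0.6):
    """Return (position, score) for TATA-like hits with GC-rich flanks.

    flank, min_score and min_gc are hypothetical parameters, not values
    taken from the paper.
    """
    seq = seq.upper()
    width = len(TATA_PWM)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = pwm_log_odds(window)
        left = seq[max(0, i - flank):i]
        right = seq[i + width:i + width + flank]
        if (score >= min_score
                and gc_fraction(left) >= min_gc
                and gc_fraction(right) >= min_gc):
            hits.append((i, score))
    return hits
```

For example, `scan("GC" * 10 + "TATAAA" + "GC" * 10)` reports a single hit at position 20, while the same motif embedded in AT-rich flanks is rejected. A learned model such as the one in the paper would weight and position these signals from training data rather than use fixed thresholds.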

Figures

Figure 1. Schematic of the Eponine core promoter model, showing the constraint distributions and weight-matrix consensus sequences.
Figure 2. Density of predictions from Eponine relative to the annotated TSSs of (a) EPD entries and (b) chromosome 22 mRNAs. In the latter case, directionality of predictions was ignored (in common with the rest of the chromosome-scale evaluation in this paper).
Figure 3. Construction of the pseudochromosome, selecting only those regions where a full mRNA (transcript) is annotated. In the case where an mRNA-annotated gene is followed by a coding-sequence-only gene in the same orientation, the sequence is cut at the midpoint between the two genes.
Figure 4. Intersection of ‘correct’ predictions of promoters by Eponine, PromoterInspector and CpG islands of chromosome 22 mRNAs.
