Nucleic Acids Res. 2005 Jul 27;33(13):4255-64.
doi: 10.1093/nar/gki737. Print 2005.

Large-scale structural analysis of the core promoter in mammalian and plant genomes

Kobe Florquin et al. Nucleic Acids Res. 2005.

Abstract

DNA encodes at least two independent levels of functional information. The first level encodes proteins and the sequence targets of DNA-binding factors, while the second is contained in the physical and structural properties of the DNA molecule itself. Although these physical and structural properties are ultimately determined by the nucleotide sequence, the cell exploits them in a way in which the sequence plays no role other than to support or facilitate certain spatial structures. In this work, we focus on these structural properties, comparing them between different organisms and assessing their ability to describe the core promoter. We show that distinct types of core promoters exist, based on a clustering of their structural profiles. These results indicate that the structural profiles are highly conserved within plants (Arabidopsis and rice) and within animals (human and mouse), but differ considerably between plants and animals. Furthermore, we demonstrate that structural profiles offer an alternative way of describing the core promoter, in addition to more classical motif or IUPAC-based approaches. Using the structural profiles as discriminatory elements to separate promoter regions from non-promoter regions, reliable models can be built to identify core-promoter regions with a strictly computational approach.

Figures

Figure 1
Sequence information is converted to numerical profiles. In this example, the trinucleotide bendability model by Brukner et al. (11) is used, which is based on DNase I cutting frequencies. The enzyme DNase I preferentially binds to the minor groove and cuts DNA that is bent, or bendable, toward the major groove. Therefore, DNase I cutting frequencies on naked DNA can be interpreted as a quantitative measure of major groove compressibility or bendability. From these frequencies, bendability parameters were derived for the 32 complementary trinucleotide pairs; the values range from −0.280 to +0.194. To evaluate different smoothings of the raw profile signal (see text), a sliding window was applied with a step of 1 and window sizes from 1 to 10.
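The conversion described in this caption can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the smoothing is assumed to be a plain moving average, and `TOY_SCALE` holds made-up placeholder numbers rather than the Brukner et al. bendability parameters, which are not reproduced here.

```python
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_trinucleotide(tri: str) -> str:
    """Map a trinucleotide and its reverse complement to one shared key.

    This collapses the 64 trinucleotides into 32 complementary pairs.
    """
    reverse_complement = tri.translate(_COMPLEMENT)[::-1]
    return min(tri, reverse_complement)


def raw_profile(seq: str, scale: Dict[str, float]) -> List[float]:
    """Look up one structural value per overlapping trinucleotide (step 1)."""
    seq = seq.upper()
    return [
        scale[canonical_trinucleotide(seq[i:i + 3])]
        for i in range(len(seq) - 2)
    ]


def smooth(profile: List[float], window: int) -> List[float]:
    """Moving average over the raw profile (assumed smoothing scheme)."""
    if window < 1 or window > len(profile):
        raise ValueError("window must be between 1 and len(profile)")
    return [
        sum(profile[i:i + window]) / window
        for i in range(len(profile) - window + 1)
    ]


# Placeholder values for illustration only; NOT the published parameters.
TOY_SCALE = {"AAA": 1.0, "AAT": 3.0}

profile = raw_profile("AAATTT", TOY_SCALE)   # AAA, AAT, ATT, TTT
smoothed = smooth(profile, window=2)
```

With a window size of 1 the smoothed profile equals the raw profile, so the ten window sizes in the caption would correspond to ten calls to `smooth` on the same raw profile.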
Figure 2
For Arabidopsis and rice, in-house core-promoter datasets were constructed. The ARAPROM dataset (7088 promoter sequences) was constructed by aligning full-length cDNA sequences, generated by Seki et al. (31), with the genomic sequence. The RICEPROM dataset consists of 2195 putative promoter sequences. From each original promoter sequence, 100 bp upstream and 50 bp downstream of the TSS were selected. As negative datasets, 150-bp fragments were extracted from non-promoter regions, including intron, exon and intergenic sequences. In addition, three randomized datasets were constructed by randomizing the core-promoter sequences.
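The window selection around the TSS can be sketched as below. This is an illustration under stated assumptions, not the pipeline used for ARAPROM or RICEPROM: the function name is hypothetical, the TSS is taken as a 0-based index of the first transcribed base, and that base is counted as the first downstream position.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def core_promoter_window(genome: str, tss: int, strand: str = "+",
                         upstream: int = 100, downstream: int = 50) -> str:
    """Return `upstream` bp before and `downstream` bp from the TSS onward.

    The result is always reported 5'->3' on the transcribed strand, so
    minus-strand genes are reverse-complemented.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, downstream means lower genome coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(genome):
        raise ValueError("window runs off the end of the sequence")
    window = genome[start:end].upper()
    if strand == "-":
        window = window.translate(_COMPLEMENT)[::-1]
    return window


# Toy sequences: 100 bp of 'A' upstream, 50 bp of 'C' from the TSS onward.
plus_genome = "A" * 100 + "C" * 50
minus_genome = "G" * 50 + "T" * 100   # same gene placed on the minus strand

plus_window = core_promoter_window(plus_genome, tss=100, strand="+")
minus_window = core_promoter_window(minus_genome, tss=49, strand="-")
```

Both calls yield the same 150-bp string, which is the property needed for aligning all core promoters on the TSS regardless of strand.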
Figure 3
Profiles based on the structural model ‘duplex disrupt energy’ and window size 10 are shown for the four clusters with the highest quality values for Arabidopsis, rice, human and mouse (42). The position of the transcription start site is marked on each structural profile.
Figure 4
Influence of the window size on the classification results. The figure shows the discriminative power to distinguish core-promoter sequences from non-core-promoter sequences for all structural models and for window sizes 1–10. For each structural model, all core-promoter sequences from the clusters with the highest quality value were mixed with 75% sequences coming from the dinucleotide-randomized dataset. The F-value, which combines sensitivity and specificity, measures the overall performance of discriminating between core-promoter sequences and non-core-promoter sequences. Classification results were obtained with the LSVM classification method.
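The caption does not give the formula for the F-value. A common choice, assumed here, is the harmonic mean of sensitivity and specificity, with specificity taken in the sense often used in promoter prediction, TP/(TP+FP), rather than TN/(TN+FP). The sketch below follows that assumption; the function name and counts are illustrative.

```python
def f_value(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of sensitivity and specificity (assumed definition).

    sensitivity = TP / (TP + FN)
    specificity = TP / (TP + FP)   # assumption: precision-like usage
    """
    if tp == 0:
        return 0.0  # no true positives: both terms are zero or undefined
    sensitivity = tp / (tp + fn)
    specificity = tp / (tp + fp)
    return 2 * sensitivity * specificity / (sensitivity + specificity)


# Illustrative counts, not results from the paper.
score = f_value(tp=80, fp=20, fn=20)
```

Because the harmonic mean is dominated by the smaller of its two terms, a classifier cannot reach a high F-value by trading all of one measure for the other.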
Figure 5
(a–j) The first 10 clusters of human structural profiles, as inferred by the AQBC method, obtained using bendability as the structural model with window size 10. All core promoters are aligned on the TSS, and each profile corresponds to 100 bp upstream of the TSS and 50 bp downstream.

References

    1. Sinden R.R. DNA: Structure and Function. San Diego, CA: Academic Press; 1994.
    2. Pedersen A.G., Baldi P., Chauvin Y., Brunak S. DNA structure in human RNA polymerase II promoters. J. Mol. Biol. 1998;281:663–673.
    3. Perez-Martin J., de Lorenzo V. Clues and consequences of DNA bending in transcription. Annu. Rev. Microbiol. 1997;51:593–628.
    4. Liao G.C., Rehm E.J., Rubin G.M. Insertion site preferences of the P transposable element in Drosophila melanogaster. Proc. Natl Acad. Sci. USA. 2000;97:3347–3351.
    5. Lu X.J., Shakked Z., Olson W.K. A-form conformational motifs in ligand-bound DNA structures. J. Mol. Biol. 2000;300:819–840.