BMC Genomics. 2007 Mar 8;8:67.
doi: 10.1186/1471-2164-8-67.

Identification of plant promoter constituents by analysis of local distribution of short sequences

Yoshiharu Y Yamamoto et al. BMC Genomics. 2007.

Abstract

Background: Plant promoter architecture is important for understanding the regulation and evolution of promoters, but current knowledge of plant promoter structure, especially of the core promoter, is insufficient. Several promoter elements, including the TATA box and several types of transcriptional regulatory elements, have been found to show local distribution within promoters, and this feature has been used successfully to extract promoter constituents from the human genome.

Results: LDSS (Local Distribution of Short Sequences) profiles of short sequences along plant promoters were analyzed in silico, and hundreds of hexamer and octamer sequences were identified as having localized distributions within promoters of Arabidopsis thaliana and rice. Based on their localization patterns, the identified sequences could be classified into three groups: pyrimidine patch (Y Patch), TATA box, and REG (Regulatory Element Group). Sequences of the TATA box group are consistent with those reported in previous studies. The REG group includes more than 200 sequences, and half of them correspond to known cis-elements. The other REG subgroups, together with about a hundred uncategorized sequences, are suggested to be novel cis-regulatory elements. Comparison of LDSS-positive sequences between Arabidopsis and rice revealed moderate conservation of elements and a common promoter architecture. In addition, a dimer motif named the YR Rule, (C/T)(A/G), was identified at the transcription start site (-1/+1). This rule fits both Arabidopsis and rice promoters.

Conclusion: LDSS was successfully applied to plant genomes, and hundreds of putative promoter elements were extracted as LDSS-positive octamers. The identified promoter architectures of monocots and dicots are well conserved, but there are moderate variations in the sequences used.

Figures

Figure 1
Examples of distribution of peaks. Several examples of hexamer analysis against Arabidopsis promoters are shown. The vertical axis indicates the total count over the whole promoter database. Gray and solid lines show the raw counts and the average with a 15-bin window, respectively. As negative controls, a set of 3,000 random 1 kb fragments from the Arabidopsis genome was used for the occurrence analysis instead of the promoter database (shown as "random genome" in the bottom columns).
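The positional counting and smoothing described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each promoter is a string whose last base sits at position -1, that an occurrence is recorded at the k-mer's first base, that one bin equals one base pair, and that the 15-bin average is a centered window truncated at the ends. The function names `ldss_profile` and `moving_average` are hypothetical.

```python
from collections import Counter


def ldss_profile(promoters, kmer, upstream=1000):
    """Count occurrences of `kmer` at each position relative to the TSS.

    Assumption: the last base of every promoter string is position -1,
    so string index i corresponds to position i - len(seq).
    Returns a list where element j is the count at position -upstream + j.
    """
    kmer = kmer.upper()
    k = len(kmer)
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == kmer:
                counts[i - len(seq)] += 1
    return [counts.get(pos, 0) for pos in range(-upstream, 0)]


def moving_average(values, window=15):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

Running the same two functions on random genomic fragments in place of promoters would give the kind of negative-control profile the caption mentions.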
Figure 2
Parameters for peak detection. (A) The graph is a distribution profile of CACGTG in Arabidopsis promoters. The average with a 15-bin window is shown. The dotted line indicates the Base Line, which is the average from -1,000 to -500. The light grey area shows the Peak Area. The dark grey area is Δarea, an indication of the fluctuation around the Base Line from -1,000 to -500. In addition, the following parameters have been defined: Relative Peak Area (RPA) = Peak Area/total area; Relative Peak Height (RPH) = peak height/Base Line; Peak Area/basal fluctuation = Peak Area/Δarea per peak width; Peak height/SD = peak height/standard deviation of occurrence from -1,000 to -500. Several parameters of this graph are shown in Table 2 (CACGTG). (B) All hexamers were analyzed to obtain these parameters, and the graph plots Peak Area/basal fluctuation against peak position. Each dot shows the data of an individual hexamer. Among the 4,096 hexamers (grey dots), 247 peak-positive hexamers were selected (solid dots). The graph demonstrates that hexamers with a significant value have a peak position from -200 to -13 (the most downstream position after smoothing).
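Some of the parameters defined in panel (A) can be computed as in the sketch below. It is only an illustration under stated assumptions: the caption does not say how the peak boundaries are drawn or whether peak height is measured from zero or from the Base Line, so this sketch takes peak height as the maximum of the profile measured from zero, and takes the peak as the contiguous run of bins above the Base Line around that maximum. The Δarea-based parameter is left out because its exact construction is not given here. The function name `peak_parameters` and its arguments are hypothetical.

```python
from statistics import mean, pstdev


def peak_parameters(profile, upstream=1000, base_from=-1000, base_to=-500):
    """Compute Base Line, RPH, RPA and peak height/SD for one profile.

    profile[j] is the (smoothed) count at position -upstream + j.
    Assumptions (not from the source): peak height is the profile maximum
    measured from zero, and the peak spans the contiguous bins above the
    Base Line around that maximum.
    """
    def idx(pos):
        return pos + upstream

    base_region = profile[idx(base_from):idx(base_to) + 1]
    base_line = mean(base_region)
    sd = pstdev(base_region)

    peak_idx = max(range(len(profile)), key=profile.__getitem__)
    peak_height = profile[peak_idx]

    # Grow the peak outwards while neighbouring bins stay above the Base Line.
    left = peak_idx
    while left > 0 and profile[left - 1] > base_line:
        left -= 1
    right = peak_idx
    while right < len(profile) - 1 and profile[right + 1] > base_line:
        right += 1

    peak_area = sum(v - base_line for v in profile[left:right + 1])
    total_area = sum(profile)
    return {
        "base_line": base_line,
        "peak_position": peak_idx - upstream,
        "rph": peak_height / base_line if base_line else float("inf"),
        "rpa": peak_area / total_area if total_area else 0.0,
        "height_over_sd": peak_height / sd if sd else float("inf"),
    }
```

The defaults mirror the -1,000 to -500 baseline window of the caption; the smaller windows accepted through the arguments are only a convenience for trying the function on toy data.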
Figure 3
Directional preference of LDSS-positive hexamers. When the corresponding complementary sequence was not found in the LDSS-positive group, the hexamer was counted as "uniq", meaning orientation-sensitive. When it was found, the hexamer was counted as "comp", meaning direction-insensitive. The numbers of "uniq" and "comp" hexamers were counted according to peak position relative to the TSS and summarized in a bar graph. The inset graph is an enlargement showing more detail around the TSS.
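The "uniq"/"comp" labelling can be sketched as below. This assumes that "complementary sequence" means the reverse complement on the opposite strand, which the caption does not state explicitly. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_orientation(positive_kmers):
    """Label each k-mer 'comp' if its reverse complement is also in the
    LDSS-positive set (direction-insensitive), otherwise 'uniq'."""
    positive = {k.upper() for k in positive_kmers}
    return {
        k: "comp" if reverse_complement(k) in positive else "uniq"
        for k in sorted(positive)
    }
```

Under this assumption a palindromic hexamer such as CACGTG is its own reverse complement and is therefore always labelled "comp".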
Figure 4
Comparison of Arabidopsis and rice octamers. (A) The 987 octamers that are LDSS-positive in either Arabidopsis or rice promoters were selected, and their Relative Peak Height (RPH) values were compared and expressed as a scatter plot. Each dot is data from an individual octamer sequence. (B) LDSS-positive octamer sequences of Arabidopsis and rice were compared, and common sequences found in both sets were identified. The figure shows the number of octamer sequences. Classification into the Y and TATA groups was done based on distribution profiles as shown in Figure 5. The REG group has a peak position between -51 and -200.
Figure 5
Clustering of LDSS-positive sequences based on distribution profiles. Distribution profiles of each LDSS-positive octamer of Arabidopsis were subjected to hierarchical clustering. Three major clusters are shown.
Figure 6
REG-promoter clustering. For each Arabidopsis promoter, the number of occurrences of each octamer REG within the region from -400 to -40 bp was scored, and the scores were subjected to 2D hierarchical clustering. The vertical axis shows promoters and the horizontal axis shows REGs. Matrix values indicate the number of REG sequences. Two small promoter clusters are shown in the figure together with the whole set of REGs. (A) A part of the promoter cluster rich in the GCCCA motif for meristematic expression. Ribosomal proteins are shown in blue. (B) A part of the promoter cluster rich in the ACGT motif for environmental response. Promoter names are colored according to expression data from AtGenExpress. Red: abiotic stress-positive, orange: abiotic stress-negative, green: light-positive, black: no response to abiotic stress or light, grey: no expression data found. (C) An example of clustered REGs. A part of the ACGT cluster shown in the top of Panel A is enlarged. ACGT in the octamers is highlighted in orange.
Figure 7
Clustering of REGs. Aided by REG-promoter clustering, Arabidopsis REGs were subjected to classification. Colored dots in the figure indicate the presence of the corresponding motif in the REG sequence. The tree is the same as the one in Figure 6A.
Figure 8
Identification of the YR Rule. (A) Dinucleotide sequences at the -1/+1 position relative to the Arabidopsis TSS, determined from fl-cDNA information, were counted. As shown, most of the TSSs have (C/T)(A/G), and this YR Rule applies to 77% of the analyzed TSSs. (B) The frequency of dinucleotide sequences fitting the YR Rule was scanned from -5 to +5 of the Arabidopsis and rice TSSs. The position of the downstream site of the dimer is shown. For example, the -1/+1 position is indicated as "1". In theory, the frequency of YR in an unbiased sequence is 0.25.
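A check of the YR Rule at the -1/+1 position, together with the 0.25 background expectation, can be sketched as follows. The input layout is an assumption made for illustration: each record is a sequence plus the string index of its +1 base, so the -1 base is the preceding character. The names `fits_yr_rule`, `yr_fraction` and `background_yr_frequency` are hypothetical.

```python
PYRIMIDINES = "CT"  # Y
PURINES = "AG"      # R


def fits_yr_rule(seq, tss_index):
    """True if the base at -1 is C/T and the base at +1 is A/G.

    Assumption: seq[tss_index] is the +1 base, seq[tss_index - 1] is -1.
    """
    if tss_index < 1 or tss_index >= len(seq):
        return False
    return (seq[tss_index - 1].upper() in PYRIMIDINES
            and seq[tss_index].upper() in PURINES)


def yr_fraction(records):
    """Fraction of (sequence, tss_index) records that fit the YR Rule."""
    if not records:
        return 0.0
    hits = sum(fits_yr_rule(seq, tss) for seq, tss in records)
    return hits / len(records)


def background_yr_frequency():
    """Share of the 16 dinucleotides that are YR, assuming equal base use."""
    hits = sum(1 for a in "ACGT" for b in "ACGT"
               if a in PYRIMIDINES and b in PURINES)
    return hits / 16
```

Four of the sixteen dinucleotides (CA, CG, TA, TG) are YR, which is where the 0.25 expectation for an unbiased sequence comes from. Shifting `tss_index` by a fixed offset before the call would give a positional scan like the one in panel (B).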
Figure 9
Illustration of the YR Rule, Y Patch, TATA box, and REG. (A) Expected positions relative to the TSS are as follows: YR Rule (-1/+1), Y Patch (-100 to -1), TATA box (-50 to -20), REG (-20 to -400). Among them, only the REG is orientation-insensitive; the other groups are orientation-sensitive. In many cases the Y Patch is located between the TATA box and the TSS, but it is also observed upstream of the TATA box. (B) An example of an Arabidopsis promoter that has a Y Patch and a TATA box. At1g10960 is one of the promoters clustered in Figure 6B. The promoter sequence from -100 to +1 is shown together with octamer motifs. Marks on the sequence are the same as illustrated in (A).
