Review
Plant Physiol. 2003 Jul;132(3):1162-76.
doi: 10.1104/pp.102.017715.

Computational approaches to identify promoters and cis-regulatory elements in plant genomes

Stephane Rombauts et al. Plant Physiol. 2003 Jul.

Abstract

The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, unambiguous identification of regulatory elements is generally not straightforward. The delineation of promoters is even harder because of their complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called "search by signal" methods) and the delineation of promoters by considering both sequence content and structural features ("search by content" methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, because of differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to find parameters that might define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5'-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one hand and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of "putative" CpG and CpNpG islands in plants.
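
As a rough illustration of the search-by-content idea in the abstract, the sketch below scans a sequence with a sliding window and reports windows whose G+C percentage and observed/expected (o/e) CpG ratio both exceed a threshold. It is not the authors' implementation. The function names are hypothetical, and the default thresholds (50% G+C, o/e of 0.6, 200-bp window) are the values commonly cited for vertebrates, used here only as placeholders; the abstract states that such parameters do not carry over to plants.

```python
def gc_percent(window: str) -> float:
    """Percentage of G and C bases in the window."""
    window = window.upper()
    if not window:
        return 0.0
    return 100.0 * (window.count("G") + window.count("C")) / len(window)


def cpg_obs_exp(window: str) -> float:
    """Observed/expected CpG ratio, computed as (#CG * N) / (#C * #G)."""
    window = window.upper()
    c, g = window.count("C"), window.count("G")
    if c == 0 or g == 0:
        return 0.0  # no CpG is possible without both bases
    return window.count("CG") * len(window) / (c * g)


def scan_windows(seq, size=200, step=1, min_gc=50.0, min_oe=0.6):
    """Yield (start, gc, oe) for every window that passes both thresholds.

    The defaults are assumed placeholder values, not plant-specific ones.
    """
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        gc, oe = gc_percent(window), cpg_obs_exp(window)
        if gc >= min_gc and oe >= min_oe:
            yield start, gc, oe
```

Calling `list(scan_windows(sequence, size=200, step=50))` returns the start positions of candidate windows. Merging overlapping windows into islands is left out of this sketch.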


Figures

Figure 1.
Graphical, simplified view of the different elements involved in transcription. The pre-initiation complex (PIC) situated at the nucleosome-free TSS is shown containing RNA polymerase II (large gray hatched oval), the TATA box-binding protein (gray sphere), and a number of general TFs (white circles). Gene regulatory proteins upstream or downstream of the TSS that stimulate gene-specific transcription and also contribute to the PIC assembly are shown as small gray circles.
Figure 2.
Flow chart of the computational approaches to detect promoters and cis-acting regulatory elements (CAREs). 1, Promoter prediction through sequence context and structural features, e.g. CpG islands; 2, CARE prediction through statistics on overrepresentation, such as word counting; 3, CARE prediction through comparative genomics (phylogenetic footprinting); 4, CARE prediction through analysis of co-expressed gene clusters, for instance by Gibbs sampling (for details, see text); 5, promoter prediction through the identification of CAREs; and 6, CARE motif prediction through comparative analysis of expression profiles. The last two approaches (5 and 6) are not described in the text.
Figure 3.
CpG island landscape exploration of Arabidopsis gene sequences over a range of CG content and CpG relative frequency. For the various gene elements, the number of CpG islands found in the ARAPROM gene set is plotted on the z axis against the thresholds defined on the x and y axes, namely the CG percentage and the observed/expected (o/e) CpG ratio, respectively. The window size was 200 bp. Similar landscapes are obtained for other window sizes (100 and 400 bp) and are available at http://www.psb.rug.ac.be/bioinformatics/.
Figure 4.
CpNpG island landscape exploration of Arabidopsis gene sequences over a range of CG content and CpNpG relative frequency. For the various gene elements, the number of CpNpG islands found in the ARAPROM gene set is plotted on the z axis against the thresholds defined on the x and y axes, namely the CG percentage and the observed/expected (o/e) CpNpG ratio, respectively. The window size was 200 bp. Similar landscapes are obtained for other window sizes (100 and 400 bp) and are available at http://www.psb.rug.ac.be/bioinformatics/.
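
The landscapes in Figures 3 and 4 tabulate counts over a grid of two thresholds. The sketch below shows one way such a grid could be computed. It makes several assumptions that are not taken from the paper: it counts passing windows rather than merged islands, it uses non-overlapping windows by default, and it computes the o/e CpNpG ratio by analogy with the CpG formula (observed C-N-G triplets divided by #C * #G / N). All function names are hypothetical.

```python
from itertools import product


def cnpg_count(window: str) -> int:
    """Count positions i where window[i] is C and window[i + 2] is G."""
    return sum(
        1
        for i in range(len(window) - 2)
        if window[i] == "C" and window[i + 2] == "G"
    )


def window_stats(window: str):
    """Return (GC percent, o/e CpG, o/e CpNpG) for one window."""
    window = window.upper()
    n = len(window)
    c, g = window.count("C"), window.count("G")
    gc = 100.0 * (c + g) / n if n else 0.0
    if c == 0 or g == 0:
        return gc, 0.0, 0.0
    expected = c * g / n  # assumed shared expectation for CpG and CpNpG
    return gc, window.count("CG") / expected, cnpg_count(window) / expected


def landscape(seqs, gc_thresholds, oe_thresholds, size=200, step=200,
              use_cnpg=False):
    """Count windows passing each (GC threshold, o/e threshold) pair."""
    counts = {pair: 0 for pair in product(gc_thresholds, oe_thresholds)}
    for seq in seqs:
        for start in range(0, len(seq) - size + 1, step):
            gc, cpg, cnpg = window_stats(seq[start:start + size])
            oe = cnpg if use_cnpg else cpg
            for gc_t, oe_t in counts:
                if gc >= gc_t and oe >= oe_t:
                    counts[(gc_t, oe_t)] += 1
    return counts
```

Running `landscape` separately on promoter, intron, and exon sequences, with `size` set to 100, 200, or 400, would give one grid per gene element and window size, which could then be plotted as a surface.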
Figure 5.
Schematic representation of a set of intergenic sequences upstream of the ATG translation initiation site, with a common motif shown as black boxes. On the basis of such a data set, “words” can be counted and statistically evaluated for their overrepresentation. On the other hand, the “putative” motifs can be aligned and frequencies of occurrence of each nucleotide can be calculated for each column within the generated alignment, producing a position weight matrix. See text for details.
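
The two operations named in the Figure 5 caption can be sketched as follows: counting fixed-length "words" across a set of upstream sequences, and turning a set of aligned putative motif instances into per-column nucleotide frequencies. This is a minimal illustration with hypothetical function names. The statistical evaluation of overrepresentation is not shown, and the `pseudocount` parameter is an added assumption, not something described in the source.

```python
from collections import Counter


def count_words(seqs, k):
    """Count every overlapping word of length k across all sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def position_weight_matrix(sites, pseudocount=0.0):
    """Per-column nucleotide frequencies for equal-length aligned sites.

    Returns one dict per alignment column, mapping A, C, G, T to a frequency.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for col in range(length):
        column = Counter(site[col].upper() for site in sites)
        matrix.append(
            {base: (column[base] + pseudocount) / total for base in "ACGT"}
        )
    return matrix
```

For example, `count_words(upstream_seqs, 6).most_common(10)` lists the ten most frequent hexamers, which would still need to be compared against a background model before any of them could be called overrepresented.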


