Comparative Study
BMC Bioinformatics. 2002;3:7.
doi: 10.1186/1471-2105-3-7. Epub 2002 Feb 14.

Correlating overrepresented upstream motifs to gene expression: a computational approach to regulatory element discovery in eukaryotes

Michele Caselle et al. BMC Bioinformatics. 2002.

Abstract

Background: Gene regulation in eukaryotes is mainly effected through transcription factors binding to rather short recognition motifs generally located upstream of the coding region. We present a novel computational method to identify regulatory elements in the upstream region of eukaryotic genes. The genes are grouped in sets sharing an overrepresented short motif in their upstream sequence. For each set, the average expression level from a microarray experiment is determined: if this level is significantly higher or lower than the average taken over the whole genome, then the overrepresented motif shared by the genes in the set is likely to play a role in their regulation.

Results: The method was tested by applying it to the genome of Saccharomyces cerevisiae, using the publicly available results of a DNA microarray experiment, in which expression levels for virtually all the genes were measured during the diauxic shift from fermentation to respiration. Several known motifs were correctly identified, and a new candidate regulatory sequence was determined.

Conclusions: We have described and successfully tested a simple computational method to identify upstream motifs relevant to gene regulation in eukaryotes by studying the statistical correlation between overrepresented upstream motifs and gene expression levels.
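
The grouping-and-comparison idea summarized in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: a gene joins the set S(w) if the word w occurs at least `min_count` times in its upstream sequence (a stand-in for the overrepresentation criterion, which the abstract does not specify), and significance is a plain z-like score of the set average against the genome-wide average. The function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def count_occurrences(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    count = 0
    start = seq.find(word)
    while start != -1:
        count += 1
        start = seq.find(word, start + 1)
    return count


def genes_with_motif(upstream, word, min_count=1):
    """Genes whose upstream sequence holds at least min_count copies of word.

    Assumption: a simple count threshold replaces the overrepresentation
    test, which is not described in the abstract.
    """
    return {gene for gene, seq in upstream.items()
            if count_occurrences(seq, word) >= min_count}


def significance(expression, gene_set):
    """Assumed z-like score: (set mean - genome mean) / (genome sd / sqrt(N))."""
    values = list(expression.values())
    in_set = [expression[g] for g in gene_set if g in expression]
    return (mean(in_set) - mean(values)) * sqrt(len(in_set)) / pstdev(values)


# Toy data, invented for illustration only.
upstream = {
    "g1": "AAGATGAGTT",
    "g2": "GATGAGGATGAG",
    "g3": "CCCCCCCC",
    "g4": "TTTTTTTT",
}
expression = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 0.0}  # log2 ratios

motif_set = genes_with_motif(upstream, "GATGAG")
score = significance(expression, motif_set)
```

In a real analysis the score would be computed at each time point and compared with a threshold, such as the |sig| = 6 line mentioned in the caption of Figure 2.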

Figures

Figure 1. Expression of the genes in the set S(GATGAG). The average expression of the genes in the set S(GATGAG) (solid red line) compared to the genome-wide average expression (dashed green line) at the seven time points of the diauxic shift experiment. The expression data are the log2 of the ratio between mRNA levels at each timepoint and the initial mRNA level.
Figure 2. Statistical significance of the set S(GATGAG). The statistical significance sig(i, w) as defined in Eq. (10) for the word w = GATGAG and timepoints i = 1,..., 7 in the diauxic shift experiment. The dashed line is the significance threshold |sig| = 6.
Figure 3. Expression of the genes in the set S(ATAAGGG). Same as Fig. 1 for the genes in the set S(ATAAGGG), our new candidate regulatory motif.
Figure 4. Statistical significance of the set S(ATAAGGG). Same as Fig. 2 for the genes in the set S(ATAAGGG).
Figure 5. Expression as a function of occurrences of the word GGCTAAG. The average expression of genes presenting n occurrences of the word GGCTAAG as a function of n in the 14 min. time point of the α-synchronized cell-cycle experiment of Spellmann et al., Ref. [17]. In parentheses is the number of genes with n occurrences of GGCTAAG in the upstream region. The horizontal line represents the average expression for the whole genome.
Figure 6. Expression as a function of occurrences of the word AAAATTT. Same as Fig. 5 for AAAATTT.
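
The quantity plotted in Figures 5 and 6, average expression as a function of the number n of occurrences of a word, can be tabulated along the following lines. This is a sketch under assumptions: `counts` maps each gene to its occurrence count in the upstream region and `expression` maps each gene to its log-ratio at a single time point. Both names and the toy values are hypothetical.

```python
from collections import defaultdict


def mean_expression_by_count(counts, expression):
    """Map n -> (average expression, number of genes) for genes with n occurrences."""
    groups = defaultdict(list)
    for gene, n in counts.items():
        if gene in expression:  # skip genes without an expression measurement
            groups[n].append(expression[gene])
    return {n: (sum(vals) / len(vals), len(vals))
            for n, vals in sorted(groups.items())}


# Toy data, invented for illustration only.
counts = {"a": 0, "b": 0, "c": 1, "d": 2}
expression = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 4.0}

by_count = mean_expression_by_count(counts, expression)
```

The second element of each pair corresponds to the gene counts shown in parentheses in the figures.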

References

    1. DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997;278:680–686. doi: 10.1126/science.278.5338.680. http://cmgm.stanford.edu/pbrown/explore/
    2. van Helden J, André B, Collado-Vides J. Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol. 1998;281:827–842. doi: 10.1006/jmbi.1998.1947.
    3. Wagner A. A computational genomics approach to the identification of gene networks. Nucleic Acids Research. 1997;25:3594–3604. doi: 10.1093/nar/25.18.3594.
    4. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nature Genetics. 1999;22:281–285. doi: 10.1038/10343.
    5. Bussemaker HJ, Li H, Siggia ED. Regulatory element detection using correlation with expression. Nature Genetics. 2001;27:167–171. doi: 10.1038/84792.
