Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach

Pablo Meyer et al. Genome Res. 2013 Nov.

Abstract

The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown, experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of the yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites.

Figures

Figure 1.
Overview of the experimental system and results. (A) Illustration of the master strain into which we integrated all the tested promoters. At a fixed chromosomal location, the master strain contains a gene that encodes a red fluorescent protein (mCherry), followed by the promoter for TEF2, and a gene that encodes a yellow fluorescent protein (YFP). Every tested promoter is integrated into this strain, together with a selection marker, between the TEF2 promoter and the YFP gene. (B) Strains with different promoters have highly similar growth rates. Shown is the growth of 71 different promoter strains, measured as optical density (OD). Measurements were obtained from a single 96-well plate, with glucose-rich media and a small number of cells from each strain inserted into each well at time zero. The exponential growth phase is indicated (vertical dashed gray lines). (C) Same as B, but where the measurements correspond to mCherry intensity. Note the small variability in the intensity of mCherry, which is driven by the same control promoter across the different strains. (D) Same as C, but where the measurements correspond to YFP intensity. Note the large variability in the intensity of YFP, which is driven by a different promoter in each strain. (Adapted with permission from Zeevi et al. [2011].) (E) The black line shows the scores of the participating teams plotted in descending order. The red line shows the scores of aggregated teams, starting with the score obtained by averaging the predictions of the two best-performing teams, followed by the three best-performing teams, and so on until all 21 teams are included. The stand-alone dot represents the post-hoc model combining SVM and biological features.
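The aggregation described in panel E (averaging the predictions of the two best teams, then the three best, and so on) can be sketched as follows. This is a minimal illustration, not the challenge's scoring code: the function names (`pearson`, `aggregate_top_k`, `aggregation_curve`) and the toy data are hypothetical, and the Pearson correlation is used here only as a stand-in for the challenge score, which may combine other metrics.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (stand-in score)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    # Assumes neither sequence is constant (non-zero variance).
    return cov / (var_x * var_y) ** 0.5


def aggregate_top_k(predictions, scores, k):
    """Average, promoter by promoter, the predictions of the k best teams."""
    ranked = sorted(predictions, key=lambda team: scores[team], reverse=True)
    top = ranked[:k]
    n_promoters = len(predictions[top[0]])
    return [mean(predictions[t][i] for t in top) for i in range(n_promoters)]


def aggregation_curve(predictions, measured, score_fn=pearson):
    """Score of the averaged prediction for k = 2, 3, ..., number of teams."""
    scores = {team: score_fn(p, measured) for team, p in predictions.items()}
    return [
        score_fn(aggregate_top_k(predictions, scores, k), measured)
        for k in range(2, len(predictions) + 1)
    ]


# Toy data (hypothetical): three teams, four promoters.
measured = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.0, 2.0, 4.0, 3.0],
    "C": [4.0, 3.0, 2.0, 1.0],
}
curve = aggregation_curve(predictions, measured)
print(curve)
```

In this toy case the best single team scores higher than any aggregate, which mirrors the observation in the abstract that aggregation was robust but did not beat the best algorithms.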
Figure 2.
Analysis of promoter prediction results. (A) Promoters are ordered by increasing χi, a per-promoter score computed from the values predicted for promoter i by each participant p = 1, 2, …, 21 and the measured value for promoter i = 1, 2, …, 53. Green dots represent the 30 best predictions, and red dots the 23 worst predictions. Empty dots represent the 20 wild-type promoters; full dots represent the 33 mutated promoters. (B) The Pearson correlation of each participating team is shown as green dots for the best predictions and as red dots for the worst predictions, as defined in A. Teams are ordered by rank based on their final score. (C) For each promoter, χi is plotted on a logarithmic scale against the promoter activity value. Empty dots represent wild-type promoters and full dots mutant promoters.
Figure 3.
Analysis of prediction results for mutated promoters. (A) Promoters were divided into two groups depending on whether they were wild type (empty dots) or contained mutations (full dots) and plotted according to χi, a per-promoter score computed from the values predicted for promoter i by each participant p = 1, 2, …, 21 and the measured value for promoter i = 1, 2, …, 53. (B) Mutant promoter expression values were grouped according to the nature of the mutation and ordered by the mean χi value of each group. The six groups consist of mutations of TATA boxes (Δtata), of binding sites for Fhl1 (Δfhl1) and Sfp1 (Δsfp1), mutations to nucleosome disfavoring sequences (ΔNucDisf), random mutations (Random), and finally, sequences mutated intentionally with additional random mutations (Addition). The χi value for each promoter is indicated by full dots; the mean value of χi for each of the six grouped mutations is indicated by a thick bar. (C) For each mutated promoter i, χi is plotted as a function of the percentage change in expression value that the mutation induced relative to the wild-type promoter. The vertical scale is logarithmic.
