Mol Syst Biol. 2009:5:268.
doi: 10.1038/msb.2009.24. Epub 2009 Apr 28.

Discovering structural cis-regulatory elements by modeling the behaviors of mRNAs

Barrett C Foat et al. Mol Syst Biol. 2009.

Abstract

Gene expression is regulated at each step from chromatin remodeling through translation and degradation. Several known RNA-binding regulatory proteins interact with specific RNA secondary structures in addition to specific nucleotides. To provide a more comprehensive understanding of the regulation of gene expression, we developed an integrative computational approach that leverages functional genomics data and nucleotide sequences to discover RNA secondary structure-defined cis-regulatory elements (SCREs). We applied our structural cis-regulatory element detector (StructRED) to microarray and mRNA sequence data from Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. We recovered the known specificities of Vts1p in yeast and Smaug in flies. In addition, we discovered six putative SCREs in flies and three in humans. We characterized the SCREs based on their condition-specific regulatory influences, the annotation of the transcripts that contain them, and their locations within transcripts. Overall, we show that modeling functional genomics data in terms of combined RNA structure and sequence motifs is an effective method for discovering the specificities and regulatory roles of RNA-binding proteins.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1
The flow of data for StructRED. As input, StructRED takes one or more multicolumn tables of microarray data and one or more associated FASTA sequence files containing mRNA sequences for the spots on the arrays. The other major input to StructRED is a motif model that defines the search space in which explanatory SCREs can exist. StructRED fits a simple physical model using each motif in the search space to identify the SCREs that best explain the observed microarray measurements. The outputs of StructRED are SCRE matrices and their correlations with each experiment in the input set, called trans-factor activity profiles (TFAPs), as they reflect changes in the activities of the regulators that bind to the SCREs.
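The data flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: StructRED fits a physical model, whereas the sketch below substitutes plain occurrence counts and a Pearson correlation. The function names (`count_hairpins`, `tfap`), the loop pattern, and the stem length are all hypothetical choices made for illustration.

```python
import math

# Watson-Crick and G-U wobble pairs allowed in the stem (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def count_hairpins(seq, loop_pattern, stem_len):
    """Count loops matching loop_pattern (N = any base) closed by stem_len base pairs."""
    loop_len = len(loop_pattern)
    count = 0
    for start in range(stem_len, len(seq) - loop_len - stem_len + 1):
        loop = seq[start:start + loop_len]
        if any(p != "N" and p != b for p, b in zip(loop_pattern, loop)):
            continue
        # Pair the bases just upstream of the loop with those just downstream.
        if all((seq[start - 1 - i], seq[start + loop_len + i]) in PAIRS
               for i in range(stem_len)):
            count += 1
    return count

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def t_value(r, n):
    """t-statistic for a correlation r over n transcripts."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def tfap(counts, experiments):
    """Map each experiment name to the t-value of motif counts vs. measurements."""
    n = len(counts)
    return {name: t_value(pearson(counts, values), n)
            for name, values in experiments.items()}

# Toy input: one sequence with a single CNGG loop closed by three G-C pairs.
print(count_hairpins("GGGCUGGCCC", "CNGG", 3))  # prints 1
```

In this simplified picture, a search over many candidate loop patterns and stem lengths would keep the motifs whose profiles show the strongest correlations.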
Figure 2
Vts1p and Smaug specificities. Vts1p and Smaug (Smg) are sterile alpha motif (SAM)-containing RNA-binding proteins in Saccharomyces cerevisiae and Drosophila melanogaster, respectively. These proteins are known to bind stem–loop RNA motifs with loops of four or five nucleotides. The structural logos shown here were discovered using StructRED on relevant microarray data and mRNA sequences without any prior information. The specificities of the SAM-containing proteins are in good agreement with each other and with the known specificities of these proteins. Here, the length four loops are represented by length six loops with a bias toward G-C on the first and last nucleotides of the loop, perhaps indicating a need for a strong G-C pair to close the shorter loop.
Figure 3
Vts1p and Smaug activities. Each square represents the strength of the correlation between genome-wide occurrences of a SCRE and genome-wide mRNA measurements for a particular microarray experiment. Yellow represents a positive correlation and blue represents a negative correlation. An absolute t-value of about 6.7 corresponds to a P-value of 0.01, when strictly correcting for the number of motifs tested. (A) The Vts1p specificities for the length four loop (Vts1–4) and length five loop (Vts1–5) were discovered using microarray data that measured mRNA association with Vts1p in four trials of a pull-down experiment (Aviv et al, 2006b). (B) The Smaug specificities for the length four (Smg-4) and length five (Smg-5) loops were discovered using mRNA expression microarray data collected over Drosophila melanogaster embryonic development. The first two time courses measured the first 6 h of development in Δsmg and wild-type (WT) activated eggs (Tadros et al, 2007). The third time course (Pilot et al, 2006) compares the slow phase (T1), fast phase (T2), cellularization and beginning gastrulation (T3), and end of gastrulation (T4) to embryos before zygotic transcription begins in wild-type (WT) embryos. (C) Occurrences of the Smg-4 and Smg-5 specificities also had strong negative correlations (corrected P-value <0.001) with ribosome association in the first 2 h of development (Qin et al, 2007). Triangles represent increasing density of sucrose gradient fractions, corresponding to increasing numbers of ribosomes.
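The relationship between a t-value and a strictly corrected P-value can be sketched as follows. The sketch assumes a Bonferroni correction and a standard normal approximation to the t-distribution (reasonable only when thousands of transcripts are measured); neither assumption is confirmed by the caption, and the number of motifs tested is not given here, so `n_motifs` is a hypothetical parameter.

```python
import math

def two_sided_p(t):
    """Two-sided P-value for a t-value, using a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def bonferroni(p, n_tests):
    """Bonferroni-corrected P-value: multiply by the number of tests, capped at 1."""
    return min(1.0, p * n_tests)

# Hypothetical search-space size, chosen only to show the call pattern.
n_motifs = 1_000_000
for t in (3.0, 5.0, 6.7):
    print(t, bonferroni(two_sided_p(t), n_motifs))
```

The larger the motif search space, the larger the absolute t-value a motif needs before its corrected P-value falls below a given threshold.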
Figure 4
Putative Drosophila structural cis-regulatory elements. (A) The structural logos of the six putative Drosophila SCREs. (B) Dm1, Dm2, Dm3, and Dm4 were detected using mRNA expression microarray data. Dm1 and Dm2 had strong negative correlations with mRNA levels over early Drosophila development. Dm1 and Dm2 did not correlate with mRNA levels in similarly treated Δsmg eggs (not shown). Dm3 and Dm4 correlated with mRNA levels changing between wild-type and Δkep1 flies (GEO accession GSE6086), suggesting that Dm3 and Dm4 may reflect the specificity of Kep1, an RNA-binding protein. (C) Dm5 and Dm6 were detected from microarray data measuring mRNA association with ribosomes in early Drosophila development (Qin et al, 2007). Triangles represent increasing density of sucrose gradient fractions, corresponding to increasing numbers of ribosomes.
Figure 5
Explanatory structural cis-regulatory element content of mRNA regions. These trans-factor activity profiles (TFAPs) are for all of the Drosophila SCREs over all of the same conditions shown in Figures 3 and 4. However, these TFAPs display how well each SCRE explained the measured RNA levels when occurrences of the SCREs are only scored in the 5′ untranslated regions (UTRs), 3′ UTRs, coding sequences (CDS), or full-length mRNAs. Thus, by comparing each subsequence TFAP to the full-length mRNA TFAP, one can see in which region of mRNAs functional instances of the SCRE tend to exist. Most of the SCREs have their strongest signal in the CDSs, followed by the 3′ UTRs.
Figure 6
Explaining Smaug-independent mRNA degradation. Not all maternally deposited mRNAs degrade in a Smaug-dependent manner. (A) These are the logos of two single-stranded RNA motifs that had strong correlations with decreasing mRNA levels in early embryogenesis in Drosophila. The Dm7 motif likely represents the specificity of an unknown trans-factor. The Pum motif is likely the specificity of Pumilio. (B) Both of these motifs correlated with decreasing mRNA levels in Δsmg activated eggs.
Figure 7
Putative human structural cis-regulatory elements. (A) We discovered three putative SCREs when applying the StructRED algorithm to human data (Hs1–3). (B) Hs1 was discovered using data that measured ribosome association in a metastatic colorectal cancer cell line, SW620, versus a non-metastatic cell line, SW480 (Provenzani et al, 2006). Hs2 correlated with decreased mRNA levels in U937 cells that had been exposed to phorbol 12-myristate 13-acetate (PMA; Kitamura et al, 2004). Hs3 was discovered through a positive correlation with ribosome association in human mammary epithelial cells (with and without overexpressed translation initiation factor 4E, eIF4E; Larsson et al, 2007). (C) We did not discover a Smaug/Vts1p-like specificity from the human data. However, when the Drosophila Smaug specificities were scored against the human data, we observed significant correlations with the RBP pull-down microarray data for polypyrimidine tract-binding protein (PTB; GEO accession GSE6021; Gama-Carvalho et al, 2006) and Staufen1 and Staufen2 (Furic et al, 2008; GEO accessions GSE8437, GSE8438), suggesting that hSmaug shares many target mRNAs with these RBPs.

References

    1. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410
    2. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29
    3. Aviv T, Amborski AN, Zhao XS, Kwan JJ, Johnson PE, Sicheri F, Donaldson LW (2006a) The NMR and X-ray structures of the Saccharomyces cerevisiae Vts1 SAM domain define a surface for the recognition of RNA hairpins. J Mol Biol 356: 274–279
    4. Aviv T, Lin Z, Ben-Ari G, Smibert CA, Sicheri F (2006b) Sequence-specific recognition of RNA hairpins by the SAM domain of Vts1p. Nat Struct Mol Biol 13: 168–176
    5. Aviv T, Lin Z, Lau S, Rendl LM, Sicheri F, Smibert CA (2003) The RNA-binding SAM domain of Smaug defines a new family of post-transcriptional regulators. Nat Struct Biol 10: 614–621