Using RNA secondary structures to guide sequence motif finding towards single-stranded regions

Michael Hiller¹, Rainer Pudimat, Anke Busch, Rolf Backofen

PMID: 16987907
PMCID: PMC1903381
DOI: 10.1093/nar/gkl544

Nucleic Acids Res. 2006;34(17):e117. doi: 10.1093/nar/gkl544. Epub 2006 Sep 20.

Affiliation

¹ Institute of Computer Science, Chair for Bioinformatics, Albert-Ludwigs-University Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany.

Abstract

RNA-binding proteins recognize RNA targets in a sequence-specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions, and it has been shown that sequestering a binding motif in a double-stranded region abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present MEMERIS, an approach that searches for sequence motifs in a set of RNA sequences while simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif exists in double-stranded regions. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.
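To make the idea of using single-strandedness values as prior knowledge about motif starts more concrete, here is a minimal sketch. It is not taken from the MEMERIS source code. It assumes that each candidate start position already has a precomputed value between 0 and 1 (the PU values named in the figure captions) and that a pseudocount is added before normalizing, so that positions in double-stranded regions keep a small nonzero prior. The function name `start_prior` and the exact normalization are illustrative assumptions.

```python
def start_prior(pu_values, pseudocount=0.01):
    """Turn per-position single-strandedness values into a prior over motif starts.

    pu_values[j] is assumed to be a value in [0, 1] describing how
    single-stranded the motif-length substring starting at position j is.
    The normalization below is an assumption for illustration, not
    necessarily the formula used by MEMERIS.
    """
    if pseudocount < 0:
        raise ValueError("pseudocount must be non-negative")
    if not pu_values:
        return []
    # Add the pseudocount so that paired positions are not excluded outright.
    weights = [pu + pseudocount for pu in pu_values]
    total = sum(weights)
    if total == 0:
        # All values are zero and no pseudocount was given: use a uniform prior.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Hypothetical values: one accessible start, one paired start, one intermediate.
prior = start_prior([0.9, 0.003, 0.5], pseudocount=0.01)
```

Under this sketch, a larger pseudocount flattens the prior and lets paired positions compete more strongly, which is the kind of trade-off between motif strength and single-strandedness that the pseudocount controls.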

Figures

**Figure 3**
Effect of varying the pseudocount. The figure shows the information content of the motif matrix found by MEMERIS in bits (black curve) and its average single-strandedness (average PU value of all motif occurrences, blue curve) for pseudocounts from 0 to 0.5 in steps of 0.01. Test set 5, which contains sequences with only one dsMotif (10.6 bits, average single-strandedness 0.003), was used. MEMERIS finds this motif for pseudocounts greater than 0.22. In general, the lower the pseudocount, the higher the average single-strandedness.

**Figure 4**
Comparison of MEME and MEMERIS for test set 8 (testing the TCM model). The figure shows 20 sequences that contain ssMotifs (highlighted yellow) and/or dsMotifs (highlighted light blue). The optimal structure is shown below each sequence. Red and blue bars indicate the position of the motif occurrences found by MEMERIS and MEME, respectively. MEMERIS detects all ssMotifs and no dsMotif, leading to an information content of the motif matrix of 10.4 bits, whereas MEME identifies a stronger motif (11.1 bits) but detects eight dsMotif occurrences. MEMERIS results are shown for PU values and a pseudocount of 0.01. The number of motif hits was set to 21 for MEME and MEMERIS.

**Figure 5**
Comparison of MEME and MEMERIS for the SELEX sequences of the Nova-1 protein. The figure shows the sequences and labels of the individual clones described in (6). The random oligonucleotides are in blue letters. The optimal secondary structure is shown below each sequence. The primer binding sites (black letters) were included in the RNA secondary structure prediction but not in the motif search. Yellow bars represent the TCAT and ACAT motifs identified in (6). Blue and green bars indicate the position of the motif hits found by MEME and MEMERIS, respectively. The motif matrix found by MEME has an information content of 7.6 bits, the MEMERIS motif matrix has 7.4 bits. MEME and MEMERIS were run with the TCM model and the number of motif hits was set to 33. MEMERIS results are shown for PU values and a pseudocount of 0.01.

**Figure 6**
Results of MEME and MEMERIS for the PIE Rfam (RF00460) dataset. The figure shows the consensus sequence and structure of the PIE RNA. The U1A protein binds the single-stranded sequences in the two asymmetrical internal loops in a cooperative manner (A). Using the OOPS model, MEME finds two motifs (14 and 13.3 bits, respectively) that do not overlap the real binding site (B), while MEMERIS finds the real upstream binding site exactly (11.8 bits) and the downstream site (10.5 bits) with a shift of one position (C). Since both individual binding sites are very similar, we used the TCM model to search for a motif with two occurrences in each sequence. Again, MEME finds a different motif (11.6 bits) (D), while MEMERIS detects the correct protein-binding sites (10.7 bits) (E). The known binding sites and the predicted motifs are highlighted in blue. The motif length was set to 7 nt. For MEMERIS, the PU values were used with a pseudocount of 0.01.

**Figure 7**
Results of MEME and MEMERIS for the TAR Rfam (RF00250) dataset. The figure shows the consensus sequence and structure of the TAR element. The hairpin loop is bound by the Tat protein (A). We searched for one binding site in each sequence (OOPS model) with MEME (B) and MEMERIS using PU values (C). MEME detects a motif (12 bits) that does not overlap the known binding site, while MEMERIS identifies the binding site, although the respective motif is noticeably weaker (10 bits). The known binding sites and the predicted motifs are highlighted in blue. The motif length was set to 6 nt. For MEMERIS, the PU values were used with a pseudocount of 0.01.

**Figure 8**
Results of MEME and MEMERIS for the SLDE Rfam (RF00183) dataset. The figure shows the consensus sequence and structure of the SLDE element. The hairpin loop of the essential third stem is bound by an unknown protein factor (A). MEME detects a CAG motif which does not overlap the binding site (B). In contrast, MEMERIS identifies the TAT sequence of the hairpin loop as the motif (C). Both motif matrices have an information content of 6 bits. The known binding sites and the predicted motifs are highlighted in blue. The motif length was set to 3 nt. For MEMERIS, the PU values were used with a pseudocount of 0.01.
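The captions compare motifs by the information content of their motif matrices in bits. As a rough illustration of that quantity, the sketch below sums the per-column relative entropy of a position probability matrix against a background distribution. The uniform background of 0.25 per base and the function name `information_content` are assumptions made for this example; MEME and MEMERIS may estimate the background from the input sequences, so their reported values need not match this calculation exactly.

```python
import math


def information_content(matrix, background=None):
    """Sum of per-column relative entropies (in bits) of a motif matrix.

    matrix is a list of columns; each column maps a base to its probability.
    background maps each base to its background probability; a uniform
    distribution over A, C, G, T is assumed when none is given.
    """
    if background is None:
        background = {base: 0.25 for base in "ACGT"}
    total = 0.0
    for column in matrix:
        for base, p in column.items():
            if p > 0:  # zero-probability terms contribute nothing
                total += p * math.log2(p / background[base])
    return total


# A perfectly conserved 3 nt motif (here TAT) under the uniform background.
tat = [{"T": 1.0}, {"A": 1.0}, {"T": 1.0}]
bits = information_content(tat)
```

Under the uniform-background assumption, each fully conserved column contributes 2 bits, so `bits` is 6.0 for this 3 nt motif, and 6 bits is the maximum a motif of that length can reach.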

References

1. Mignone F., Gissi C., Liuni S., Pesole G. Untranslated regions of mRNAs. Genome Biol. 2002;3
2. Messias A.C., Sattler M. Structural basis of single-stranded RNA recognition. Acc. Chem. Res. 2004;37:279–287.
3. Hall K.B. RNA-protein interactions. Curr. Opin. Struct. Biol. 2002;12:283–288.
4. Hori T., Taguchi Y., Uesugi S., Kurihara Y. The RNA ligands for mouse proline-rich RNA-binding protein (mouse Prrp) contain two consensus sequences in separate loop structure. Nucleic Acids Res. 2005;33:190–200.
5. Thisted T., Lyakhov D.L., Liebhaber S.A. Optimized RNA targets of two closely related triple KH domain proteins, heterogeneous nuclear ribonucleoprotein K and alphaCP-2KL, suggest distinct modes of RNA recognition. J. Biol. Chem. 2001;276:17484–17496.
