Nucleic Acids Res. 2004 Jun 15;32(10):3258-69.

doi: 10.1093/nar/gkh650. Print 2004.

RNAProfile: an algorithm for finding conserved secondary structure motifs in unaligned RNA sequences

Giulio Pavesi¹, Giancarlo Mauri, Marco Stefani, Graziano Pesole

PMID: 15199174
PMCID: PMC434454
DOI: 10.1093/nar/gkh650

Giulio Pavesi et al. Nucleic Acids Res. 2004.

Affiliation

¹ Department of Computer Science and Communication-(D.I.Co.), University of Milan, Via Comelico 39, 20135 Milan, Italy.

Abstract

The recent interest sparked by the discovery of a variety of functions for non-coding RNA molecules has highlighted the need for suitable tools for the analysis and comparison of RNA sequences. Many trans-acting non-coding RNA genes and cis-acting RNA regulatory elements contain motifs, conserved in both structure and sequence, that can hardly be detected by primary sequence analysis alone. We present an algorithm that takes as input a set of unaligned RNA sequences expected to share a common motif, and outputs the regions that are most conserved throughout the sequences, according to a similarity measure that takes into account both the sequence of the regions and the secondary structure they can form according to base-pairing and thermodynamic rules. Only a single input parameter is needed: the number of distinct hairpins the motif has to contain. No further constraints on the size, number and position of the single elements comprising the motif are required. The algorithm has two parts. First, it extracts from each input sequence a set of candidate regions whose predicted optimal secondary structure contains the number of hairpins given as input. Then, the selected regions are compared with each other to find the groups of most similar regions, each group formed by one region taken from each sequence. To avoid exhaustive enumeration of the search space and to reduce the execution time, a greedy heuristic is introduced for this task. We present several experiments showing that the algorithm is capable of characterizing and discovering known regulatory motifs in mRNA, such as the iron responsive element (IRE) and selenocysteine insertion sequence (SECIS) stem-loop structures. We also show how it can be applied to corrupted datasets in which a motif does not appear in all the input sequences, as well as to the discovery of more complex motifs in non-coding RNA.
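
The two-part scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes candidate regions are already supplied as (sequence, dot-bracket structure) pairs instead of being folded thermodynamically, it uses a toy position-wise similarity in place of the paper's fitness measure, and it keeps a single group rather than several competing profiles. The names `count_hairpins`, `similarity`, and `greedy_motif` are hypothetical, and every sequence is assumed to have at least one surviving candidate.

```python
import re
from itertools import product


def count_hairpins(structure: str) -> int:
    """Count hairpin loops in a dot-bracket string.

    A hairpin is taken to be an innermost base pair, i.e. '(' followed
    only by unpaired positions ('.') and then ')'.
    """
    return len(re.findall(r"\(\.*\)", structure))


def similarity(a, b) -> float:
    """Toy similarity (an assumption, not the paper's measure).

    Regions are compared left-aligned without gaps; a position matches
    when both the base and the structure symbol agree.
    """
    (seq_a, str_a), (seq_b, str_b) = a, b
    n = min(len(seq_a), len(seq_b))
    matches = sum(
        1 for i in range(n) if seq_a[i] == seq_b[i] and str_a[i] == str_b[i]
    )
    return matches / max(len(seq_a), len(seq_b))


def greedy_motif(candidates_per_seq, hairpins: int):
    """Pick one region per sequence with a simple greedy strategy."""
    # Part 1: keep only regions whose structure has the requested hairpin count.
    filtered = [
        [r for r in regions if count_hairpins(r[1]) == hairpins]
        for regions in candidates_per_seq
    ]
    # Part 2: seed the group with the most similar pair from the first two
    # sequences, then add the best-fitting region from each remaining sequence.
    group = list(
        max(product(filtered[0], filtered[1]), key=lambda p: similarity(*p))
    )
    for regions in filtered[2:]:
        group.append(
            max(regions, key=lambda r: sum(similarity(r, g) for g in group))
        )
    return group


# Made-up candidate regions for three sequences (illustrative data only).
candidates = [
    [("GGGAAACCC", "(((...)))"), ("AUAUAUAUA", ".........")],
    [("GGGAAACCC", "(((...)))"), ("GCGCAAGCGC", "((((..))))")],
    [("GGCAAAGCC", "(((...)))"), ("CCCCUUUUGGGG", "((((....))))")],
]
motif = greedy_motif(candidates, hairpins=1)
for seq, struct in motif:
    print(seq, struct)
```

The greedy step evaluates each remaining sequence's candidates only against the group built so far, which avoids enumerating every combination of one region per sequence at the cost of possibly missing the globally best group.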

Figures

**Figure 1**
Schematic diagram of the structure of the algorithm.

**Figure 2**
The two canonical forms of the IRE.

**Figure 3**
Highest scoring motif occurrences output by RNAProfile on the IRE dataset with their respective energy and fitness values. Note that the last three regions (reported in pseudogenes) have a much lower fitness value and are thus very unlikely to be real IRE instances.

**Figure 4**
The results on the atypical IRE dataset. The first four instances, corresponding to the IREs, were included in the highest-scoring profile of four regions (the best groups of two and three regions also contained IRE instances). In the following two iterations, the best profile still contained the same four regions, plus the two shown at the bottom of the figure, which are unlikely to be IRE instances given their low fitness value.

**Figure 5**
Secondary structure models of SECIS stem–loop elements: type I (left) and type II (right).

**Figure 6**
Highest scoring motif occurrences output by RNAProfile on the GPX4 dataset. The last region, with a significantly low fitness value, comes from a non-selenoprotein 3′-UTR.

**Figure 7**
Highest scoring motif occurrences in the GPX4 sequences combined with the selenoprotein M dataset.

**Figure 8**
Highest scoring motif occurrences output by RNAProfile on the *Drosophila* nanos dataset.

**Figure 9**
Highest scoring motif occurrences output by RNAProfile on the RNase P RNA dataset, corresponding to consensus helices 8 and 9 (see also Figure 10).

**Figure 10**
Secondary structure of human RNase P RNA [structure adapted from (4)]. Numbered helices form the consensus secondary structure of the molecule, conserved (despite no sequence conservation) from bacteria to eukaryotes.

Cited by

RScan: fast searching structural similarities for structured RNAs in large databases.
Xue C, Liu GP. BMC Genomics. 2007 Jul 31;8:257. doi: 10.1186/1471-2164-8-257. PMID: 17663795.
Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?
Bellamy-Royds AB, Turcotte M. BMC Bioinformatics. 2007 Jun 8;8:190. doi: 10.1186/1471-2105-8-190. PMID: 17559658.
Identification of consensus RNA secondary structures using suffix arrays.
Anwar M, Nguyen T, Turcotte M. BMC Bioinformatics. 2006 May 5;7:244. doi: 10.1186/1471-2105-7-244. PMID: 16677380.
Identification of sequence-structure RNA binding motifs for SELEX-derived aptamers.
Hoinka J, Zotenko E, Friedman A, Sauna ZE, Przytycka TM. Bioinformatics. 2012 Jun 15;28(12):i215-23. doi: 10.1093/bioinformatics/bts210. PMID: 22689764.
RNA motif discovery: a computational overview.
Achar A, Sætrom P. Biol Direct. 2015 Oct 9;10:61. doi: 10.1186/s13062-015-0090-5. PMID: 26453353. Review.

References

1. Bonnal,S., Boutonnet,C., Prado-Lourenco,L. and Vagner,S. (2003) IRESdb: the Internal Ribosome Entry Site database. Nucleic Acids Res., 31, 427–428.
2. Brown,J.W. (1999) The Ribonuclease P Database. Nucleic Acids Res., 27, 314.
3. Brown,J.W., Echeverria,M., Qu,L.H., Lowe,T.M., Bachellerie,J.P., Huttenhofer,A., Kastenmayer,J.P., Green,P.J., Shaw,P. and Marshall,D.F. (2003) Plant snoRNA database. Nucleic Acids Res., 31, 432–435.
4. Griffiths-Jones,S., Bateman,A., Marshall,M., Khanna,A. and Eddy,S.R. (2003) Rfam: an RNA family database. Nucleic Acids Res., 31, 439–441.
5. Rosenblad,M.A., Gorodkin,J., Knudsen,B., Zwieb,C. and Samuelsson,T. (2003) SRPDB: Signal Recognition Particle Database. Nucleic Acids Res., 31, 363–364.

Grants and funding

GP0101Y01/TI_/Telethon/Italy
