. 2014 Dec 31;15(1):404.

doi: 10.1186/s12859-014-0404-0.

ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs

Christina Otto¹, Mathias Möhl², Steffen Heyne³, Mika Amit⁴, Gad M Landau^{5

6}, Rolf Backofen^{7

8

9

10}, Sebastian Will^{11

12}

Affiliations

¹ Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. schmiedc@informatik.uni-freiburg.de.
² Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. mamo181@googlemail.com.
³ Max Planck Institute of Immunobiology and Epigenetics, Stuebeweg 51, Freiburg, 79108, Germany. heyne@informatik.uni-freiburg.de.
⁴ Department of Computer Science, University of Haifa, Mount Carmel, Haifa, Israel. mika.amit2@gmail.com.
⁵ Department of Computer Science, University of Haifa, Mount Carmel, Haifa, Israel. landau@univ.haifa.ac.il.
⁶ Department of Computer Science and Engineering, NYU-Poly, Brooklyn, NY, USA. landau@univ.haifa.ac.il.
⁷ Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. backofen@informatik.uni-freiburg.de.
⁸ Center for Biological Signaling Studies (BIOSS), University of Freiburg, Freiburg, Germany. backofen@informatik.uni-freiburg.de.
⁹ Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany. backofen@informatik.uni-freiburg.de.
¹⁰ Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870, Denmark. backofen@informatik.uni-freiburg.de.
¹¹ Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. swill@csail.mit.edu.
¹² Bioinformatics, Department of Computer Science, University of Leipzig, Leipzig, Germany. swill@csail.mit.edu.

PMID: 25551362
PMCID: PMC4302096
DOI: 10.1186/s12859-014-0404-0

ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs

Christina Otto et al. BMC Bioinformatics. 2014.

. 2014 Dec 31;15(1):404.

doi: 10.1186/s12859-014-0404-0.

Authors

Christina Otto¹, Mathias Möhl², Steffen Heyne³, Mika Amit⁴, Gad M Landau^{5

6}, Rolf Backofen^{7

8

9

10}, Sebastian Will^{11

12}

Affiliations

¹ Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. schmiedc@informatik.uni-freiburg.de.
² Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. mamo181@googlemail.com.
³ Max Planck Institute of Immunobiology and Epigenetics, Stuebeweg 51, Freiburg, 79108, Germany. heyne@informatik.uni-freiburg.de.
⁴ Department of Computer Science, University of Haifa, Mount Carmel, Haifa, Israel. mika.amit2@gmail.com.
⁵ Department of Computer Science, University of Haifa, Mount Carmel, Haifa, Israel. landau@univ.haifa.ac.il.
⁶ Department of Computer Science and Engineering, NYU-Poly, Brooklyn, NY, USA. landau@univ.haifa.ac.il.
⁷ Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. backofen@informatik.uni-freiburg.de.
⁸ Center for Biological Signaling Studies (BIOSS), University of Freiburg, Freiburg, Germany. backofen@informatik.uni-freiburg.de.
⁹ Centre for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany. backofen@informatik.uni-freiburg.de.
¹⁰ Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870, Denmark. backofen@informatik.uni-freiburg.de.
¹¹ Bioinformatics, Institute of Computer Science, University of Freiburg, Freiburg, Germany. swill@csail.mit.edu.
¹² Bioinformatics, Department of Computer Science, University of Leipzig, Leipzig, Germany. swill@csail.mit.edu.

PMID: 25551362
PMCID: PMC4302096
DOI: 10.1186/s12859-014-0404-0

Abstract

Background: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable.

Results: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known "simultaneous alignment and folding" of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P's sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P .

Conclusions: ExpaRNA-P's novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications.

PubMed Disclaimer

Figures

**Figure 1**
**The** **ExpLoc-P** **pipeline.** Using EPMs as anchor constraints to speed up RNA structure alignments.

**Figure 2**
**Visualization of the pattern matching definition.** Two different illustrations of the notion pattern matching are shown in **(A)** and **(B)**. For the light gray pattern matching, we have $ℳ = {2 \sim 2, 3 \sim 3, 9 \sim 13, 10 \sim 14, 11 \sim 15}$ and $S = {(2, 11 \sim 2, 15), (3, 10 \sim 3, 14)}$ . Note that the two separated regions in both sequences are connected through base pairs. Furthermore, the set of structure matches is $ℳ |_{S} = {2 \sim 2, 3 \sim 3, 10 \sim 14, 11 \sim 15}$ and the set of sequence matches is $ℳ ∖ ℳ |_{S} = {9 \sim 13}$ . The pattern matching can be extended by the base pair match shown in dark gray, i.e., $ℳ^{'} = ℳ \cup {1 \sim 1, 12 \sim 16}$ and $S^{'} = S \cup {(1, 12 \sim 1, 16)}$ . $(ℳ, S)$ , the EPM shown in light gray, is a strict EPM, whereas $(ℳ^{'}, S^{'})$ , the EPM extended by the dark gray part, is a relaxed EPM as mismatches in structure matches occur. **(C)** shows an example of an invalid matching. Separately, both the small and the big matched gray parts are valid EPMs, but together they do not form a valid EPM as the two individual parts are not connected.

**Figure 3**
**Visualization of maximal EPMs.** Matches of green bases refer to exact matches and red ones to inexact (structure) matches. **(A-C)** EPM A is not maximal since there exists a larger (strict) EPM **(B or C)**. EPMs B and C can be maximal simultaneously since in each case some base matches have different parents. **(A and D-F)** EPM D is generated from A by appending an inexact structure match and has a lower score than A. Further extending the EPM leads to higher scores again **(E and F)**. D is not maximal since A has the same parents and a higher score. A is not maximal because there exist (relaxed) EPMs E and F with the same parents and higher scores. Among A, D, E, and F, only F is maximal.

**Figure 4**
**Recursion equations.** Recursions for computing the significant strict EPMs and relaxed EPMs, respectively. These equations are visualized in Figure 5.

**Figure 5**
**Recursion visualization.** Visualization of the recursions to compute the matrix entries $L^{ijkl} (j^{'}, l^{'}), G_{A}^{ijkl} (j^{'}, l^{'}), G_{AB}^{ijkl} (j^{'}, l^{'}), L R^{ijkl} (j^{'}, l^{'}), D (ij, kl), F (j^{'}, l^{'})$ and the auxiliary matrix H ^ijkl(j ^′,l ^′).

**Figure 6**
**Comparison of** **ExpLoc-P** **variants.** **(A)** Alignment quality (SPS) vs. sequence identity. **(B)** Coverage vs. sequence identity. Dependencies are visualized as LOWESS curves.

**Figure 7**
**Comparison with sequence-structure alignment methods.** **(A)** Comparison of alignment quality vs. sequence identity. **(B)** Comparison of speedup over LocARNA vs. length (LOWESS).

See this image and copyright information in PMC

Cited by

The BRaliBase dent-a tale of benchmark design and interpretation.
Löwes B, Chauve C, Ponty Y, Giegerich R. Löwes B, et al. Brief Bioinform. 2017 Mar 1;18(2):306-311. doi: 10.1093/bib/bbw022. Brief Bioinform. 2017. PMID: 26984616 Free PMC article.
Fast and accurate structure probability estimation for simultaneous alignment and folding of RNAs with Markov chains.
Miladi M, Raden M, Will S, Backofen R. Miladi M, et al. Algorithms Mol Biol. 2020 Nov 13;15(1):19. doi: 10.1186/s13015-020-00179-w. Algorithms Mol Biol. 2020. PMID: 33292340 Free PMC article.
RNAscClust: clustering RNA sequences using structure conservation and graph based motifs.
Miladi M, Junge A, Costa F, Seemann SE, Havgaard JH, Gorodkin J, Backofen R. Miladi M, et al. Bioinformatics. 2017 Jul 15;33(14):2089-2096. doi: 10.1093/bioinformatics/btx114. Bioinformatics. 2017. PMID: 28334186 Free PMC article.
RNA motif search with data-driven element ordering.
Rampášek L, Jimenez RM, Lupták A, Vinař T, Brejová B. Rampášek L, et al. BMC Bioinformatics. 2016 May 18;17(1):216. doi: 10.1186/s12859-016-1074-x. BMC Bioinformatics. 2016. PMID: 27188396 Free PMC article.
Freiburg RNA tools: a central online resource for RNA-focused research and teaching.
Raden M, Ali SM, Alkhnbashi OS, Busch A, Costa F, Davis JA, Eggenhofer F, Gelhausen R, Georg J, Heyne S, Hiller M, Kundu K, Kleinkauf R, Lott SC, Mohamed MM, Mattheis A, Miladi M, Richter AS, Will S, Wolff J, Wright PR, Backofen R. Raden M, et al. Nucleic Acids Res. 2018 Jul 2;46(W1):W25-W29. doi: 10.1093/nar/gky329. Nucleic Acids Res. 2018. PMID: 29788132 Free PMC article.

See all "Cited by" articles

References

1. The FANTOM Consortium The transcriptional landscape of the mammalian genome. Science. 2005;309(5740):1559–63. doi: 10.1126/science.1112014. - DOI - PubMed
1. Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005;308:1149–1154. doi: 10.1126/science.1108625. - DOI - PubMed
1. Bertone P, Stoc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, Gerstein M, Snyder M. Global identification of human transcribed sequences with genome tiling arrays. Science. 2004;306:2242–2246. doi: 10.1126/science.1103388. - DOI - PubMed
1. The ENCODE Project Consortium An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74. doi: 10.1038/nature11247. - DOI - PMC - PubMed
1. Mattick JS, Taft RJ, Faulkner GJ. A global view of genomic information - moving beyond the gene and the master regulator. Trends Genet. 2010;26(1):21–8. doi: 10.1016/j.tig.2009.11.002. - DOI - PubMed

Publication types

Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

LinkOut - more resources

Full Text Sources
Other Literature Sources
- scite Smart Citations
Molecular Biology Databases
- NIAID Data Ecosystem - Find datasets on Infectious and Immune-mediated Diseases

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs

Affiliations

ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs

Authors

Affiliations

Abstract

Figures

Similar articles

Cited by

References

Publication types

MeSH terms

Substances

LinkOut - more resources

Full Text Sources

Other Literature Sources

Molecular Biology Databases