Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2016 Apr 23:11:7.
doi: 10.1186/s13015-016-0071-y. eCollection 2016.

Sparse RNA folding revisited: space-efficient minimum free energy structure prediction

Affiliations
Review

Sparse RNA folding revisited: space-efficient minimum free energy structure prediction

Sebastian Will et al. Algorithms Mol Biol. .

Abstract

Background: RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models.

Results: Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible in trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by [Formula: see text], but are typically much smaller. The time complexity of RNA folding is reduced from [Formula: see text] to [Formula: see text]; the space complexity, from [Formula: see text] to [Formula: see text]. Our empirical results show more than 80 % space savings over RNAfold [Vienna RNA package] on the long RNAs from the RNA STRAND database (≥2500 bases).

Conclusions: The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA-interaction prediction are expected to profit even stronger than "standard" MFE folding. SparseMFEFold is free software, available at http://www.bioinf.uni-leipzig.de/~will/Software/SparseMFEFold.

Keywords: Pseudoknot-free RNA folding; RNA secondary structure prediction; Space efficient sparsification.

PubMed Disclaimer

Figures

Fig. 1
Fig. 1
Example of a pseudoknot-free secondary structure a in 2D-graphical layout and b as linear arc diagram. Both representations show the backbone connections of bases in sequence and connect pairing bases
Fig. 2
Fig. 2
a Graphical representation of the sparse bp-based energy minimization recursions. A minimum energy general substructure (L lined pattern) over region [ij] is a closed structure (Lc solid arcs) or it is partitionable into two substructures (L^p dotted arcs). Sparsification restricts the minimization over the partitions in the second row to consider only candidates [k,j] for the second fragment. b Justification of the candidate criterion for sparse bp-based energy minimization according to the recursion of subfigure a Candidates are defined as regions [ij] where Lc(i,j)<L^p(i,j)
Fig. 3
Fig. 3
Illustration of bp-based folding and traceback. a forward evaluation of the recursions by DP. b fold reconstruction by traceback
Fig. 4
Fig. 4
Ideas of the space-efficient backtracking procedure and its requirements. We illustrate concepts sketching the upper-triangular V-matrix in different algorithm phases. The dark-grey lower triangle is unused. The stored rows i..i+M are shown as light-grey area; the light-grey boxes represent the candidates. a Evaluation phase of sparse MFE folding without space-efficient trace-back; the algorithm stores the last M+1 rows and the candidates. b Space-efficient trace-back (red arrow) in the basic algorithm fails, since trace continuations cannot be efficiently retrieved in general; eventually, the trace reaches a non-candidate entry (filled circle), which is not in memory. c Naive sparse MFE folding with trace-back stores all trace arrows (black arrows) to candidates (boxes) and non-candidates (circles). d SparseMFEFold removes arrows to candidates and applies garbage collection to fundamentally reduce the space requirements
Fig. 5
Fig. 5
Time and space consumption in dependency of sequence length. The plot illustrates results for all RNA STRAND instances with single molecule folds. We compare RNAfold, SparseMFEFold, and—to show the effect of garbage collection—SparseMFEFold without this feature. The best fitted curves are shown as dashed lines
Fig. 6
Fig. 6
Sparsification of true and shuffled RNAs. We compare the time and space consumption of SparseMFEFold for RNA STRAND instances versus di-nucleotide shuffled RNA STRAND instances. The dashed lines show the identity (black) and linear fit (orange), which are almost indistinguishable

Similar articles

Cited by

References

    1. Andronescu M, Bereg V, Hoos HH, Condon A. RNA STRAND: The RNA secondary structure and statistical analysis database. BMC Bioinform. 2008;9(1):1. doi: 10.1186/1471-2105-9-340. - DOI - PMC - PubMed
    1. Backofen R, Tsur D, Zakov S, Ziv-Ukelson M. Sparse RNA folding: time and space efficient algorithms. J Discret Algorithms. 2011;9(1):12–31. doi: 10.1016/j.jda.2010.09.001. - DOI
    1. Dennis C. The brave new world of RNA. Nature. 2002;418(6894):122–124. doi: 10.1038/418122a. - DOI - PubMed
    1. Dimitrieva S, Bucher P. Practicality and time complexity of a sparsified RNA folding algorithm. J Bioinform Comput Biol. 2012;10(2):1241007. doi: 10.1142/S0219720012410077. - DOI - PubMed
    1. Hale BJ, Yang CX, Ross JW. Small RNA regulation of reproductive function. Mol Reprod Dev. 2014;81(2):148–159. doi: 10.1002/mrd.22272. - DOI - PubMed

LinkOut - more resources