BMC Bioinformatics. 2008 Nov 11:9:474.
doi: 10.1186/1471-2105-9-474.

RNAalifold: improved consensus structure prediction for RNA alignments


Stephan H Bernhart et al. BMC Bioinformatics. 2008.

Abstract

Background: The prediction of a consensus structure for a set of related RNAs is an important first step for subsequent analyses. RNAalifold, which computes the minimum energy structure that is simultaneously formed by a set of aligned sequences, is one of the oldest and most widely used tools for this task. In recent years, several alternative approaches have been advocated, pointing to several shortcomings of the original RNAalifold approach.

Results: We show that the accuracy of RNAalifold predictions can be improved substantially by introducing a different, more rational handling of alignment gaps, and by replacing the rather simplistic model of covariance scoring with more sophisticated RIBOSUM-like scoring matrices. These improvements are achieved without compromising the computational efficiency of the algorithm. We show here that the new version of RNAalifold not only outperforms the old one, but also several other tools recently developed, on different datasets.

Conclusion: The new version of RNAalifold can not only replace the old one for almost any application, but is also competitive with other approaches, including those based on SCFGs, maximum expected accuracy, or hierarchical nearest neighbor classifiers.


Figures

Figure 1
Possible results of treating gaps as bases. The consensus structure of the alignment in the middle is predicted once with gaps treated as if they were bases (old), and once with gaps removed before the energies are computed (new). The predicted structures (highlighted in red) are shown to the left. As can be seen in 1, sequence 1 can form a perfect hairpin. In 2, the sterically impossible hairpin for the other two sequences is shown: two of the three sequences cannot form the predicted structure. The new version of RNAalifold, on the other hand, predicts a stem that has a bulge (3), but only in one sequence; the other two sequences can form the perfect stem shown in 4.
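To make the "new" gap handling in the caption more concrete, here is a minimal sketch of one way to project a consensus structure onto a single aligned sequence after its gaps are removed. It is an illustration under stated assumptions, not RNAalifold's actual code: the function names `pair_table` and `project_structure` are hypothetical, structures are assumed to be in dot-bracket notation, and a base whose pairing partner falls on a gap column is assumed to become unpaired in that sequence.

```python
def pair_table(structure):
    """Map each paired column of a dot-bracket string to its partner column."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return partner


def project_structure(aligned_seq, consensus, gap_chars="-."):
    """Drop gap columns from one aligned sequence and adapt the consensus.

    Assumption for this sketch: if a column pairs with a gap column in this
    sequence, the remaining base is treated as unpaired.
    """
    if len(aligned_seq) != len(consensus):
        raise ValueError("sequence and structure must have equal length")
    partner = pair_table(consensus)
    seq_out, struct_out = [], []
    for i, (base, ch) in enumerate(zip(aligned_seq, consensus)):
        if base in gap_chars:
            continue  # gap column: nothing to fold here
        j = partner.get(i)
        if j is not None and aligned_seq[j] in gap_chars:
            ch = "."  # partner is a gap, so this base stays unpaired
        seq_out.append(base)
        struct_out.append(ch)
    return "".join(seq_out), "".join(struct_out)


if __name__ == "__main__":
    # A gap inside the stem leaves one unpaired base opposite it (a bulge).
    print(project_structure("G-CAAAGGC", "(((...)))"))
```

In this toy alignment row the gap sits in the middle of the stem, so the projected structure keeps the two remaining pairs and shows a single-base bulge, which mirrors the situation the caption describes for the one sequence that cannot form the perfect stem.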
Figure 2
MCC on the CMfinder-SARSE dataset as a function of the β and δ parameters. Except for β = 1.0, using RIBOSUM matrices improves the performance of the new RNAalifold, which is in turn always better than the 2002 (old) variant. Furthermore, for the RIBOSUM variant, the plateau, i.e. the subset of parameters with an MCC ≥ 0.93, is quite large, containing 36 of 100 combinations of parameters (80 are ≥ 0.9, 21 are ≥ 0.935 and 6 are 0.937). Top: 3D plot of the MCC against the parameters β and δ. Bottom: Vertical section along the diagonals β = δ and δ + β = 1.1.
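The figure captions report accuracy as the Matthews correlation coefficient (MCC). As a reminder of how that number is formed, here is a minimal sketch of the standard MCC formula. How base pairs are counted as true or false positives is not stated in this excerpt, so the counts are plain inputs here; the function name `mcc` and the convention of returning 0.0 for a zero denominator are choices made for this illustration.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a convention assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(mcc(tp=10, fp=0, tn=10, fn=0))
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect agreement), which is why the plateau values near 0.93 in the caption indicate predictions close to the reference structures.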
Figure 3
Dependence of RNAalifold on the weights β and δ. A: For all three RNAalifold variants, the accuracy of the structure prediction, measured here as MCC for the CMfinder-SARSE dataset (Table 1), depends on the weight β of the covariance term (δ = 0.6). B: The AUC value for the SCI computation also depends strongly on the values of β and δ. The green square indicates the optimal parameters (β = 1.55, δ = 0.6), the red dot is the default (1, 1). As the default is close to the maximum, there is little room for improvement.
Figure 4
Time series for the old, new and RIBOSUM RNAalifold variants. A: Folding different alignments with 4 sequences and different lengths. B: Folding a different number of random sequences from the same alignment (1716 nt).
