RNA. 2010 Jan;16(1):26-42.
doi: 10.1261/rna.1689910. Epub 2009 Nov 20.

Improved free energy parameters for RNA pseudoknotted secondary structure prediction

Mirela S Andronescu et al. RNA. 2010 Jan.

Abstract

Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy depends in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks-Pierce (DP) and the Cao-Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data, our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.
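The abstract distinguishes pseudoknotted from pseudoknot-free structures. As a minimal sketch of that distinction, assuming a structure is given as a list of base pairs (i, j) of sequence positions, a structure is pseudoknotted when two pairs cross, that is, i < k < j < l for pairs (i, j) and (k, l). The function name `has_pseudoknot` and the example pair lists are hypothetical illustrations, not taken from the paper or from HotKnots.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l).

    `pairs` is an iterable of (i, j) position tuples; the order within
    each tuple does not matter. This is an illustrative check only.
    """
    # Normalize each pair so the smaller index comes first, then sort
    # so that for any two pairs taken in order, i <= k holds.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(normalized, 2):
        # The second pair opens inside the first and closes outside it.
        if i < k < j < l:
            return True
    return False


# Hypothetical examples: a nested (pseudoknot-free) stem and two crossing stems.
nested = [(0, 9), (1, 8), (2, 7)]
crossing = [(0, 10), (1, 9), (5, 15), (6, 14)]

print(has_pseudoknot(nested))
print(has_pseudoknot(crossing))
```

The pairwise comparison is quadratic in the number of base pairs, which is adequate for illustration; it says nothing about how the DP or CC energy models score such structures.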

Figures

FIGURE 1.
Examples of simple pseudoknots. The structures have been drawn with the visualization web service Pseudoviewer (Byun and Han 2006). (A) An H-type pseudoknot with two bands (pseudoknotted stems). The unpaired bases form a pseudoloop. (B) A non-H-type pseudoknot with a nested closed region (the leftmost stem and the attached hairpin loop).
FIGURE 2.
Trained parameters versus initial parameters for the MT and CC models. (A) MT model, correlation coefficient 0.786. All parameters appear in the training data. (B) CC entropy parameters, correlation coefficient 0.983. Only 188 out of 258 parameters appear in the training data. (C) Coaxial stacking free energy change parameters, correlation coefficient 0.718. Only 166 out of 288 parameters appear in the training data.
FIGURE 3.
F-measures for every sequence in S-Test, as predicted by the initial DP parameters (DP03), the newly trained DP parameters (DP09), the initial CC parameters (CC06), and the newly trained CC parameters (CC09). (A) DP03 versus DP09. Correlation coefficient 0.542. DP09 is better than DP03 in 49% of the cases, equal in 34% of the cases, and worse in 17% of the cases. (B) CC06 versus CC09. Correlation coefficient 0.552. CC09 is better than CC06 in 46% of the cases, equal in 35% of the cases, and worse in 20% of the cases. (C) CC09 versus DP09. Correlation coefficient 0.921. DP09 is better than CC09 in 5% of the cases, equal in 93% of the cases, and worse in 2% of the cases.
FIGURE 4.
Comparison of DP03 and DP09 predictions on the GaLV pseudoknot. (A) Reference structure, which is also the structure predicted using the DP09 parameters. (B) Structure predicted using the DP03 parameters. This structure is unnecessarily complex, and has an F-measure of 0.471.
FIGURE 5.
The human telomerase RNA pseudoknot (wild type).
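The captions for Figures 3 and 4 report per-sequence F-measures. The sketch below assumes the commonly used definition, the harmonic mean of sensitivity (correctly predicted pairs over reference pairs) and positive predictive value (correctly predicted pairs over predicted pairs); the page itself does not spell out the formula, so treat this as an assumption. The function name `f_measure` and the example pair lists are hypothetical.

```python
def f_measure(reference, predicted):
    """F-measure of predicted base pairs against a reference structure.

    Assumes F = 2 * sensitivity * PPV / (sensitivity + PPV). Returning 0.0
    when no pair is correctly predicted (including when both structures
    are empty) is a choice made for this sketch.
    """
    ref = {tuple(sorted(p)) for p in reference}
    pred = {tuple(sorted(p)) for p in predicted}
    correct = len(ref & pred)
    if correct == 0:
        return 0.0
    sensitivity = correct / len(ref)  # fraction of reference pairs recovered
    ppv = correct / len(pred)         # fraction of predicted pairs that are right
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Hypothetical example: 2 of 4 reference pairs recovered, 3 pairs predicted.
reference_pairs = [(0, 10), (1, 9), (5, 15), (6, 14)]
predicted_pairs = [(0, 10), (1, 9), (2, 8)]

score = f_measure(reference_pairs, predicted_pairs)
print(round(score, 3))  # 4/7, i.e. about 0.571
```

Here sensitivity is 2/4 and PPV is 2/3, so the harmonic mean is 4/7. A prediction identical to the reference scores 1.0 under this definition.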

