Review
2019 Jun 1:162-163:60-67.
doi: 10.1016/j.ymeth.2019.04.003. Epub 2019 Apr 2.

How to benchmark RNA secondary structure prediction accuracy

David H Mathews. Methods.

Abstract

RNA secondary structure prediction is widely used. As new methods are developed, these are often benchmarked for accuracy against existing methods. This review discusses good practices for performing these benchmarks, including the choice of benchmarking structures, metrics to quantify accuracy, the importance of allowing flexibility for pairs in the accepted structure, and the importance of statistical testing for significance.

Keywords: Comparative sequence analysis; RNA folding.

Figures

Figure 1.
The confusion matrix for binary classification. Base pairs that are in the accepted structure and are predicted are true positives. Base pairs that are not in the accepted structure and also not in the prediction are true negatives. Base pairs that are in the accepted structure, but not predicted, are false negatives. Base pairs that are not in the accepted structure, but are predicted, are false positives. The true positives and true negatives are therefore correct predictions (labeled in green). The false negatives and false positives are incorrect predictions (labeled in red).
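The counts in the confusion matrix can be sketched in a few lines of Python. This is a minimal illustration, not the scoring code of any particular package: the function name `pair_metrics` and the representation of a structure as a set of `(i, j)` index tuples are assumptions made here. It reports sensitivity (the fraction of accepted pairs that are predicted) and PPV (the fraction of predicted pairs that are in the accepted structure), neither of which requires the true-negative count.

```python
def pair_metrics(accepted, predicted):
    """Score predicted base pairs against an accepted structure.

    Both arguments are iterables of (i, j) nucleotide index pairs.
    Exact matching only: a predicted pair counts as correct only if
    the identical pair is in the accepted structure.
    """
    # Normalize so that (i, j) and (j, i) are treated as the same pair.
    acc = {tuple(sorted(p)) for p in accepted}
    pred = {tuple(sorted(p)) for p in predicted}

    tp = len(acc & pred)   # accepted and predicted
    fn = len(acc - pred)   # accepted but not predicted
    fp = len(pred - acc)   # predicted but not accepted

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fn": fn, "fp": fp,
            "sensitivity": sensitivity, "ppv": ppv}


# Toy example with made-up pairs (not taken from any real RNA).
accepted_pairs = {(1, 10), (2, 9), (3, 8)}
predicted_pairs = {(1, 10), (2, 9), (4, 7)}
scores = pair_metrics(accepted_pairs, predicted_pairs)
```

In the toy example, two of the three accepted pairs are recovered and two of the three predicted pairs are correct, so both sensitivity and PPV are 2/3.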
Figure 2.
The comparison of folding free energy for two duplexes with a single U bulge. The top duplex has a single state. The lower duplex has two states because the bulged U can migrate between two positions. The bulged U is assumed to not interrupt the helix, and therefore the two states in the lower duplex have identical base pair stacks. In free energy, the entropic bonus for populating multiple states is [55, 69]: $\Delta G_{\mathrm{bonus}} = -RT \ln(\text{number of states})$. For the lower duplex, the number of states populated by the bulge is 2, and the bonus is estimated as: $\Delta G_{\mathrm{bonus}} = -(1.9859\ \mathrm{cal\,K^{-1}\,mol^{-1}})(310.15\ \mathrm{K})\ln(2) = -0.43\ \mathrm{kcal/mol}$, which is consistent with the observed difference of stability of −0.84 ± 0.64 kcal/mol [68, 97]. The average stability of all measured single nucleotide bulges in one state is 3.7 ± 0.2 kcal/mol at 37 °C [, –103]. The average stability of all measured single nucleotide bulges in two states, excluding C bulges that are known to have additional stability [55], is 2.0 ± 0.1 kcal/mol. Although this neglects sequence-specific effects, the average additional stability for a single nucleotide bulge that populates two states is therefore −1.7 ± 0.2 kcal/mol.
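The arithmetic for the entropic bonus can be checked with a short Python sketch. It uses the gas constant and temperature quoted in the caption and takes the bonus as stabilizing, hence negative. The function name `entropic_bonus_kcal` is an illustrative choice, not part of any published tool.

```python
import math

R_CAL_PER_K_MOL = 1.9859   # gas constant as quoted in the caption, cal K^-1 mol^-1
T_KELVIN = 310.15          # 37 degrees C


def entropic_bonus_kcal(n_states, temperature_k=T_KELVIN):
    """Return -RT ln(n_states) in kcal/mol (negative means stabilizing)."""
    bonus_cal = -R_CAL_PER_K_MOL * temperature_k * math.log(n_states)
    return bonus_cal / 1000.0  # convert cal/mol to kcal/mol


# Bulge that can occupy two positions, as in the lower duplex.
two_state_bonus = entropic_bonus_kcal(2)
```

For two states this evaluates to about −0.43 kcal/mol, matching the estimate in the caption. A bulge with a single state gets no bonus because ln(1) = 0.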
Figure 3.
The two NMR-observed conformations for the HIV-1 stem-loop 1 [104]. The left is the major conformation and the right is the minor conformation. Interestingly, RNAstructure predicts the left structure as the lowest free energy structure and the right structure as the maximum expected accuracy (MEA) structure [56, 78, 105]. In this case, the fluctuating pairs are displaced by two nucleotides on one side (i.e. pairs from nucleotides 9 and 10 to nucleotides 25 and 26 switch to pairs to nucleotides 27 and 28).
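Fluctuations like this one are why scoring can allow some flexibility when matching pairs. The sketch below assumes one possible convention: a predicted pair i-j counts as correct if the accepted structure contains a pair with the same nucleotide on one side and the partner shifted by at most `slip` positions on the other. The function names, the `slip` parameter, and the specific pairings 9-26/10-25 and 9-28/10-27 are illustrative assumptions based on a reading of the caption, not values confirmed by the source.

```python
def is_flexible_match(pair, accepted_set, slip=1):
    """True if `pair` matches an accepted pair with one side shifted by <= slip.

    `accepted_set` must already hold pairs as sorted (i, j) tuples.
    """
    i, j = sorted(pair)
    for d in range(-slip, slip + 1):
        # Shift the 5' partner or the 3' partner, never both at once.
        if (i + d, j) in accepted_set or (i, j + d) in accepted_set:
            return True
    return False


def flexible_true_positives(predicted, accepted, slip=1):
    """Count predicted pairs that match the accepted structure within `slip`."""
    acc = {tuple(sorted(p)) for p in accepted}
    return sum(is_flexible_match(p, acc, slip) for p in predicted)


# Hypothetical encoding of the two conformations (assumed pairings).
major = {(9, 26), (10, 25)}
minor = {(9, 28), (10, 27)}

matches_slip1 = flexible_true_positives(minor, major, slip=1)
matches_slip2 = flexible_true_positives(minor, major, slip=2)
```

With these assumed pairings, a one-nucleotide tolerance does not reconcile the two conformations, while a two-nucleotide tolerance matches both pairs. This shows how the size of the allowed shift changes which predictions count as correct.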
Figure 4.
Example output for CircleCompare. Here, the predicted lowest free energy structure [78] for the Arabidopsis thaliana 5S rRNA is compared against the accepted structure [39]. The sensitivity of the prediction is 91% and the PPV is 86%. A key in the lower left explains the color code for the pairs.

