Nucleic Acids Res. 2023 Oct 13;51(18):9522-9532.
doi: 10.1093/nar/gkad726.

When will RNA get its AlphaFold moment?

Bohdan Schneider et al. Nucleic Acids Res.

Abstract

The protein structure prediction problem has been solved for many types of proteins by AlphaFold. Recently, there has been considerable excitement about building on the success of AlphaFold to predict the 3D structures of RNAs. RNA prediction methods use a variety of techniques, from physics-based to machine learning approaches. We believe that there are challenges preventing the successful development of deep learning-based methods like AlphaFold for RNA in the short term. Broadly speaking, the main challenge is the limited number of structures and alignments, which makes data-hungry deep learning methods unlikely to succeed. Additionally, there are several issues with the existing structure and sequence data, which are often of insufficient quality, highly biased and missing key information. Here, we discuss these challenges in detail and suggest some steps to remedy the situation. We believe that it is possible to create an accurate RNA structure prediction method, but it will require solving several data quality and volume issues, using data beyond simple sequence alignments, or developing new, less data-hungry machine learning methods.

Figures

Graphical Abstract
Figure 1.
Examples of interactions in an RNA molecule. Some of the most important interactions are highlighted with dashed lines: base-pairing hydrogen bonds in dark red, sugar-base stacking in dark violet, a phosphate-base hydrogen bond in yellow, and water-mediated hydrogen bonds in cyan (waters are depicted as cyan balls). The bottom pair is a canonical Watson–Crick pair; the pair above is a G–U pair ‘locked’ by interaction with a bridging water molecule. G2147 is in the syn orientation, and the dinucleotide C2146–G2147 is in the left-handed Z-form conformation (note the inverted direction of the ribose of C2146, further stabilized by stacking of its O4’ on the guanine aromatic ring). Displayed is a six-nucleotide loop from an 80-nucleotide-long fragment of 23S RNA from Thermus thermophilus complexed with ribosomal protein L1 (PDB ID: 4qvi) (5).
Figure 2.
Distribution of values of selected evaluation measures for the predictions submitted to RNA-Puzzles from inception to 2022. Numbers in parentheses next to each puzzle indicate the total number of nucleotides for all structures in each puzzle.
Figure 3.
Numbers of RNA and protein structure predictions made in the RNA-Puzzles and CASP competitions. The solid lines represent the numbers of groups competing in CASP and RNA-Puzzles; the dashed lines represent the numbers of protein/RNA targets. From 2010 to 2021, RNAs were predicted only in RNA-Puzzles; in 2022, CASP also included RNA targets, which is responsible for the recent spike in targets and groups involved in 3D RNA structure prediction.
Figure 4.
Comparison of predicted and experimentally determined structures. Displayed is hammerhead ribozyme RNA: the structure determined experimentally by X-ray diffraction at 2.9 Å resolution (PDB ID 5di4) (65) is shown in light blue, and the model PZ15_Adamiak_15 is shown in red. The cartoon representation of residues A9-U33 in panel (A) suggests that the prediction follows the overall topology of the ribozyme correctly but with local deviations. Panel (B) shows the segments between residues G11 and G18. The overall backbone direction is predicted correctly, but local deviations are large. They include differences in base orientations and, consequently, in base pairing; the distances between corresponding phosphorus atoms are also quite large. One such distance, between the P atoms of adenosine 15 in the target and the model, is highlighted by the green rod. The segments on the left and right of panel (B) show the same atoms, with the view rotated by ∼90°.
Figure 5.
Rfam versus Pfam alignments compared based on (A) the number of sequences, (B) the number of columns and (C) the average pairwise percent identity for each family. The points on the plots indicate the mean, and the vertical bars indicate the standard deviation.
Figure 6.
Counts of Rfam families, seed sequences, full sequences and structures for all Rfam families organized by Rfam RNA type.

References

    1. NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2018; 46:D8–D13.
    2. Cech T.R., Steitz J.A., Atkins J.F. RNA worlds: New tools for deep exploration. 2019; NY: Cold Spring Harbor Laboratory Press.
    3. Matzov D., Bashan A., Yonath A. A bright future for antibiotics. Ann. Rev. Biochem. 2017; 86:567–583.
    4. n.a. Big pharma craves slice of AI-based RNA drug discovery. Nat. Biotechnol. 2023; 41:305.
    5. Tishchenko S., Kostareva O., Gabdulkhakov A., Mikhaylina A., Nikonova E., Nevskaya N., Sarskikh A., Piendl W., Garber M., Nikonov S. Protein–RNA affinity of ribosomal protein L1 mutants does not correlate with the number of intermolecular interactions. Acta Crystallogr. D. 2015; 71:376–386.
