Structure. 2020 Aug 4;28(8):963-976.e6.
doi: 10.1016/j.str.2020.05.011. Epub 2020 Jun 11.

FARFAR2: Improved De Novo Rosetta Prediction of Complex Global RNA Folds

Andrew Martin Watkins et al. Structure.

Abstract

Predicting RNA three-dimensional structures from sequence could accelerate understanding of the growing number of RNA molecules being discovered across biology. Rosetta's Fragment Assembly of RNA with Full-Atom Refinement (FARFAR) has shown promise in community-wide blind RNA-Puzzle trials, but the lack of a systematic and automated benchmark has left it unclear what limits FARFAR performance. Here, we benchmark FARFAR2, an algorithm integrating RNA-Puzzle-inspired innovations with updated fragment libraries and helix modeling. In 16 of 21 RNA-Puzzles revisited without experimental data or expert intervention, FARFAR2 recovers native-like structures more accurate than models submitted during the RNA-Puzzles trials. Remaining bottlenecks include conformational sampling for >80-nucleotide problems and scoring function limitations more generally. Supporting these conclusions, preregistered blind models for adenovirus VA-I RNA and five riboswitch complexes predicted native-like folds with root-mean-square deviation (RMSD) accuracies of 3 to 14 Å. We present a FARFAR2 webserver and three large model archives (FARFAR2-Classics, FARFAR2-Motifs, and FARFAR2-Puzzles) to guide future applications and advances.

Keywords: RNA; blind prediction; fragment assembly; homology modeling; structure prediction.

Conflict of interest statement

Declaration of Interests: The authors declare no competing interests.

Figures

Figure 1.
The FARFAR2 structure prediction algorithm. A 3D structure prediction problem is specified by RNA sequence; from that sequence, a consensus secondary structure is obtained from prior literature studies or covariance analysis of sequence alignments (left), and homologies may be identified to previously solved structures (right). The orange areas in the depicted secondary structure diagram represent the regions whose conformations are unknown a priori and whose solution would guide the tertiary structure prediction. Manually identified homologies can also furnish template structures, which are combined by automatic sampling from a base pair step and fragment database in a low-resolution fragment assembly stage. Subsequent models are filtered to omit trajectories with chainbreaks or poor scores, and passing models are subjected to minimization in an all-atom scoring function. Finally, models are chosen from the resulting ensemble through clustering.
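The filtering step between fragment assembly and all-atom minimization can be illustrated with a minimal Python sketch. The `Model` fields and the `keep_fraction` cutoff are hypothetical, introduced only for illustration; the caption does not state the actual score threshold, and this is not the Rosetta implementation.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    score: float          # low-resolution score; lower is better (assumed convention)
    has_chainbreak: bool  # True if the trajectory ended with an unclosed chain


def filter_models(models, keep_fraction=0.5):
    """Drop models with chainbreaks, then keep the best-scoring fraction.

    keep_fraction is a hypothetical cutoff chosen for this example.
    """
    closed = sorted((m for m in models if not m.has_chainbreak),
                    key=lambda m: m.score)
    if not closed:
        return []
    n_keep = max(1, int(len(closed) * keep_fraction))
    return closed[:n_keep]


models = [
    Model("m1", -10.0, False),
    Model("m2", -25.0, True),   # best score, but discarded for its chainbreak
    Model("m3", -18.0, False),
    Model("m4", -3.0, False),
    Model("m5", -21.0, False),
]
passing = filter_models(models, keep_fraction=0.5)
print([m.name for m in passing])  # ['m5', 'm3']
```

In the pipeline described above, only the models that pass this stage would go on to all-atom minimization and clustering.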
Figure 2.
Cases from the FARFAR2-Classics (A-E) and FARFAR2-Motifs (F-I) benchmarks that saw success from the application of FARFAR2 instead of FARNA and SWM, respectively. In each panel, FARFAR2 model, native structure, and overlay are shown from left to right. In (A-E), the FARFAR2 model is the best of 5 low-energy cluster centers after clustering 5000 models with a 3.0 Å radius; in (F-J), the FARFAR2 model is the best of 5 low-energy cluster centers after clustering 400 models with a 2.0 Å radius. In each case, model selection conditions reproduce the conditions used in the original publications (Das and Baker, 2007; Watkins et al., 2018). In overlays, the FARFAR2 model is colored in salmon and the experimental structure in blue; in individual structures, recovered noncanonical base pairs are colored in cyan, lime, orange, salmon, and ruby, and recovered bulged residues are colored in wheat. In (A-E), residues from pre-specified, flexible helices are colored white; in (F-J), fixed input residues (mostly helical) are colored white.
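The "best of 5 low-energy cluster centers" selection can be sketched as follows. This assumes a simple greedy, energy-ordered clustering and an RMSD computed on pre-aligned coordinates without superposition; both are simplifications chosen for illustration and may differ from the clustering Rosetta actually performs. The one-atom toy models are invented data.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed (no alignment is done).
    """
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def cluster_centers(models, radius, n_centers):
    """Greedy clustering: walk models from lowest to highest energy and start
    a new cluster whenever a model lies farther than `radius` from every
    existing center. models is a list of (energy, coords) pairs."""
    centers = []
    for energy, coords in sorted(models, key=lambda m: m[0]):
        if all(rmsd(coords, c[1]) > radius for c in centers):
            centers.append((energy, coords))
            if len(centers) == n_centers:
                break
    return centers


# Toy one-atom "models"; real inputs would be full RNA coordinate sets.
models = [
    (-5.0, [(0.0, 0.0, 0.0)]),
    (-4.0, [(1.0, 0.0, 0.0)]),
    (-3.0, [(5.0, 0.0, 0.0)]),
    (-2.0, [(5.5, 0.0, 0.0)]),
    (-1.0, [(10.0, 0.0, 0.0)]),
]
native = [(4.5, 0.0, 0.0)]

centers = cluster_centers(models, radius=3.0, n_centers=5)
best = min(centers, key=lambda c: rmsd(c[1], native))
print([e for e, _ in centers])  # [-5.0, -3.0, -1.0]
print(best[0])                  # -3.0
```

Note that the "best of 5" choice uses the native structure, so it is an evaluation metric for benchmarks rather than something available in a blind prediction.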
Figure 3.
The best-of-top-1% RMSD prediction (pink) vs. native (blue) for each RNA-Puzzle in the FARFAR2-Puzzles benchmark. White regions are input template structures employed at the original time of modeling and employed in the FARFAR2 simulation.
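The "best-of-top-1%" metric can be read as: rank all models by energy, keep the lowest-energy 1%, and report the smallest RMSD to native within that subset. A minimal sketch under that reading follows; the function name and the synthetic energies and RMSDs are illustrative assumptions, not data from the paper.

```python
def best_of_top_percent(models, percent=1):
    """models: list of (energy, rmsd_to_native) pairs.

    Returns the lowest RMSD among the lowest-energy `percent` of models,
    always considering at least one model.
    """
    ranked = sorted(models, key=lambda m: m[0])
    n_top = max(1, len(ranked) * percent // 100)
    return min(r for _, r in ranked[:n_top])


# 200 synthetic models: energy rises with the index, RMSD varies between 14 and 20 Å.
models = [(float(i), 20.0 - (i % 7)) for i in range(200)]

print(best_of_top_percent(models, percent=1))    # 19.0 (best of the 2 lowest-energy models)
print(best_of_top_percent(models, percent=100))  # 14.0 (best of all models)
```

The gap between the two printed values shows why the metric is informative: it measures how well energy ranking surfaces the accurate models that sampling produced.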
Figure 4.
Direct comparison between original RNA-Puzzles submissions from all groups (left points) and FARFAR2 models (right points) for each benchmark case. Among RNA-Puzzles submissions, those from the Das lab (created using manually curated Rosetta models, mostly using FARFAR) are black points; others are gray. Among FARFAR2 models, pink points are the top 1% of models by energy; dark red points are cluster centers.
Figure 5.
The method of secondary structure specification affects the quality of the resulting predicted models, as illustrated on RNA-Puzzle 15, a hammerhead ribozyme. For each method, the native structure is depicted in blue and the best model from the ten lowest-energy cluster centers is in pink. In the scatter plots, we show the top 1% by energy of generated models in pink and the ten automatically selected cluster centers in dark red. Only base pair step sampling (used in FARFAR2) routinely samples models within 10 Å RMSD of native. Among the alternatives, only the previously developed method that samples from pre-generated ensembles for each helix approaches this quality, with cluster centers found below 10 Å. Even that method, which requires a separate modeling step for each target, remains substantially worse than base pair step sampling.
Figure 6.
Convergence (the average pairwise RMSD among the top 10 models) is predictive of modeling accuracy (the average RMSD to native of the 10 lowest energy models or clusters) whether with electron density (DRRAFTER and Ribosolve models, gray) or without (FARFAR2, blue). Lines of best fit: Ribosolve y = 0.66 x + 1.65 Å (R² = 0.94); FARFAR2 y = 0.81 x + 3.69 Å (R² = 0.84). Error bars represent standard deviation of pairwise RMSD (x-error) and standard deviation of RMSD to native (y-error), which are themselves related as Ribosolve y-error = 0.76 x-error + 0.10 Å (R² = 0.90); FARFAR2 y-error = 0.91 x-error + 0.09 Å (R² = 0.64). In red are depicted points for the results of the six blind challenges in this work.
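As a worked illustration of the convergence relationship, the sketch below averages the off-diagonal entries of a pairwise RMSD matrix and applies the FARFAR2 line of best fit quoted in the caption (slope 0.81, intercept 3.69 Å). The function names and the 3-model matrix are assumptions for brevity; the caption uses the top 10 models, and the fit is a regression across benchmark cases, so the result is a rough expectation rather than a guaranteed accuracy.

```python
from itertools import combinations
from statistics import mean


def convergence(rmsd_matrix):
    """Average pairwise RMSD over all distinct model pairs (in Å)."""
    n = len(rmsd_matrix)
    return mean(rmsd_matrix[i][j] for i, j in combinations(range(n), 2))


def expected_rmsd_to_native(conv, slope=0.81, intercept=3.69):
    """Apply the FARFAR2 fit from the caption: y = 0.81 x + 3.69 Å."""
    return slope * conv + intercept


# Invented symmetric pairwise RMSD matrix for three models.
pairwise = [
    [0.0, 4.0, 6.0],
    [4.0, 0.0, 8.0],
    [6.0, 8.0, 0.0],
]

conv = convergence(pairwise)
estimate = expected_rmsd_to_native(conv)
print(conv)                # 6.0
print(round(estimate, 2))  # 8.55
```

Passing `slope=0.66, intercept=1.65` instead would apply the Ribosolve fit reported in the same caption.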
Figure 7.
Blind predictions (salmon) of six complex RNA folds (blue) subsequently determined via cryo-electron microscopy: the (A) F. nucleatum and (B) V. cholerae full-length tandem glycine riboswitches, as well as the (C) Mycobacterium SAM-IV riboswitch, the (D) G. kaustophilus T-box riboswitch/tRNA-Gly, the (E) B. subtilis T-box riboswitch/tRNA-Gly, and the (F) adenoviral VA RNA I. Predictions generally achieve nucleotide accuracy. White regions are input template structures.


