PLoS Comput Biol. 2019 Jun 27;15(6):e1007059.
doi: 10.1371/journal.pcbi.1007059. eCollection 2019 Jun.

EternaBrain: Automated RNA design through move sets and strategies from an Internet-scale RNA videogame

Rohan V Koodli et al. PLoS Comput Biol. 2019.

Abstract

Emerging RNA-based approaches to disease detection and gene therapy require RNA sequences that fold into specific base-pairing patterns, but computational algorithms generally remain inadequate for these secondary structure design tasks. The Eterna project has crowdsourced RNA design to human video game players in the form of puzzles that reach extraordinary difficulty. Here, we demonstrate that Eterna participants' moves and strategies can be leveraged to improve automated computational RNA design. We present an eternamoves-large repository consisting of 1.8 million player moves on 12 of the most-played Eterna puzzles, as well as an eternamoves-select repository of 30,477 moves from the top 72 players on a select set of more advanced puzzles. On eternamoves-select, we present a multilayer convolutional neural network (CNN), EternaBrain, that achieves test accuracies of 51% and 34% in base prediction and location prediction, respectively, suggesting that top players' moves are partially stereotyped. Pipelining this CNN's move predictions with single-action-playout (SAP) of six strategies compiled by human players solves 61 out of 100 independent puzzles in the Eterna100 benchmark. EternaBrain-SAP outperforms previously published RNA design algorithms and achieves similar or better performance than a newer generation of deep learning methods, while being largely orthogonal to these other methods. Our study provides useful lessons for future efforts to achieve human-competitive performance with automated RNA design algorithms.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Eterna and EternaBrain.
(A-C) Puzzle-solving interface presented to human players of Eterna including the state of the puzzle (whether it is solved or not) in the top left corner (red/green outline), the puzzle itself (in the middle), and the toolbar (bottom) with which the players can mutate the RNA sequence to make it fold into the desired state; yellow, blue, red, and green symbols represent A, U, G, and C nucleotides. (A) The desired target structure for the RNA molecule, as indicated by the bullseye in the bottom left (orange highlight). (B) Nature mode, as indicated by the leaf in the bottom left (orange highlight), gives the predicted minimum free energy structure for the current sequence. Since the bases in the top right should be paired with each other (orange circle), this puzzle is not yet folding correctly; this status is shown by the red indicator in the top left corner. (C) The solved puzzle. The nature-mode structure matches the target structure, and the indicator in the top left corner turns green, meaning the puzzle has been solved. (D) (left) Wide distribution of contributed Eterna solutions across different players. For preparing the eternamoves-select data set, we selected any player who had solved more than 3000 distinct puzzles, which left us with 72 players. (right) In EternaBrain, we tested whether information on players’ moves could be used to train a convolutional neural network. (E) For solving new puzzles, the final EternaBrain-SAP framework first uses the EternaBrain convolutional neural net model to predict sequence changes (‘moves’) for new RNA puzzles. In a second stage, the Single Action Playout (SAP), six additional hand-coded strategies are applied to complete the solution.
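
The solved/unsolved indicator described in panels B and C amounts to comparing the predicted (nature-mode) structure with the target structure. The following is a minimal sketch of that comparison, assuming both structures are available as dot-bracket strings; the function names `pair_table`, `mismatched_positions`, and `is_solved` are hypothetical, and the predicted structure would have to come from a folding engine that is not shown here.

```python
def pair_table(structure):
    """Map each position to its pairing partner, or -1 if unpaired."""
    table = [-1] * len(structure)
    stack = []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            partner = stack.pop()
            table[idx] = partner
            table[partner] = idx
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return table


def mismatched_positions(target, predicted):
    """Positions whose pairing partner differs between the two structures."""
    if len(target) != len(predicted):
        raise ValueError("structures must have equal length")
    t, p = pair_table(target), pair_table(predicted)
    return [i for i in range(len(t)) if t[i] != p[i]]


def is_solved(target, predicted):
    """A puzzle counts as solved when no position is mispaired."""
    return not mismatched_positions(target, predicted)
```

For example, with target `((....))` and predicted structure `(......)`, positions 1 and 6 are reported as mispaired, which corresponds to the red (unsolved) indicator.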
Fig 2. The 6 strategies included in the SAP.
(A) The original state of the puzzle before SAP. This represents a puzzle initiated with an arbitrary sequence of nucleotides; panel displays the target structure, where mismatched nucleotides (C-A) are highlighted. (B) The first step of the SAP is to correct mismatched pairs. Here, the cytosine nucleotides are switched to uracil to pair with adenine. (C) Changing end pairs to G-C. Changing base pairs that are at the edges of stems and flank loops to G-C pairs lowers the free energy of the molecule. (D) G-internal loop boost. The first nucleotide in an internal loop on either side is switched to a guanine. (E) U-G-U-G super boost. In an internal loop with 2 unpaired bases on either side, the 2 bases are changed to uracil and guanine, in that order, on either side. (F) G-hairpin boost. The first nucleotide in each strand of a hairpin loop is changed to a guanine. (G) Reorienting base pairs. Target base pairs that are not predicted to be folded correctly are ‘flipped’ to lower the energy of the structure. Here, alternating the A-U pairs lowers the energy of the stack. The 5’ end of each puzzle is at the top left, with the puzzle drawn counter-clockwise from that point.
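
Two of the strategies in this caption, correcting mismatched pairs (B) and changing end pairs to G-C (C), can be illustrated with a short sketch. This is not the authors' implementation. It assumes a dot-bracket target structure, and the repair rules are illustrative assumptions: a mismatched pair containing A gets U as its partner (as in the C-A example above), any other mismatch sets the 3' base to the Watson-Crick complement of the 5' base, and end pairs are written with G on the 5' side. All function names are hypothetical.

```python
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def parse_pairs(structure):
    """Return sorted (i, j) base pairs from a dot-bracket string."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return sorted(pairs)


def fix_mismatches(sequence, structure):
    """Strategy B (assumed rule): make every target pair a valid pair."""
    seq = list(sequence)
    for i, j in parse_pairs(structure):
        if (seq[i], seq[j]) in VALID_PAIRS:
            continue
        if seq[i] == "A":
            seq[j] = "U"
        elif seq[j] == "A":
            seq[i] = "U"  # e.g. C-A becomes U-A
        else:
            seq[j] = COMPLEMENT[seq[i]]
    return "".join(seq)


def gc_end_pairs(sequence, structure):
    """Strategy C (assumed rule): set pairs at either end of a stem to G-C."""
    seq = list(sequence)
    pairs = set(parse_pairs(structure))
    for i, j in pairs:
        inner_end = (i + 1, j - 1) not in pairs  # pair closes a loop
        outer_end = (i - 1, j + 1) not in pairs  # pair opens the stem
        already_gc = {seq[i], seq[j]} == {"G", "C"}
        if (inner_end or outer_end) and not already_gc:
            seq[i], seq[j] = "G", "C"
    return "".join(seq)
```

For instance, `fix_mismatches("CAAAAA", "(....)")` turns the C-A pair into U-A, and `gc_end_pairs("AAAAAAAAUUUU", "((((....))))")` converts only the outermost and innermost pairs of the stem, leaving the two interior A-U pairs unchanged.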
Fig 3. EternaBrain performance.
(A) Performance of EternaBrain and 6 previously published algorithms on Eterna100 benchmark. EternaBrain solves 61/100, followed by MODENA (54/100), INFO-RNA (50/100), NUPACK (48/100), DSS-Opt (47/100), RNAinverse (28/100), and RNA-SSD (27/100). (B) Performance of Alternative Model Constructions. The CNN alone could solve only 20/100, and the SAP alone could solve 50/100. Removing various input features passed into the CNN resulted in drops in performance, confirming the importance of these features.
Fig 4. Example EternaBrain-SAP solutions to Eterna100 puzzles.
(A) U solution highlights the fact that the EternaBrain CNN alone can solve puzzles with short stems. (B) Chicken Tracks solution: EternaBrain-SAP can solve puzzles with three stems intersecting in one internal loop. (C) Thunderbolt solution demonstrates that EternaBrain-SAP can solve large puzzles (400 nucleotides long) and solve loops and stems in combination. (D) Shortie 4 solution shows EternaBrain-SAP can solve puzzles with multiple short stems (2 nucleotides long). (E) Shortie 6 is quite similar to Shortie 4, but with the same motif (short stems) repeated. The other algorithms mentioned could not solve Shortie 6 because of the repeated motifs. (F) Hard Y—target structure (left) vs nature-mode (right) structure. EternaBrain-SAP could not solve Hard Y because it required use of a little-used strategy to solve a motif called a zigzag. Since the strategy is not often used by players, the EternaBrain CNN did not learn the strategy and the strategy was not included in the SAP. In each panel, the 5’ end of each puzzle is at the top left, with the puzzle drawn counter-clockwise from that point.
