This is a preprint.
Ribonanza: deep learning of RNA structure through dual crowdsourcing
- PMID: 38464325
- PMCID: PMC10925082
- DOI: 10.1101/2024.02.24.581671
Abstract
Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.
Similar articles
- Deep learning models for predicting RNA degradation via dual crowdsourcing. Nat Mach Intell. 2022;4(12):1174-1184. doi: 10.1038/s42256-022-00571-8. Epub 2022 Dec 14. PMID: 36567960. Free PMC article.
- Deep learning models for predicting RNA degradation via dual crowdsourcing. ArXiv [Preprint]. 2021 Oct 14:arXiv:2110.07531v2. Update in: Nat Mach Intell. 2022;4(12):1174-1184. doi: 10.1038/s42256-022-00571-8. PMID: 34671698. Free PMC article. Updated. Preprint.
- EternaBrain: Automated RNA design through move sets and strategies from an Internet-scale RNA videogame. PLoS Comput Biol. 2019 Jun 27;15(6):e1007059. doi: 10.1371/journal.pcbi.1007059. eCollection 2019 Jun. PMID: 31247029. Free PMC article.
- Comprehensive review and assessment of computational methods for predicting RNA post-transcriptional modification sites from RNA sequences. Brief Bioinform. 2020 Sep 25;21(5):1676-1696. doi: 10.1093/bib/bbz112. PMID: 31714956. Review.
- Development and integration of VGG and dense transfer-learning systems supported with diverse lung images for discovery of the Coronavirus identity. Inform Med Unlocked. 2022;32:101004. doi: 10.1016/j.imu.2022.101004. Epub 2022 Jul 8. PMID: 35822170. Free PMC article. Review.