Bioinformatics. 2024 Jun 28;40(Suppl 1):i347-i356.

doi: 10.1093/bioinformatics/btae259.

RiboDiffusion: tertiary structure-based RNA inverse folding with generative diffusion models

Han Huang¹,², Ziqian Lin¹,³, Dongchen He¹, Liang Hong¹, Yu Li¹

Affiliations

¹ Department of Computer Science and Engineering, CUHK, Hong Kong SAR, 999077, China.
² School of Computer Science and Engineering, Beihang University, Beijing, 100191, China.
³ School of Artificial Intelligence, Nanjing University, Nanjing, 210023, China.

PMID: 38940178
PMCID: PMC11211841
DOI: 10.1093/bioinformatics/btae259

Han Huang et al. Bioinformatics. 2024.


Abstract

Motivation: RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the nonunique structure-sequence mapping, and the flexibility of RNA conformation.

Results: In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which together iteratively transform random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of 11% for sequence similarity splits and 16% for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.
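
The two quantities reported above, sequence recovery and relative improvement over a baseline, can be illustrated with a minimal Python sketch. It assumes recovery is the fraction of positions where the generated nucleotide matches the native one, which is the common convention but is not spelled out in this abstract. The function names, the example sequences, and the scores 0.555 and 0.5 are hypothetical and are not results from the paper.

```python
def recovery_rate(native: str, generated: str) -> float:
    """Fraction of positions where the generated nucleotide equals the native one.

    Assumed definition; the abstract does not state the exact formula.
    """
    if len(native) != len(generated):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, generated))
    return matches / len(native)


def relative_improvement(model_score: float, baseline_score: float) -> float:
    """Relative gain of the model over a baseline, e.g. 0.11 means 11%."""
    if baseline_score == 0:
        raise ValueError("baseline score must be non-zero")
    return (model_score - baseline_score) / baseline_score


# Hypothetical toy sequences: 6 of the 8 positions agree.
native = "GGCUAGCC"
generated = "GGCAAGCU"
rate = recovery_rate(native, generated)

# Hypothetical scores chosen only to show what an 11% relative gain looks like.
gain = relative_improvement(0.555, 0.5)
```

Note that a relative improvement of 11% is measured against the baseline's own score, so it is not the same as an 11 percentage-point gain in recovery.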

Availability and implementation: The source code is available at https://github.com/ml4bio/RiboDiffusion.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
Overview of RiboDiffusion for tertiary structure-based RNA inverse folding. We construct a dataset with experimentally determined RNA structures from PDB, supplemented with additional structures predicted by an RNA structure prediction model. We cluster RNAs using different cut-offs for sequence or structure similarity and create cross-splits to evaluate models. RiboDiffusion trains a neural network with a structure module and a sequence module to recover the original sequence from a noisy sequence and a coarse-grained RNA backbone extracted from the tertiary structure. RiboDiffusion then uses the trained network to iteratively refine random initial sequences until they match the target structure. We present a comprehensive evaluation and analysis of the proposed method.
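
The idea behind a clustering-based split can be sketched as follows: whole clusters, not individual RNAs, are assigned to train or test, so no test RNA shares a cluster with a training RNA. This is only a generic illustration of cluster-level holdout. It is not the paper's cross-split procedure, and the function name `split_by_cluster`, the RNA identifiers, the cluster labels, and the `test_fraction` parameter are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, test_fraction=0.2, seed=0):
    """Assign entire clusters to train or test (hypothetical helper).

    cluster_of maps an RNA identifier to its cluster identifier, as produced
    by some upstream sequence- or structure-similarity clustering.
    """
    clusters = defaultdict(list)
    for rna_id, cluster_id in cluster_of.items():
        clusters[cluster_id].append(rna_id)

    # Sort before shuffling so the result depends only on the seed.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    n_test = max(1, round(len(cluster_ids) * test_fraction))
    test_clusters = set(cluster_ids[:n_test])

    train = [r for c in cluster_ids if c not in test_clusters for r in clusters[c]]
    test = [r for c in cluster_ids if c in test_clusters for r in clusters[c]]
    return train, test


# Hypothetical cluster assignments for six RNAs in three clusters.
cluster_of = {
    "rna_a": "c1", "rna_b": "c1",
    "rna_c": "c2", "rna_d": "c2",
    "rna_e": "c3", "rna_f": "c3",
}
train_ids, test_ids = split_by_cluster(cluster_of, test_fraction=0.34)
```

A stricter similarity cut-off when clustering yields test RNAs that are less similar to anything seen in training, which is why results are reported per cut-off.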

**Figure 2.**
Violin plots for the recovery rate distribution of methods for different types of RNA, including tRNA, rRNA, sRNA, ribozyme, snRNA, SRP RNA, hammerhead ribozyme, and pre-miRNA.

**Figure 3.**
Performance of RiboDiffusion on different RNA families under the cross-family setting. The average length and number of tertiary structures for each family are marked above violin plots.

**Figure 4.**
Analysis of RiboDiffusion. (a, b) *In silico* folding validation results that show the TM-score between structures predicted by RhoFold or DRFold and the given fixed RNA backbones (on *Seq. 0.4* split). *Native* represents structures predicted from original sequences of given backbones as references, while *Generated* represents structures predicted from generated sequences. (c, d) Trade-offs between the diversity of generated sequences and recovery rate, as well as refolding F1-score (including models with and without augmented data). (e) Visualization of input RNA structures (pink) and predicted structures (green) of generated sequences. The generated sequences and the corresponding native sequences are shown below the structure visualization, where different nucleotide types are marked in red.
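
The diversity and refolding F1-score mentioned in panels (c, d) can be illustrated with a short Python sketch. Both definitions here are assumptions, since the caption does not give formulas: diversity is taken as the mean pairwise fraction of differing positions among generated sequences, and F1 is computed over sets of base pairs from a reference and a predicted secondary structure. The function names and all example inputs are hypothetical.

```python
from itertools import combinations


def mean_pairwise_diversity(seqs):
    """Mean fraction of differing positions over all pairs of sequences (assumed metric)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")

    def differing_fraction(a, b):
        if len(a) != len(b):
            raise ValueError("sequences must have equal length")
        return sum(x != y for x, y in zip(a, b)) / len(a)

    return sum(differing_fraction(a, b) for a, b in pairs) / len(pairs)


def base_pair_f1(reference_pairs, predicted_pairs):
    """F1 between two sets of base pairs given as (i, j) index tuples (assumed metric)."""
    ref = {tuple(sorted(p)) for p in reference_pairs}
    pred = {tuple(sorted(p)) for p in predicted_pairs}
    true_positives = len(ref & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical generated sequences for one backbone.
diversity = mean_pairwise_diversity(["GGCA", "GGCU", "GACU"])

# Hypothetical base pairs: two of three pairs agree in each direction.
f1 = base_pair_f1([(0, 7), (1, 6), (2, 5)], [(0, 7), (1, 6), (3, 4)])
```

Plotting such a diversity value against recovery or F1 for several sampling settings gives the kind of trade-off curve the caption describes.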

Cited by

Sifting through the noise: A survey of diffusion probabilistic models and their applications to biomolecules.
Norton T, Bhattacharya D. J Mol Biol. 2025 Mar 15;437(6):168818. doi: 10.1016/j.jmb.2024.168818. Epub 2024 Oct 9. PMID: 39389290. Review.
Comprehensive datasets for RNA design, machine learning, and beyond.
Badura J, Rybarczyk A, Zok T. Sci Rep. 2025 Jul 1;15(1):21417. doi: 10.1038/s41598-025-07041-2. PMID: 40594473. Free PMC article.
Secondary-Structure-Informed RNA Inverse Design via Relational Graph Neural Networks.
Manzourolajdad A, Mohebbi M. Noncoding RNA. 2025 Feb 26;11(2):18. doi: 10.3390/ncrna11020018. PMID: 40126342. Free PMC article.
Computational De Novo Design of Group II Introns Yields Highly Active Ribozymes.
Szokoli D, Nwosu NE, Glatt LM, Mutschler H. Chembiochem. 2025 Jul 18;26(14):e202500356. doi: 10.1002/cbic.202500356. Epub 2025 Jun 30. PMID: 40504414. Free PMC article.
DRAG: design RNAs as hierarchical graphs with reinforcement learning.
Li Y, Pan X, Shen H, Yang Y. Brief Bioinform. 2025 Mar 4;26(2):bbaf106. doi: 10.1093/bib/bbaf106. PMID: 40079262. Free PMC article.
