[Preprint]. 2025 Aug 15:2025.08.14.670328.
doi: 10.1101/2025.08.14.670328.

Accelerating Biomolecular Modeling with AtomWorks and RF3

Nathaniel Corley et al. bioRxiv.

Abstract

Deep learning methods trained on protein structure databases have revolutionized biomolecular structure prediction, but developing and training new models remains a considerable challenge. To facilitate the development of new models, we present AtomWorks: a broadly applicable data framework for developing state-of-the-art biomolecular foundation models spanning diverse tasks, including structure prediction, generative protein design, and fixed-backbone sequence design. We use AtomWorks to train RosettaFold-3 (RF3), a structure prediction network capable of predicting arbitrary biomolecular complexes with an improved treatment of chirality that narrows the performance gap between closed-source AlphaFold3 (AF3) and existing open-source implementations. We expect that AtomWorks will accelerate the next generation of open-source biomolecular machine learning models and that RF3 will be broadly useful as a structure prediction tool. To this end, we release the AtomWorks framework (https://github.com/RosettaCommons/atomworks), together with curated training data, code, and model weights for RF3 (https://github.com/RosettaCommons/modelforge), under a permissive BSD license.

Figures

Fig. 1. AtomWorks enables training of state-of-the-art models for structure prediction, generative protein design, and fixed-backbone sequence design within a single data framework.
Top: AtomWorks can ingest (and RF3 is trained on) a diverse set of datasets including the Protein Data Bank (PDB), generated nucleic acid distillation sets, monomer distillation sets, PDB structures with inverted chirality, and PDB structures with extended disordered regions. Middle: Networks for all biomolecular modeling use cases can share common components within the AtomWorks framework. Bottom: Representations of outputs from various models trained with AtomWorks.
Fig. 2. RF3 respects the chirality of the inputs.
A: RF3 respects the chirality of small molecule inputs more than all other structure prediction models. We first cluster all small molecules in our test set by CCD code and average the percentage of correctly predicted centers within each cluster. Overall chiral center accuracy is computed by taking the mean across all clusters. Statistical significance was assessed using two-sample t-tests. RF3 and AF3 both significantly outperformed Boltz-2 (p < 0.001 and p = 0.002, respectively), while the difference between RF3 and AF3 was not significant (p = 0.064). Model training date cutoff indicated in parentheses. B: Comparison of the predicted structure (teal) with the ground-truth crystal structure (gray) for PDB ID 7UZL. In this example, 100% of the chiral centers are predicted with the correct chirality (including three D-amino acids). C: On a test set of mixed-chirality macrocycles from 2022, outside the training date cutoff of all models benchmarked, RF3 predicts the structures with a high degree of accuracy (1.74 mean backbone RMSD). D: In the mixed-chirality test set, 85% of chiral centers are predicted correctly by RF3 vs. 70% by AF3. Boltz-1x structures were predicted with inference-time chiral guidance and thus are guaranteed to satisfy the input chirality.
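The cluster-averaged accuracy described in panel A can be illustrated with a minimal sketch. It assumes one reading of the caption: a per-ligand percentage is averaged within each CCD cluster, and the cluster means are then averaged. The record layout, the function name, and the CCD codes "AAA", "BBB", and "CCC" are hypothetical and are not taken from the RF3 code.

```python
from collections import defaultdict
from statistics import mean


def cluster_averaged_accuracy(records):
    """Return (overall %, per-cluster %) of correctly predicted chiral centers.

    records: iterable of (ccd_code, n_correct, n_total), one per ligand
    instance (hypothetical layout chosen for this sketch).
    """
    per_cluster = defaultdict(list)
    for ccd_code, n_correct, n_total in records:
        if n_total == 0:
            continue  # a ligand without chiral centers contributes nothing
        per_cluster[ccd_code].append(100.0 * n_correct / n_total)

    # Average within each cluster first, so that frequently occurring
    # ligands do not dominate the overall number.
    cluster_means = {code: mean(vals) for code, vals in per_cluster.items()}
    overall = mean(list(cluster_means.values()))
    return overall, cluster_means


# Made-up counts for three hypothetical CCD clusters.
records = [
    ("AAA", 6, 6),
    ("AAA", 3, 6),
    ("BBB", 4, 4),
    ("CCC", 5, 5),
    ("CCC", 5, 5),
]
overall, cluster_means = cluster_averaged_accuracy(records)
```

With these made-up counts the "AAA" cluster averages 75%, and the overall figure is the unweighted mean of the three cluster values rather than the pooled fraction of correct centers.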
Fig. 3. Novel capabilities of RF3.
A. RF3 enables users to input desired conformers. (left) Example of a user-input conformer and the resulting prediction. (right) Providing the ground-truth ligand conformer or the protein holo conformation improves accuracy. “Templating” in this case means providing all-by-all pairwise distances within the ligand or protein. B. RF3 is trained with a disorder distillation set. In contrast to AF3, which repredicted the entire PDB with AF2 to show examples of extended disordered regions, we chose to use the more compute-efficient Rosetta macromolecular modeling software to generate structures with “extended” backbones for disordered regions. In 2% of cases, the model is trained with PDB examples with extended disordered regions. (left) Example of a prediction with a large unresolved region, which is predicted in an extended conformation. (right) Analysis of the mean RASA over unresolved regions in our test set. Model training date cutoff indicated in parentheses.
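As a rough illustration of what "all-by-all pairwise distances" means in panel A, the sketch below builds a distance matrix from a list of atom coordinates. It only shows the quantity named in the caption. How RF3 encodes or consumes such a template is not described here, and the function name and coordinates are hypothetical.

```python
import math


def pairwise_distances(coords):
    """Return the all-by-all Euclidean distance matrix for a list of 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Three made-up atom positions (arbitrary units).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
dist = pairwise_distances(coords)
# dist[0][1] is 3.0, dist[0][2] is 4.0, dist[1][2] is 5.0
```

A distance matrix of this kind is unchanged by rotating or translating the conformer, so it describes the internal geometry without fixing a pose.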
Fig. 4. RF3 accurately predicts biomolecular interactions.
A. Scatterplots comparing all-atom interface lDDT of RF3 and AF3 on protein-protein interactions and protein-ligand interactions. To reduce redundancy, the test dataset was clustered (at 40% sequence homology for polymers and by CCD identity for non-polymers); each point represents a cluster mean. For all networks, we generate five structures from the same seed and take the sample that scores the highest on each metric (“Best of 5” approach). B. Boxplots showing accuracy of RF3, AF3, and Boltz on different structure modeling tasks. Two versions of RF3 are shown: one trained on structures released before September 2021 and another trained on structures deposited before January 2024. Each point in the boxplot is a mean value over a cluster of structures. Model training date cutoff indicated in parentheses.
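The "Best of 5" selection followed by cluster averaging in panel A could be sketched as follows. This is an assumption-laden illustration: the data layout, the function names, the cluster identifiers, and the metric key "iface_lddt" are hypothetical, and the scores are made up.

```python
from collections import defaultdict
from statistics import mean


def best_of_n(scores_by_metric):
    """For each metric, keep the highest score among the generated samples."""
    return {metric: max(scores) for metric, scores in scores_by_metric.items()}


def cluster_means(targets, metric):
    """Average the best-of-N score of each target within its cluster.

    targets: iterable of (cluster_id, {metric_name: [score per sample]}).
    """
    by_cluster = defaultdict(list)
    for cluster_id, scores_by_metric in targets:
        by_cluster[cluster_id].append(best_of_n(scores_by_metric)[metric])
    return {cluster_id: mean(vals) for cluster_id, vals in by_cluster.items()}


# Made-up scores: five samples per target, two clusters.
targets = [
    ("c1", {"iface_lddt": [0.50, 0.60, 0.70, 0.40, 0.55]}),
    ("c1", {"iface_lddt": [0.90, 0.80, 0.85, 0.70, 0.60]}),
    ("c2", {"iface_lddt": [0.30, 0.20, 0.10, 0.25, 0.15]}),
]
points = cluster_means(targets, "iface_lddt")
```

Each value in `points` corresponds to one plotted point: the best of five samples is taken per target, and targets in the same cluster are then averaged. Because the maximum is taken per metric, different metrics may select different samples for the same target.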
