Protein Sci. 2025 Aug;34(8):e70210.
doi: 10.1002/pro.70210.

Artificial intelligence and first-principle methods in protein redesign: A marriage of convenience?

Damiano Cianferoni et al. Protein Sci. 2025 Aug.

Abstract

Since AlphaFold2's rise, many deep learning methods for protein design have emerged. Here, we validate widely used and recognized tools, compare them with first-principle methods, and explore their combinations, focusing on their effectiveness in protein redesign and their potential for therapeutic repurposing. We address two challenges. The first is evaluating the ability of tools and their combinations to detect the effects of multiple concurrent mutations in protein variants. The second is leveraging large-scale datasets to compare modeling-free methods, namely force fields, which handle point mutations well when backbone rearrangement is limited, and inverse folding tools, which excel at native sequence recovery but may struggle with non-natural proteins. We debut TriCombine, a tool that identifies residue triangles in input structures, matches them to a structural database, and scores mutants based on substitution frequencies. Using it, we shortlisted candidates, modeled them with FoldX, and generated 16 SH3 mutants carrying up to 9 concurrent substitutions. The dataset was expanded to include 36 mutants and 11 crystal structures (7 newly solved), along with a parallel set of multiple non-concurrent mutants from three additional proteins. For broader validation, we analyzed 160,000 four-site GB1 mutants and 163,555 single and double variants across 179 natural and de novo domains. We show that combining AI-based modeling tools with force field scoring functions yields the most reliable results. Inverse folding tools perform very well but lose accuracy on less-represented proteins. First-principle force fields like FoldX remain the most accurate for point mutations. All methods perform worse when applied to unsolved de novo models, underscoring the need for hybrid strategies in robust protein design.

Keywords: artificial intelligence; crystallographic structure; force field; protein design.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

FIGURE 1
(a) In red, tube visualization of the 1e6g backbone showing the nine residues of the hydrophobic core as sticks. In yellow, tube backbone of 1shg showing hydrophobic core residues as sticks. (b) Table of the 16 candidate core mutants, compared with the sequences of the two starting structures. A “‐” symbol indicates the same amino acid as in the WT structure (1shg). Table columns from left to right: name of the mutant, identities of the mutated residues at each position, number of mutations in the mutant (# mut), predicted change in hydrophobic core volume in Å³ (ΔVol), total cavity volume generated in Å³ (IC), whether the mutant has a total energy appreciably worse than its reference structure (DM), presence of residues with van der Waals clashes (CR), and residues with desolvated polar atoms that do not make an H‐bond (PDR). Light gray indicates at least one core residue with a ΔΔG, relative to its reference structure, between 0.5 and 0.8 kcal/mol. Dark gray indicates more than one residue between 0.5 and 0.8 kcal/mol or at least one above 0.8 kcal/mol. Black boxes indicate more than one core residue above 0.8 kcal/mol. (c) Experimental ΔG of folding for the 16 mutants and the 1shg and 1e6g variants. A green dotted line marks WT stability. (d) Comparison of experimental ΔΔG values with FoldX energy predictions for mutants modeled on the 1shg template previously mutated to alanine at the hydrophobic core positions. (e) Comparison of experimental ΔΔG values with FoldX energy predictions for mutants modeled on the 1e6g template previously mutated to alanine at the core positions.
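The shading rule in panel (b) can be read as a small decision procedure. The sketch below is one possible reading, not the authors' code: the function name `shade`, the inclusive handling of the 0.5 and 0.8 kcal/mol boundaries, and the choice to let black take precedence over dark gray are all assumptions.

```python
def shade(ddg_values, low=0.5, high=0.8):
    """Classify a mutant from its per-residue core ddG values (kcal/mol).

    Assumed reading of the caption: boundaries are inclusive for the
    0.5-0.8 band, and the most severe matching category wins.
    """
    moderate = sum(1 for v in ddg_values if low <= v <= high)
    severe = sum(1 for v in ddg_values if v > high)
    if severe > 1:
        return "black"          # more than one residue above 0.8
    if moderate > 1 or severe >= 1:
        return "dark gray"      # several moderate residues or one severe
    if moderate >= 1:
        return "light gray"     # exactly one moderate residue
    return "none"


# Hypothetical per-residue values, for illustration only.
print(shade([0.1, 0.6]))   # one residue in the 0.5-0.8 band
print(shade([0.9, 1.2]))   # two residues above 0.8
```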
FIGURE 2
(a) Left: table showing true positives (tp), true negatives (tn), false positives (fp), false negatives (fn), and balanced accuracy (bACC) with confidence intervals (CI) of foldability prediction for all tested approaches over the 36 stability data points for SH3 variants. Middle: histogram comparing the correlations of predictions with experimental ΔΔGs on a percentage scale. Right: the same comparison as on the left, but for the literature batch. (b–g) Scatterplots of selected scorers showing experimental ΔΔGs versus predicted Δ‐scores. Dotted green lines mark the WT values on the two axes. The bottom‐left quadrant contains variants correctly predicted as less stable than the WT, and the top‐right quadrant contains those correctly predicted as more stable. The bottom‐right quadrant contains overpredicted variants and the top‐left quadrant underpredicted ones. The WT variant is shown as a green point.
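For readers unfamiliar with the metric in panel (a), balanced accuracy is the mean of sensitivity and specificity. The sketch below uses the standard definition; the function name and the example counts are hypothetical and are not taken from the paper's table.

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (tp rate) and specificity (tn rate)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical confusion-matrix counts summing to 36 data points.
print(balanced_accuracy(tp=8, tn=20, fp=4, fn=4))
```

Unlike plain accuracy, this value does not inflate when one class (for example, destabilized variants) dominates the dataset.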
FIGURE 3
(a) Barplot illustrating average GDT scores across 11 crystallographic structures of SH3 domain mutants. Darker bars show the percentage of atoms predicted within 0.5 Å of their expected positions; lighter bars show the percentage within 1 Å. Tool names are color‐coded: red for tools that model starting from sequence, orange for the same tools followed by relaxation with FoldX (F) or Rosetta (R), and blue for fixed‐backbone tools using 1shg as template. (b) Heatmaps showing hydrophobic core side chain RMSD for each modeling method, averaged over the 11 crystallized variants. The upper heatmap shows the average Cα RMSD and the lower one the all‐atom RMSD. Values were capped at an upper limit of 2 Å. The complete RMSD analysis is available in Figure S6 and Table S4.
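The quantity plotted in panel (a), the percentage of atoms within a distance cutoff of their expected positions, can be sketched as follows. This is a minimal illustration assuming the predicted and reference structures are already superimposed and their atoms are listed in matching order; the function name and toy coordinates are hypothetical.

```python
import math


def fraction_within(predicted, reference, cutoff):
    """Fraction of atoms whose predicted position lies within `cutoff`
    (in angstroms) of the matching reference position.

    Assumes both coordinate lists are pre-aligned and atom-matched.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    hits = sum(
        1 for p, r in zip(predicted, reference) if math.dist(p, r) <= cutoff
    )
    return hits / len(predicted)


# Toy coordinates: per-atom deviations of 0.3, 0.8, 1.5 and 0.0 angstroms.
reference = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
predicted = [(0, 0, 0.3), (1, 0.8, 0), (2, 0, 1.5), (3, 0, 0)]
print(fraction_within(predicted, reference, 0.5))
print(fraction_within(predicted, reference, 1.0))
```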
FIGURE 4
(a) Wild‐type GB1 structure (pink) bound to IgG ligand (purple), with the mutated residues highlighted. The right panel shows a close‐up of the four residues and two interacting ligand residues (Asn and His, purple). (b) Diversity of mutants from the filtered Mega‐scale dataset. (c) Histogram comparing AUC performance of five fixed‐backbone predictors for mutants at Hamming distance 1–4 in the GB1 dataset. (d) Histogram comparing AUC performance for mutants at Hamming distance 1–2 from natural and artificial proteins in the filtered Mega‐scale dataset. (e) True positive (TP) and false positive (FP) counts for the five predictors, relative to experimental ΔΔG > 0 against wild type in the GB1 dataset. (f) TP and FP counts for the five predictors, relative to experimental ΔΔG > 0 against wild type in the filtered Mega‐scale dataset.
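Two quantities in this caption, the Hamming distance between a mutant and the wild type and the AUC of a predictor, can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the four-letter placeholder sequences, and the labels and scores are hypothetical. AUC is computed here through its rank interpretation, the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Placeholder four-site sequences (not the real GB1 residues).
print(hamming_distance("ACDE", "ACFG"))

# Hypothetical labels (1 = experimental ddG > 0) and predictor scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form is quadratic in dataset size, so for datasets on the scale described in the abstract a sort-based implementation would be the more practical choice.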
