Deep-learning structure elucidation from single-mutant deep mutational scanning

Zachary C Drake¹, Elijah H Day¹, Paul D Toth², Steffen Lindert³

Affiliations

¹ Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH, USA.
² Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH, USA.
³ Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH, USA. lindert.1@osu.edu.

PMID: 40715235
PMCID: PMC12297490
DOI: 10.1038/s41467-025-62261-4

Zachary C Drake et al. Nat Commun. 2025 Jul 25;16(1):6874. doi: 10.1038/s41467-025-62261-4.

Abstract

Deep learning has revolutionized protein structure prediction. AlphaFold2, a deep neural network, vastly outperformed previous algorithms and predicts protein structures with near-atomic accuracy. Despite this success, limitations remain that prevent accurate predictions for numerous protein systems. Here we show that sparse residue burial restraints derived from deep mutational scanning (DMS) can refine AlphaFold2 predictions and significantly improve results. In the resulting method, DMS-Fold, burial information extracted from DMS explicitly guides residue placement during structure generation. DMS-Fold was validated on both simulated and experimental single-mutant DMS data, outperforming AlphaFold2 for 88% of protein targets, with 252 proteins improving by more than 0.1 in TM-Score. DMS-Fold is free and publicly available at https://github.com/LindertLab/DMS-Fold.
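The TM-Score threshold quoted above is easier to interpret with the metric written out. The following is a minimal sketch of the standard TM-score formula evaluated for one fixed superposition; the full metric additionally maximizes over superpositions, which is not done here. The function name `tm_score` and the 0.5 Å floor on the distance scale for short targets are illustrative assumptions and are not taken from the DMS-Fold code.

```python
def tm_score(distances, target_length):
    """TM-score of one fixed superposition (no search over superpositions).

    distances: distances in angstroms between aligned residue pairs
    target_length: number of residues in the native (target) structure
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Length-dependent distance scale d0. The 0.5 A value for short
    # targets is an assumption that mirrors common practice.
    if target_length > 21:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # Each aligned pair contributes between 0 and 1; unaligned residues
    # contribute nothing but still count in the normalization.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Hypothetical distances for a 100-residue target, for illustration only.
baseline = tm_score([6.0] * 100, 100)
refined = tm_score([3.0] * 100, 100)
improved_by_more_than_0_1 = (refined - baseline) > 0.1
```

Because the score is normalized by the target length, values from proteins of different sizes are comparable, which is why a fixed threshold such as 0.1 can be applied across a whole benchmark set.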

PubMed Disclaimer

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1. Heat maps depicting the correlation between changes in protein thermodynamic stabilities (ΔΔG) and solubility metrics, atomic depth (AD) and neighbor count (NC), for individual mutational types across the mega-scale set.**
a Comparing ΔΔGs to native residue atomic depth. b Comparing ΔΔGs to native neighbor count. c Differences in correlation coefficients between atomic depth and neighbor count. d Comparing ΔΔGs to burial extent (defined as weighted average of both atomic depth and neighbor count).
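Panel d defines burial extent as a weighted average of atomic depth and neighbor count. A minimal sketch of that idea follows. The hard 10 Å cutoff for counting neighbors, the min-max scaling of each metric, and the equal default weighting are all assumptions made for illustration; the record above does not state the actual cutoff, normalization, or weights.

```python
import math


def neighbor_counts(coords, cutoff=10.0):
    """Count, for each residue, the other residues within `cutoff` angstroms.

    coords: one (x, y, z) point per residue, for example a C-beta position.
    The hard cutoff is an assumed simplification.
    """
    counts = []
    for i, a in enumerate(coords):
        n = sum(
            1
            for j, b in enumerate(coords)
            if i != j and math.dist(a, b) <= cutoff
        )
        counts.append(n)
    return counts


def min_max_scale(values):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def burial_extent(atomic_depths, neighbor_count_values, depth_weight=0.5):
    """Weighted average of scaled atomic depth and scaled neighbor count.

    depth_weight is a hypothetical parameter; 0.5 weights both equally.
    """
    if len(atomic_depths) != len(neighbor_count_values):
        raise ValueError("inputs must have one value per residue")
    depth = min_max_scale(atomic_depths)
    count = min_max_scale(neighbor_count_values)
    return [
        depth_weight * d + (1.0 - depth_weight) * c
        for d, c in zip(depth, count)
    ]
```

Scaling both metrics to a common range before averaging keeps one from dominating simply because of its units (angstroms versus a count).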

**Fig. 2. DMS-Fold overview.**
a Atomic depth, neighbor count, and mutation ΔΔGs are used to identify mutational types likely to be destabilizing for buried residues. These mutational types are used to calculate residue burial scores from the given mutational stabilities. b DMS-Fold network architecture, based on the original OpenFold architecture. Residue burial information derived from deep mutational scanning data (a) is encoded as burial scores, which are then embedded into the pair representation along the diagonal. The pair representation, coupled with the MSA representation, is initialized before being processed by the Evoformer.
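The diagonal embedding described in panel b can be pictured with a small sketch. It treats the pair representation as a nested list of shape [L][L][C] and adds a linear projection of each residue's burial score to the diagonal entry for that residue. The function name, the `weights` and `bias` parameters, and the use of a plain linear projection are hypothetical; in the real network the embedding is learned, and this is not the OpenFold or DMS-Fold API.

```python
def embed_burial_on_diagonal(pair, burial_scores, weights, bias=None):
    """Return a copy of `pair` with burial scores added along the diagonal.

    pair: nested list of shape [L][L][C] (pair representation)
    burial_scores: one scalar per residue, length L
    weights, bias: length-C vectors of an assumed linear embedding
    """
    n_res = len(pair)
    if len(burial_scores) != n_res:
        raise ValueError("need one burial score per residue")
    channels = len(weights)
    if bias is None:
        bias = [0.0] * channels
    # Copy so the caller's representation is left untouched.
    out = [[list(cell) for cell in row] for row in pair]
    for i, score in enumerate(burial_scores):
        for c in range(channels):
            # Only the (i, i) entry receives per-residue information.
            out[i][i][c] += score * weights[c] + bias[c]
    return out


# Toy example: 3 residues, 2 channels, all-zero pair representation.
pair = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
updated = embed_burial_on_diagonal(pair, [1.0, 0.0, 2.0], [0.5, -1.0])
```

Placing a per-residue quantity on the diagonal is a natural fit because entry (i, i) of a pair representation describes residue i relative to itself, so off-diagonal pairwise features are left unchanged at initialization.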

**Fig. 3. Performance of DMS-Fold on the 710 CASP14/CAMEO proteins with simulated changes in protein thermodynamic stabilities (ΔΔGs).**
a Template modeling score (TM-Score) comparison of predictions from DMS-Fold and AlphaFold2 (N = 25) using a size-dependent number of nonredundant sequences (N_eff). The size of each marker represents the N_eff used for MSA subsampling. Color represents the change in network confidence, pLDDT, between DMS-Fold and AlphaFold2. b TM-Score distributions of both networks binned by the TM-Scores of the AlphaFold2 predictions (N = 25). c TM-Score distributions of predictions from both DMS-Fold and AlphaFold2 (N = 1) using different uniform N_eff values. d Five predicted structures, aligned to the native structure (grey), where DMS-Fold with a size-dependent N_eff (blue) had a TM-Score improvement > 0.5 compared to AlphaFold2 with no MSA subsampling (orange). e Comparison of changes in pLDDTs and TM-Scores between predictions with DMS-Fold and AlphaFold2. Color represents the difference between two deviations in solubility metrics: DMS-Fold structure versus native structure, and AlphaFold2 structure versus native structure. Points in panels a and b show the mean. In all box plots, the line shows the median and the whiskers represent the 1.5× interquartile range.
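The caption relies on N_eff, the number of nonredundant sequences in an MSA. One common way to compute such a quantity is sketched below: each sequence is down-weighted by the number of sequences that are at least 80% identical to it, and the weights are summed. Both this definition and the 0.8 threshold are assumptions for illustration; the record above does not say how N_eff was computed for DMS-Fold.

```python
def sequence_identity(a, b):
    """Fraction of aligned columns at which two sequences agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def effective_sequence_count(msa, identity_threshold=0.8):
    """Sum of per-sequence weights 1 / (number of similar sequences).

    msa: list of aligned sequences of equal length
    identity_threshold: assumed redundancy cutoff (0.8 here)
    """
    n_eff = 0.0
    for seq in msa:
        # Every sequence is similar to itself, so cluster_size >= 1.
        cluster_size = sum(
            1
            for other in msa
            if sequence_identity(seq, other) >= identity_threshold
        )
        n_eff += 1.0 / cluster_size
    return n_eff
```

Under this definition, an alignment made of many near-identical copies of one sequence has an N_eff close to 1, regardless of its depth.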

**Fig. 4. Performance of DMS-Fold on the 175 Mega-scale proteins using experimental changes in protein thermodynamic stabilities (ΔΔGs).**
a Template modeling score (TM-Score) comparison of predictions from DMS-Fold and AlphaFold2 (N = 25) using a size-dependent number of nonredundant sequences (N_eff). The size of each marker represents the N_eff used for MSA subsampling. Color represents the change in network confidence, pLDDT, between DMS-Fold and AlphaFold2. b TM-Score distributions of both networks binned by the TM-Scores of the AlphaFold2 predictions (N = 25). c TM-Score distributions of predictions from both DMS-Fold and AlphaFold2 using different uniform N_eff values. d Top five predicted structures from AlphaFold2 with a size-dependent N_eff (orange) and DMS-Fold (N = 1) with a size-dependent N_eff (blue), aligned to their native structure (grey). e Comparison of changes in pLDDTs and TM-Scores between predictions with DMS-Fold and AlphaFold2. Color represents the difference between two deviations in solubility metrics: DMS-Fold structure versus native structure, and AlphaFold2 structure versus native structure. Points in panels a and b show the mean. In all box plots, the line shows the median and the whiskers represent the 1.5× interquartile range.
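For readers unfamiliar with the box plot convention named in the caption, the sketch below computes the median, quartiles, and whisker ends for a list of scores. It assumes the usual Tukey convention, in which each whisker extends to the most extreme data point lying within 1.5 times the interquartile range of the box; the exact quartile method used for the published figures is not stated, so the `inclusive` method here is an assumption.

```python
import statistics


def box_plot_summary(values, whisker_factor=1.5):
    """Median, quartiles, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    # Whiskers stop at the last data point inside each fence.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": outliers,
    }


# Arbitrary illustrative values, not data from the figure.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
```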

**Fig. 5. Burial scores explicitly guide DMS-Fold inference.**
a AlphaFold2 prediction of protein 2A (PDB ID: 7BNY) with residues colored by encoded burial score (legend in panel c). b DMS-Fold prediction of protein 2A with residues colored by encoded burial score. c Per-residue comparisons of predicted encoded burial scores and burial extents of the native, AlphaFold2, and DMS-Fold structures. d DMS-Fold prediction of protein 2A using false encoded burial scores of zero for all residues (blue) compared to the native structure (grey).


