State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold

James P Roney¹, Sergey Ovchinnikov²

Affiliations

¹ Harvard University, Cambridge, Massachusetts 02138, USA.
² John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, Massachusetts 02138, USA.

PMID: 36563190
PMCID: PMC12178128
DOI: 10.1103/PhysRevLett.129.238101

Phys Rev Lett. 2022 Dec 2;129(23):238101.

Abstract

Predicting a protein's 3D structure from its primary amino acid sequence is a longstanding challenge in structural biology. Recently, approaches like AlphaFold have achieved remarkable performance on this task by combining deep learning techniques with coevolutionary data from multiple sequence alignments of related protein sequences. The use of coevolutionary information is critical to these models' accuracy, and without it their predictive performance drops considerably. In living cells, however, the 3D structure of a protein is fully determined by its primary sequence and the biophysical laws that cause it to fold into a low-energy configuration. Thus, it should be possible to predict a protein's structure from only its primary sequence by learning an approximate biophysical energy function. We provide evidence that AlphaFold has learned such an energy function and uses coevolution data to solve the global search problem of finding a low-energy conformation. We demonstrate that AlphaFold's learned energy function can be used to rank the quality of candidate protein structures with state-of-the-art accuracy, without using any coevolution data. Finally, we explore several applications of this energy function, including the prediction of protein structures without multiple sequence alignments.

Figures

**FIG. 1.**
The hypothesized role of coevolutionary information in AlphaFold's predictions. Images inspired by [9,10].

**FIG. 2.**
Decoy ranking results on the Rosetta decoy dataset. (a) Decoy TM score vs composite confidence for an example target. Three selected AlphaFold output structures are visualized; color indicates model confidence. (b) Mean Spearman correlations between various metrics and decoy TM score. (c) Mean TM scores of the top-ranked decoys for various metrics, as well as the mean TM score of AlphaFold's prediction with no MSA. All error bars in (b) and (c) are bootstrap 95% confidence intervals of the mean. (d) Comparison of Spearman correlations for AlphaFold and Rosetta (left) or DeepAccNet (right). (e) Comparison of top-1 accuracies for AlphaFold and Rosetta (left) or DeepAccNet (right). For (d) and (e), each dot is a target in the Rosetta decoy dataset; a dot's position in each scatterplot depicts the relevant Spearman correlation or top-1 accuracy value computed over the decoys corresponding to that target.
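The per-target quantities named in this caption (Spearman correlation with TM score, TM score of the top-ranked decoy, and a bootstrap 95% confidence interval of the mean) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the toy numbers, and the choice of a percentile bootstrap are all assumptions made here.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def top1_tm(confidence, tm):
    """TM score of the decoy that the confidence metric ranks first."""
    best = max(range(len(confidence)), key=lambda i: confidence[i])
    return tm[best]


def bootstrap_ci(per_target_values, n_resamples=2000, seed=0):
    """Percentile bootstrap 95% interval of the mean (assumed variant)."""
    rng = random.Random(seed)
    n = len(per_target_values)
    means = sorted(
        mean(rng.choices(per_target_values, k=n)) for _ in range(n_resamples)
    )
    return means[int(0.025 * n_resamples)], means[int(0.975 * n_resamples) - 1]


# Hypothetical decoys for one target: a confidence metric and true TM scores.
confidence = [0.2, 0.5, 0.9, 0.7]
tm = [0.3, 0.4, 0.8, 0.85]
rho = spearman(confidence, tm)
selected_tm = top1_tm(confidence, tm)

# Hypothetical per-target correlations, as would be averaged for a bar chart.
per_target_rho = [0.5, 0.6, 0.7, 0.8]
ci_low, ci_high = bootstrap_ci(per_target_rho)
```

In this toy case the metric swaps the two best decoys, so the rank correlation is high but the top-ranked decoy is not the most accurate one, which is why the caption reports both quantities.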

**FIG. 3.**
Decoy ranking results on CASP. (a) GDT_TS loss for AlphaFold and top EMA methods from CASP14. (b) GDT_TS Z-scores for AlphaFold and top EMA methods from CASP14. Error bars are bootstrap 95% confidence intervals of the mean.
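As a rough illustration of the two quantities in this caption, the sketch below computes a GDT_TS loss and a GDT_TS Z-score for one target. It assumes the common convention that the loss is the gap between the best available model and the model a method ranks first, and that the Z-score standardizes the selected model's GDT_TS against all models for that target using the population standard deviation. The exact CASP14 procedure may differ, and the names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def select_top(method_scores):
    """Index of the model that the ranking method scores highest."""
    return max(range(len(method_scores)), key=lambda i: method_scores[i])


def gdt_ts_loss(method_scores, gdt_ts):
    """Best available GDT_TS minus the GDT_TS of the selected model."""
    return max(gdt_ts) - gdt_ts[select_top(method_scores)]


def gdt_ts_zscore(method_scores, gdt_ts):
    """Standardized GDT_TS of the selected model (assumed definition)."""
    selected = gdt_ts[select_top(method_scores)]
    return (selected - mean(gdt_ts)) / pstdev(gdt_ts)


# Hypothetical models for one target: method scores and true GDT_TS values.
method_scores = [0.1, 0.9, 0.4, 0.3]
gdt_ts = [40.0, 55.0, 70.0, 62.0]

loss = gdt_ts_loss(method_scores, gdt_ts)
z = gdt_ts_zscore(method_scores, gdt_ts)
```

A loss of zero means the method picked the best model; a positive Z-score means the selected model is better than the average model submitted for that target.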

**FIG. 4.**
Application of AlphaFold's template mechanism for sequence and structure generation. We compare single-sequence structure prediction with (a) the baseline structure prediction protocol (AlphaFold with a single-sequence input and three recycles) or (b) two instances of AlphaFold for structure generation and discrimination. (c) Protocol for sequence design to minimize loss between desired and predicted structure via distogram, with and without template (red line). (d) Comparing structure accuracy of (a) vs (b) on the Rosetta decoy set. Dots colored by pLDDT, red to blue (50 to 90). (e) Comparing sequence recovery with and without templates on the Rosetta monomeric and CASP14 FM datasets.
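Panel (e) reports sequence recovery. A minimal sketch of that metric, assuming the usual definition (the fraction of aligned positions at which the designed sequence matches the native one) and using hypothetical sequences, could look like this:

```python
def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences have the same length and are already aligned
    position by position, as in fixed-backbone design.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical six-residue example with one mismatch at position 4.
recovery = sequence_recovery("ACDEFG", "ACDQFG")
```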

References

1. Jumper J et al., Highly accurate protein structure prediction with AlphaFold, Nature (London) 596, 583 (2021).
2. Balakrishnan S, Kamisetty H, Carbonell JG, Lee S-I, and Langmead CJ, Learning generative models for protein fold families, Proteins 79, 1061 (2011).
3. Jones DT, Buchan DWA, Cozzetto D, and Pontil M, PSICOV: Precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments, Bioinformatics 28, 184 (2012).
4. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, and Weigt M, Direct-coupling analysis of residue coevolution captures native contacts across many protein families, Proc. Natl. Acad. Sci. U.S.A. 108, E1293 (2011).
5. Anfinsen CB, Haber E, Sela M, and White FH Jr., The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain, Proc. Natl. Acad. Sci. U.S.A. 47, 1309 (1961).

Grants and funding

DP5 OD026389/OD/NIH HHS/United States
