Phys Rev Lett. 2022 Dec 2;129(23):238101.
doi: 10.1103/PhysRevLett.129.238101.

State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold

James P Roney et al. Phys Rev Lett. 2022.

Abstract

The problem of predicting a protein's 3D structure from its primary amino acid sequence is a longstanding challenge in structural biology. Recently, approaches like AlphaFold have achieved remarkable performance on this task by combining deep learning techniques with coevolutionary data from multiple sequence alignments (MSAs) of related protein sequences. The use of coevolutionary information is critical to these models' accuracy, and without it their predictive performance drops considerably. In living cells, however, the 3D structure of a protein is fully determined by its primary sequence and the biophysical laws that cause it to fold into a low-energy configuration. Thus, it should be possible to predict a protein's structure from only its primary sequence by learning an approximate biophysical energy function. We provide evidence that AlphaFold has learned such an energy function and uses coevolution data to solve the global search problem of finding a low-energy conformation. We demonstrate that AlphaFold's learned energy function can be used to rank the quality of candidate protein structures with state-of-the-art accuracy, without using any coevolution data. Finally, we explore several applications of this energy function, including the prediction of protein structures without multiple sequence alignments.

Figures

FIG. 1.
The hypothesized role of coevolutionary information in AlphaFold's predictions. Images inspired by [9,10].
FIG. 2.
Decoy ranking results on the Rosetta decoy dataset. (a) Decoy TM score vs composite confidence for an example target. Three selected AlphaFold output structures are visualized; color indicates model confidence. (b) Mean Spearman correlations between various metrics and decoy TM score. (c) Mean TM scores of the top-ranked decoys for various metrics, as well as the mean TM score of AlphaFold's prediction with no MSA. All error bars in (b) and (c) are bootstrap 95% confidence intervals of the mean. (d) Comparison of Spearman correlations for AlphaFold and Rosetta (left) or DeepAccNet (right). (e) Comparison of top-1 accuracies for AlphaFold and Rosetta (left) or DeepAccNet (right). For (d) and (e), each dot is a target in the Rosetta decoy dataset; a dot's position in each scatterplot depicts the relevant Spearman correlation or top-1 accuracy values computed over the decoys corresponding to that target.
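The per-target quantities named in this caption (Spearman correlation, top-1 TM score, and a bootstrap confidence interval of the mean) can be illustrated with a short sketch. This is not the authors' code. It assumes each decoy has one scalar confidence value and one TM score; the function names and the choice of a percentile bootstrap over targets are hypothetical, since the caption does not state which bootstrap variant was used.

```python
import numpy as np

def rank_average(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return float(np.corrcoef(rank_average(x), rank_average(y))[0, 1])

def top1_tm(scores, tm_scores):
    """TM score of the decoy that the metric ranks highest."""
    return float(np.asarray(tm_scores, dtype=float)[int(np.argmax(scores))])

def bootstrap_ci_of_mean(per_target_values, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% interval for the mean over targets
    (assumed variant; the caption does not specify one)."""
    rng = np.random.default_rng(seed)
    vals = np.asarray(per_target_values, dtype=float)
    idx = rng.integers(0, len(vals), size=(n_boot, len(vals)))
    means = vals[idx].mean(axis=1)
    lo, hi = np.percentile(means, [2.5, 97.5])
    return float(lo), float(hi)
```

As a toy check, confidences `[0.2, 0.5, 0.9, 0.7]` against TM scores `[0.3, 0.4, 0.8, 0.85]` give a Spearman correlation of 0.8 and a top-1 TM score of 0.8, because the highest-confidence decoy is the second best by TM score.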
FIG. 3.
Decoy ranking results on CASP. (a) GDT_TS loss for AlphaFold and top EMA methods from CASP14. (b) GDT_TS Z-scores for AlphaFold and top EMA methods from CASP14. Error bars are bootstrap 95% confidence intervals of the mean.
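A minimal sketch of the two per-target quantities in this caption follows. It is an illustration under stated assumptions, not the paper's evaluation code. GDT_TS loss is taken here in its usual sense: the GDT_TS of the best available decoy minus the GDT_TS of the decoy the method ranks first. The Z-score definition (the selected decoy's GDT_TS standardized against all decoys for the same target) is an assumption, because the caption does not define it. Function names are hypothetical.

```python
import numpy as np

def gdt_ts_loss(predicted_quality, gdt_ts):
    """Best achievable GDT_TS minus GDT_TS of the top-ranked decoy."""
    gdt_ts = np.asarray(gdt_ts, dtype=float)
    picked = int(np.argmax(predicted_quality))
    return float(gdt_ts.max() - gdt_ts[picked])

def selected_z_score(predicted_quality, gdt_ts):
    """Z-score of the selected decoy's GDT_TS among the target's decoys.
    ASSUMPTION: this standardization is one plausible reading of
    "GDT_TS Z-score"; the source caption does not spell it out."""
    gdt_ts = np.asarray(gdt_ts, dtype=float)
    picked = int(np.argmax(predicted_quality))
    std = gdt_ts.std()
    if std == 0:
        return 0.0  # all decoys equally good; no spread to standardize by
    return float((gdt_ts[picked] - gdt_ts.mean()) / std)
```

For decoys with GDT_TS values `[40, 60, 80]`, a method whose highest predicted quality falls on the second decoy has a loss of 20, while a method that picks the third decoy has a loss of 0.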
FIG. 4.
Application of AlphaFold's template mechanism for sequence and structure generation. We compare single-sequence structure prediction with (a) the baseline structure prediction protocol (AlphaFold with a single-sequence input and three recycles) or (b) two instances of AlphaFold for structure generation and discrimination. (c) Protocol for sequence design to minimize loss between desired and predicted structure via distogram, with and without template (red line). (d) Comparing structure accuracy of (a) vs (b) on the Rosetta decoy set. Dots are colored by pLDDT from red (50) to blue (90). (e) Comparing sequence recovery with and without templates on the Rosetta monomeric and CASP14 FM datasets.
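Sequence recovery in panel (e) is commonly computed as the fraction of positions at which a designed sequence matches the native one. The sketch below assumes that definition and equal-length, pre-aligned sequences; the function name and example sequences are hypothetical and not taken from the paper.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    ASSUMPTION: both sequences have the same length and are already aligned
    position by position, as in fixed-backbone design.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

For instance, `sequence_recovery("MKTAYIAK", "MKTAYLAK")` returns 0.875, since 7 of the 8 positions agree.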

References

    1. Jumper J et al., Highly accurate protein structure prediction with AlphaFold, Nature (London) 596, 583 (2021).
    2. Balakrishnan S, Kamisetty H, Carbonell JG, Lee S-I, and Langmead CJ, Learning generative models for protein fold families, Proteins 79, 1061 (2011).
    3. Jones DT, Buchan DWA, Cozzetto D, and Pontil M, PSICOV: Precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments, Bioinformatics 28, 184 (2012).
    4. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, and Weigt M, Direct-coupling analysis of residue coevolution captures native contacts across many protein families, Proc. Natl. Acad. Sci. U.S.A. 108, E1293 (2011).
    5. Anfinsen CB, Haber E, Sela M, and White FH Jr., The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain, Proc. Natl. Acad. Sci. U.S.A. 47, 1309 (1961).
