Nat Struct Mol Biol. 2022 Nov;29(11):1056-1067.
doi: 10.1038/s41594-022-00849-w. Epub 2022 Nov 7.

A structural biology community assessment of AlphaFold2 applications

Mehmet Akdel et al. Nat Struct Mol Biol. 2022 Nov.

Abstract

Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Additional coverage provided by AF2-predicted models.
a, Added structural coverage (per-protein, left; per-residue, middle) and per-residue confidence of regions not covered by SMR (right) for 11 species included in both the AF2 and SMR databases. b, Fraction of confident (pLDDT > 70) residues per human AF2 model, binned by r.m.s.d. from the corresponding trRosetta-derived domain-level Pfam model; 3,035 AF2 predicted structures of protein regions matching one of 1,464 different Pfam domain families were compared with the corresponding trRosetta model. c, Median fragment length and median pLDDT score of human AF2-only regions. The highlighted area identifies high-confidence regions with domain-like length. The bottom, middle line and top of the box correspond to the 25th, 50th and 75th percentiles, respectively. d, Comparison of AF2 SASA (SASA20, 20-residue smoothing) and pLDDT (pLDDT20, 20-residue smoothing) against a disorder prediction method (IUPred2).
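
Two quantities in the Fig. 1 legend can be illustrated with a short sketch: the fraction of confident residues (pLDDT > 70) in a model, and a 20-residue smoothed pLDDT profile (pLDDT20). The function names, the centered window and the truncation of windows at the chain termini are assumptions made for illustration; the legend does not state how the smoothing was implemented.

```python
from typing import List, Sequence


def confident_fraction(plddt: Sequence[float], threshold: float = 70.0) -> float:
    """Fraction of residues whose pLDDT is strictly above the threshold."""
    if not plddt:
        raise ValueError("empty pLDDT profile")
    return sum(1 for value in plddt if value > threshold) / len(plddt)


def smooth_profile(values: Sequence[float], window: int = 20) -> List[float]:
    """Moving average over `window` residues.

    Assumption: the window is roughly centered on each residue and is
    truncated (not padded) near the chain termini.
    """
    n = len(values)
    before = window // 2
    after = window - before
    smoothed = []
    for i in range(n):
        lo = max(0, i - before)
        hi = min(n, i + after)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-residue pLDDT values, not taken from the paper.
    profile = [92.0, 88.5, 71.0, 65.0, 40.0, 38.5, 77.0, 90.0]
    print(confident_fraction(profile))
    print(smooth_profile(profile, window=4))
```

The same smoothing function could be applied to a per-residue SASA profile to obtain a SASA20-style track.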
Fig. 2. The space of characteristic structural elements in AF2 structural models for 21 species.
Visualization of t-SNE dimensionality reduction analysis, in which structures with similar structural elements are placed closer together and the 20 most common superfamilies are colored. The axes corresponding to the t-SNE dimension 1 and t-SNE dimension 2 were omitted. Six shape-mer groups (that is, topics) discussed in the text, consisting of mainly AF2 proteins as opposed to PDB proteins, are labeled A–F, and a representative structure is depicted for each. Residues in the representative structures are colored according to their contribution to the topic under consideration—red residues have the highest contribution, and blue residues are specific to the example and not to the topic.
Fig. 3. Comparing structure-based prediction of impact of protein missense mutations using experimental and AF2-derived models.
a, Relationship between the predicted ΔΔG for mutations with measured experimental impact of the mutation from deep mutational scanning (DMS) data (−1 × Pearson correlation). The predicted change in stability was determined using one of three structure-based methods, using structures from AF2 or available experimental models. The bottom, middle line and top of the box correspond to the 25th, 50th and 75th percentiles, respectively. The lines extend to 1.5 × IQR (interquartile range). A total of 117,135 mutations were used in the analysis. b, Correlations based on the FoldX predictions as in a, but subsetting the positions in AF2 models according to confidence and whether the position is present in an experimental structure. Data are presented as mean values ± the confidence intervals calculated via Fisher’s Z transform (R’s cor.test function). c, The mean impact of a mutation, calculated as the enrichment ratio (ER) score, from DMS data for positions in AF2 models with different degrees of confidence. A total of 117,135 mutations were used in the analysis. d, Comparative performance of methods for predicting stability changes upon mutation using AF2 and experimental and homology models based on protein structure templates of different identity cut-offs. Experimental measurements of stability are for 2,648 single-point missense mutations over 121 proteins. The bottom, middle line and top of the box correspond to the 25th, 50th and 75th percentiles, respectively. The lines extend to 1.5 × IQR. e, Example application for structure-based prediction of stability impact of known disease mutations for a human protein with little structural coverage prior to AF2. ΔΔG stability changes were predicted using Rosetta, and a substantial impact was considered for ΔΔG > 1.5 kcal/mol.
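
The confidence intervals mentioned for panel b of Fig. 3 come from Fisher's Z transform as implemented in R's cor.test. A minimal Python sketch of that calculation is given below; the function name is hypothetical, and the 95% level is an assumption (it is the cor.test default, but the legend does not state the level used).

```python
import math
from statistics import NormalDist
from typing import Tuple


def fisher_z_interval(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Confidence interval for a Pearson correlation via Fisher's Z transform.

    r     : sample Pearson correlation, strictly between -1 and 1
    n     : number of paired observations (must exceed 3)
    level : confidence level (assumed 0.95 here)
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                       # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)             # approximate standard error of z
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - q * se), math.tanh(z + q * se)


if __name__ == "__main__":
    # Hypothetical correlation and sample size, not values from the paper.
    low, high = fisher_z_interval(0.5, 103)
    print(f"{low:.3f} to {high:.3f}")
```

Because the interval is built on the transformed scale and mapped back with tanh, it is asymmetric around r and always stays within (−1, 1).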
Fig. 4. Pocket detection and function prediction.
a, Size of known binding sites (or unified binding sites) compared with the size of top AutoSite pockets in experimental holo (bound), experimental apo (unbound) and AF2 structures. AF2 structures are split into high-confidence (mean pLDDT > 90) and low-confidence (mean pLDDT ≤ 90) subsets. The bottom, middle line and top of the box correspond to the 25th, 50th and 75th percentiles, respectively. The lines extend to 1.5 × IQR (interquartile range). b, Distribution of overlap between known binding sites and top predicted pockets for holo, apo and AF2 structures. The bottom, middle line and top of the box correspond to the 25th, 50th and 75th percentiles, respectively. The lines extend to 1.5 × IQR. c, Enzymatic activity prediction using pocket-derived, template-derived and combined metrics. AUC, area under the curve; TPR, true positive rate; FPR, false positive rate. d, Superposition of the AF2 model of DEGS1 (O15121) with PDB entry 4ZYO. Orange: ribbon representation of AF2 predicted structure for DEGS1. Cyan: ribbon representation of 4ZYO. Zinc atoms (light blue spheres) and bound substrate (dark blue ball and stick) as observed in the structure of 4ZYO are also shown. e, Close-up of the metal-binding center of 4ZYO. Ribbon representation of the protein and metal chelators for DEGS1 and 4ZYO are shown in orange and cyan, respectively. The zinc atoms observed in 4ZYO are shown as light blue spheres. Metal-chelating residues for DEGS1 are clearly identifiable.
Fig. 5. Using AF2 to predict homo-oligomeric assemblies and their oligomeric state.
a, AF2 prediction for each oligomeric state (1–4 for monomers and dimers, and 1–5 for trimers and tetramers). Only proteins for which the monomer had pLDDT > 90 are shown. For visualization, the predicted successes (top) and failures (bottom) were separated into two plots. Success is defined when the peak of the homo-oligomeric state scan matches the annotation, or the pTMscore of the next oligomer state is substantially lower (−0.1). b, For each of the annotated assemblies, the pTMscore of monomeric prediction is compared with the max pTMscore of non-monomeric prediction. c, Monomer prediction failure. Two monomers were predicted to be homo-dimers. For the first case (PDB: 1BKZ), the prediction matched the asymmetric unit (shown as blue/green and prediction in white). For the second case (PDB: 1BWZ), the prediction matched one of the crystallographic interfaces. d, 3TDT trimer was predicted to be a tetramer. Although the interface is technically correct, for this c-symmetric protein, the pTMscore was not able to discriminate between 3 and 4 copies. e, Comparison of docking quality between AF2 (x axis) and a standard docking tool GRAMM (y axis). Comparisons were made using the DockQ score. Models with a DockQ score that was higher than 0.23 are assumed to be acceptable according to the Critical Assessment for Predicted Interactions (CAPRI) criteria (marked outside the shaded area). Black circles indicate the complex was well modeled by both methods. The average DockQ score and the number of acceptable or better models are shown in the axis labels. It should be noted here that AF2 both folds and docks the proteins, whereas GRAMM only docks them. f, Examples of AF2-predicted interactions mediated by regions of intrinsic disorder.
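
The two decision rules in the Fig. 5 legend can be sketched as follows. This is only one possible reading: the second success clause is interpreted here as "the pTMscore of the state one copy above the annotation is at least 0.1 lower than that of the annotated state", which the legend does not spell out. The function names and the example numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def scan_success(ptm_by_state: Dict[int, float], annotated: int, drop: float = 0.1) -> bool:
    """Judge a homo-oligomeric state scan against the annotated state.

    ptm_by_state maps number of copies -> pTMscore of that prediction.
    Success if the scan peaks at the annotated state, or (assumed reading)
    if the next-larger state scores at least `drop` lower than the annotated one.
    """
    peak_state = max(ptm_by_state, key=ptm_by_state.get)
    if peak_state == annotated:
        return True
    next_score = ptm_by_state.get(annotated + 1)
    if next_score is None:
        return False
    return ptm_by_state[annotated] - next_score >= drop


def dockq_summary(scores: Sequence[float], cutoff: float = 0.23) -> Tuple[float, int]:
    """Average DockQ and number of models above the acceptable cutoff (0.23)."""
    return mean(scores), sum(1 for s in scores if s > cutoff)


if __name__ == "__main__":
    # Hypothetical scan for a protein annotated as a dimer.
    print(scan_success({1: 0.60, 2: 0.90, 3: 0.70, 4: 0.55}, annotated=2))
    # Hypothetical DockQ scores for a set of complexes.
    print(dockq_summary([0.10, 0.35, 0.62, 0.05]))
```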
Fig. 6. Application of AF2 predictions to modeling into cryo-EM or crystallographic data.
a, AF2 predictions for individual chains in 6O85, aligned to the original model and colored by Cα–Cα distance, with the map (EMD-0651) contoured at 6.5 σ. Red domains at the bottom were correctly folded but misplaced owing to flexibility; smaller regions of red correspond either to flexible tails or register errors in the original model. b,c, Use of adaptive distance and torsion restraints to correct problematic geometry in the original model. The models before (b) and after (c) refitting are shown; satisfied distance restraints are hidden for clarity. d, Owing to very poor local resolution and lack of homologs, the carboxy-terminal domain in chain J (left of the dashed line) was previously left unmodeled. This domain was predicted with high confidence by AF2 (mean pLDDT = 83.0), and fit readily into the available density. e, High-confidence regions may still contain subtle errors that are difficult or impossible to detect in the absence of experimental data. The side chain of Trp A111 (pLDDT = 86.1) was modeled backwards (blue), forming an H-bond with Asp A77; the final model fitted to the map (gray) instead forms an H-bond with Glu A81. f, Rebuilding the recent 3.3-Å crystal structure 7OGG, starting from molecular replacement with AF2 models, dramatically improved model completeness. Blue, residues identified in original model; yellow sticks, residues modeled as unknown in the original model; red, residues identified in rebuilt model. g, Helix modeled as unknown (residues 558–573 of chain R, red), surrounded by unmodeled density (3 σ mFo-DFc, green(+), red(–); +2 σ sharpened 2mFo-DFc, cyan surface; +1.5 σ unsharpened 2mFo-DFc difference map (Fo and Fc are the experimentally measured and model-based amplitudes, D is the Sigma-A weighting factor and m is the figure of merit), cyan wireframe; +5 σ anomalous difference map, purple surface and arrows). h, Final model, with anomalous difference blobs corresponding to selenomethionine residues 213 and 217 of chain Q and with the previously unmodeled density filled; this region was predicted with an average pLDDT of 88, and required only minor side chain corrections to fit the density.

