Sci Rep. 2022 May 5;12(1):7349.
doi: 10.1038/s41598-022-10607-z.

A database of calculated solution parameters for the AlphaFold predicted protein structures

Emre Brookes et al. Sci Rep.

Abstract

Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting "folding frenzy" has already produced predicted protein structure databases for the entire human and other organisms' proteomes. However, rapidly ascertaining a predicted structure's reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D^{0}_{t(20,w)}$, $s^{0}_{(20,w)}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding $D^{0}_{t(20,w)}$, $s^{0}_{(20,w)}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold's drawbacks were mitigated, such as generating whenever possible a protein's mature form. Others, like the AlphaFold direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold predicted protein structures.
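
As a rough illustration of how the hydrodynamic parameters named in the abstract relate to one another, the sketch below applies two textbook relations: the Stokes-Einstein equation (diffusion coefficient to Stokes radius) and the Svedberg equation (sedimentation coefficient from molar mass and diffusion coefficient). It is not the US-SOMO procedure, which computes these quantities from the atomic structure. The function names, the SI unit choices, and the water constants at 20 °C are assumptions made for this example.

```python
import math

# Physical constants (SI). The solvent values are the usual textbook
# figures for water at 20 °C and are assumptions of this sketch.
K_B = 1.380649e-23      # Boltzmann constant, J/K
N_A = 6.02214076e23     # Avogadro constant, 1/mol
T_20 = 293.15           # 20 °C in kelvin
ETA_W20 = 1.002e-3      # viscosity of water at 20 °C, Pa*s
RHO_W20 = 998.2         # density of water at 20 °C, kg/m^3


def stokes_radius(d_t):
    """Stokes radius (m) from a translational diffusion coefficient (m^2/s).

    Stokes-Einstein: Rs = k_B * T / (6 * pi * eta * D).
    """
    return K_B * T_20 / (6.0 * math.pi * ETA_W20 * d_t)


def diffusion_from_radius(r_s):
    """Inverse of stokes_radius: diffusion coefficient (m^2/s) from Rs (m)."""
    return K_B * T_20 / (6.0 * math.pi * ETA_W20 * r_s)


def sedimentation_coefficient(molar_mass, d_t, vbar):
    """Sedimentation coefficient (s) from the Svedberg equation.

    s = M * D * (1 - vbar * rho) / (R * T)
    molar_mass in kg/mol, d_t in m^2/s, vbar (partial specific volume) in m^3/kg.
    """
    gas_constant = K_B * N_A  # J/(mol*K)
    return molar_mass * d_t * (1.0 - vbar * RHO_W20) / (gas_constant * T_20)


if __name__ == "__main__":
    d_t = 1.0e-10   # hypothetical diffusion coefficient, m^2/s
    mass = 50.0     # hypothetical 50 kDa protein, expressed in kg/mol
    vbar = 0.73e-3  # typical protein partial specific volume, m^3/kg
    print("Rs (nm):", stokes_radius(d_t) * 1e9)
    print("s (Svedberg units):", sedimentation_coefficient(mass, d_t, vbar) / 1e-13)
```

Comparing values like these, computed from a predicted structure, against measured ones is the kind of quick consistency check the abstract describes.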

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Screenshots of the US-SOMO-AF webpage. Shown are the results for AF-P01029-F1 that includes the removal of the signal sequence and two propeptides. (a) The upper part containing text/data information. (b) The bottom part showing the computed p(r) vs. r distribution and CD spectrum graphs, and the JSmol representation of the structure.
Figure 2
Plots of selected calculated parameters for 41,200 AF-v1 predicted structures with no corresponding entries in the solved structures PDB database. (a) Rs vs. M, log–log scale. (b) [η] vs. M, log–log scale. (c) [η] vs. % decreasing mean confidence level, log-lin scale. (d) A 3D plot where M (log scale) is on the vertical Z-axis, and Rs and [η] are on the horizontal X- and Y-axes, respectively (both linear scales).
Figure 3
JSmol snapshots of the structures for the entries reported in Table, together with the calculated p(r) vs. r and CD plots.
Figure 4
P(r) vs. r curves SAXS-derived and calculated from AF and RCSB PDB structures. (a–f) Protein source and names, SASBDB, AF (UniProt) and RCSB PDB accession numbers for each entry are indicated in the boxes within each panel. In all panels the experimentally-derived and the AF-calculated p(r) vs. r are black and red lines, respectively. Additional SAXS-derived and AF-calculated p(r) vs. r present in (c,f) are blue and magenta lines, respectively. Additional PDB-calculated p(r) vs. r (green lines) are present in (c,d).
Figure 5
Calculated p(r) vs. r distributions for the 100 conformations generated in the DMD run on the AF-predicted O88338 structure.
Figure 6
Histograms of the calculated parameters for the MMC-generated conformations of three AF-predicted structures from Table 2. Shown are the distributions of Rg/Rs (a,c,e) and of [η] (b,d,f) calculated for AF-Q4DE01 (16,520 conformations, (a,b)), AF-A0A060D4L2 (16,666 conformations, (c,d)), and AF-Q8IJG3 (16,367 conformations, (e,f)). In each panel, the vertical green lines mark the location of the starting structure parameters, while the vertical solid and dashed red lines indicate the average ± SD over all conformations (the actual values are reported in each panel’s inside legend).
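
The p(r) vs. r curves and the Rg values in the figure captions above are both derived from the distances between the atoms of a structure. The sketch below is a deliberately simplified illustration: every atom counts as an identical point scatterer, whereas a SAXS-oriented calculation such as the one in US-SOMO weights atoms by their scattering contribution. The function names and the bin width are assumptions made for this example.

```python
import math
from itertools import combinations


def pair_distance_distribution(coords, bin_width=1.0):
    """Normalized histogram of all pairwise distances.

    coords: list of (x, y, z) tuples, at least two points.
    Returns a list of (bin_center, fraction_of_pairs) tuples.
    Simplification: every point has equal weight.
    """
    counts = {}
    for p, q in combinations(coords, 2):
        index = int(math.dist(p, q) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    total = sum(counts.values())
    n_bins = max(counts) + 1
    return [((i + 0.5) * bin_width, counts.get(i, 0) / total)
            for i in range(n_bins)]


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance from the centroid."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    mean_sq = sum(math.dist(c, centroid) ** 2 for c in coords) / n
    return math.sqrt(mean_sq)


if __name__ == "__main__":
    # Three hypothetical collinear points, coordinates in arbitrary units.
    points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    for r, fraction in pair_distance_distribution(points):
        print(f"r = {r:.1f}  p(r) = {fraction:.3f}")
    print("Rg =", radius_of_gyration(points))
```

The all-pairs loop scales quadratically with the number of atoms, which is acceptable for a sketch but would need binning tricks or vectorization for large structures or conformational ensembles.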
