Comput Struct Biotechnol J. 2025 May 29:27:2443-2449.
doi: 10.1016/j.csbj.2025.05.047. eCollection 2025.

Comprehensive assessment of AlphaFold's predictions of secondary structure and solvent accessibility at the amino acid-level in eukaryotic, bacterial and archaeal proteins

Jing Yu et al. Comput Struct Biotechnol J.

Abstract

Numerous sequence-based predictors of the amino acid (AA)-level solvent accessibility (SA) and secondary structure (SS) of proteins have been developed. We empirically investigated whether these two key characteristics of AA-level structure can be accurately predicted from putative structures generated by the popular AlphaFold2 (AF2). We compared AlphaFold2's results against several representative SS and SA predictors on a large test dataset that covers five distinct taxonomic groups (animals, plants, fungi, bacteria, and archaea). We used a broad collection of metrics that evaluate predictions of the numeric and binary (buried vs. solvent-exposed) SA and the 3-state SS at both the AA and SS-region levels. We found that AlphaFold2 generated very accurate results, with a high average Q3 accuracy of 0.928 for the SS prediction and a high Pearson correlation coefficient (PCC) of 0.815 between its putative and native SA values. AlphaFold2 significantly and consistently outperformed the considered predictors of SA and SS across the five taxonomic groups and in both the AA-level and region-level evaluations. Moreover, we demonstrated that AlphaFold2 nearly perfectly reconstructs the distributions of the sizes and numbers of the SS regions. We also showed that AlphaFold2 substantially improves over the SS and SA predictors when tested on a low-sequence-similarity test dataset, although its results and the results of two other predictors suffer a modest drop in the quality of predicting SS regions. Altogether, our results suggest that AlphaFold2 makes very accurate predictions of SS and SA, which can be easily extracted from the 200+ million pre-computed AF2 structure predictions in AlphaFoldDB.

Keywords: AlphaFold; Evaluation; Prediction; Protein structure; Secondary structure; Solvent accessibility.
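As a rough illustration of the AA-level scores named in the abstract, the sketch below computes a 3-state Q3 accuracy, a Pearson correlation coefficient (PCC) and a mean absolute error (MAE) from per-residue labels and values. It assumes SS is encoded as a string over H (helix), E (strand) and C (coil) and that SA is given as one numeric value per residue; the function names and the toy inputs are hypothetical, and this is not the authors' evaluation code.

```python
from math import sqrt


def q3_accuracy(native: str, predicted: str) -> float:
    """Fraction of residues whose 3-state SS label (H, E or C) matches."""
    if len(native) != len(predicted) or not native:
        raise ValueError("SS strings must be non-empty and of equal length")
    matches = sum(n == p for n, p in zip(native, predicted))
    return matches / len(native)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two per-residue value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(x: list[float], y: list[float]) -> float:
    """Mean absolute difference between native and putative SA values."""
    if len(x) != len(y) or not x:
        raise ValueError("lists must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Toy, made-up inputs for illustration only (not data from the study).
native_ss = "HHHEEECCC"
putative_ss = "HHHEEECCH"
print(q3_accuracy(native_ss, putative_ss))
```

In practice the native labels would come from experimental structures and the putative ones from AF2 models, typically after assigning SS and SA with a structure-annotation tool; that step is outside this sketch.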

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Graphical abstract
Fig. 1
Assessment of the SA predictions at the AA level. The stick plots represent means and ranges of the assessment scores (MAE, PCC, MCC, and ACC) across different organisms (taxonomic groups). Results of the statistical significance analysis are shown at the top of the plots, where “*” denotes a statistically significant difference (p-value < 0.05) when comparing a given predictor against AF2.
Fig. 2
Assessment of the SS predictions at the AA level. The stick plots represent means and ranges of the assessment scores (Q3, Q2H, Q2C, Q2E, and Qerror) across different organisms (taxonomic groups). Results of the statistical significance analysis are shown at the top of the plots, where “*” denotes a statistically significant difference (p-value < 0.05) when comparing a given predictor against AF2.
Fig. 3
Assessment of the SS predictions at the region level. The stick plots represent means and ranges of the assessment scores (MAERs, SOVs, and KLs) across different organisms (taxonomic groups). Results of the statistical significance analysis are shown at the top of the plots, where “*” denotes a statistically significant difference (p-value < 0.05) when comparing a given predictor against AF2.
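
The region-level evaluation works on SS regions rather than on single residues. A minimal sketch of how such regions and their size distributions could be extracted from a 3-state SS string is given below. It assumes a region is a maximal run of identical labels (H, E or C); the helper names are hypothetical, and the sketch does not reproduce the MAER, SOV or KL scores in Fig. 3, whose exact definitions are not given in this record.

```python
from collections import Counter
from itertools import groupby


def ss_regions(ss: str) -> list[tuple[str, int]]:
    """Return (state, length) for each maximal run of identical SS labels."""
    return [(state, sum(1 for _ in run)) for state, run in groupby(ss)]


def region_size_distribution(ss: str, state: str) -> Counter:
    """Count how many regions of the given state occur at each length."""
    return Counter(length for s, length in ss_regions(ss) if s == state)


# Toy, made-up SS string for illustration only (not data from the study).
example = "CCHHHHCEEEC"
print(ss_regions(example))
print(region_size_distribution(example, "H"))
```

Comparing such counters for native and putative structures gives the number of regions and their size distribution per SS state, which is the kind of input a region-level comparison needs.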
