Bioinformatics. 2022 Mar 28;38(7):1881-1887.
doi: 10.1093/bioinformatics/btab881.

Current structure predictors are not learning the physics of protein folding

Carlos Outeiral et al. Bioinformatics.

Abstract

Motivation: Predicting the native state of a protein has long been considered a gateway problem for understanding protein folding. Recent advances in structural modeling driven by deep learning have achieved unprecedented success at predicting a protein's crystal structure, but it is not clear whether these models are learning the physics of how proteins dynamically fold into their equilibrium structure or are just accurate knowledge-based predictors of the final state.

Results: In this work, we compare the pathways generated by state-of-the-art protein structure prediction methods to experimental data about protein folding pathways. The methods considered were AlphaFold 2, RoseTTAFold, trRosetta, RaptorX, DMPfold, EVfold, SAINT2 and Rosetta. We find evidence that their simulated dynamics capture some information about the folding pathway, but their predictive ability is worse than a trivial classifier using sequence-agnostic features like chain length. The folding trajectories produced are also uncorrelated with experimental observables such as intermediate structures and the folding rate constant. These results suggest that recent advances in structure prediction do not yet provide an enhanced understanding of protein folding.

Availability: The data underlying this article are available in GitHub at https://github.com/oxpig/structure-vs-folding/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Protocol for the analysis of simulated folding pathways. (a) Trajectory generation process. Protein sequences are used to generate the necessary input features for a modified protein structure predictor using default processing scripts. The structure prediction software outputs detailed search trajectories, which are then summarized as the fraction of native contacts between pairs of secondary structure elements. (b) The trajectories are smoothed, and the positions of maximum change are identified via numerical differentiation. These peaks are subsequently clustered using kernel density estimation (KDE) with a Gaussian kernel. The clusters identify the main phases of folding, establish whether the trajectory proceeds in one or more steps, and define the structural intermediates, which can be compared with HDX experiments.
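
The analysis in panel (b) of Fig. 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the moving-average smoother, the window size, the slope threshold, the KDE bandwidth, the 10% density cutoff and every function name here are hypothetical choices, and the synthetic sigmoid traces stand in for real fraction-of-native-contacts trajectories.

```python
import math


def moving_average(values, window=5):
    """Smooth a series with a centered moving average (shorter window at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def central_difference(values):
    """Numerical derivative: central differences inside, one-sided at the two ends."""
    n = len(values)
    deriv = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n - 1, i + 1)
        deriv.append((values[hi] - values[lo]) / (hi - lo))
    return deriv


def local_maxima(values, min_height):
    """Indices of interior local maxima whose value is at least min_height."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] >= min_height
        and values[i] > values[i - 1]
        and values[i] >= values[i + 1]
    ]


def gaussian_kde(points, grid, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated at each grid position."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
        for x in grid
    ]


def folding_phases(contact_traces, min_slope=0.05, bandwidth=5.0):
    """Pool the steepest-change frames of all traces and return the KDE mode frames.

    Each trace is the fraction of native contacts for one pair of secondary
    structure elements, sampled once per frame (an assumed input layout).
    """
    events = []
    for trace in contact_traces:
        slope = central_difference(moving_average(trace))
        events.extend(local_maxima(slope, min_slope))
    if not events:
        return []
    grid = list(range(len(contact_traces[0])))
    density = gaussian_kde(events, grid, bandwidth)
    # Keep only modes that reach 10% of the tallest mode (arbitrary cutoff).
    return local_maxima(density, 0.1 * max(density))


def sigmoid_trace(center, n_frames=100, width=2.0):
    """Synthetic contact trace that switches from 0 to 1 around frame `center`."""
    return [1.0 / (1.0 + math.exp(-(t - center) / width)) for t in range(n_frames)]


if __name__ == "__main__":
    # Two element pairs form early and close together, one forms late.
    traces = [sigmoid_trace(30), sigmoid_trace(32), sigmoid_trace(70)]
    print(folding_phases(traces))
```

In this sketch, a trajectory whose pooled events yield one mode would be read as folding in a single step, and one with several modes as multistate. The bandwidth controls how close two events must be to merge into one phase, so any real analysis would need to justify that choice.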
Fig. 2.
Correlation between the folding rate constant and folding events in simulated trajectories of the seven structure prediction methods considered, the length of the protein chain and the average contact order of the native structure. Every point represents the average over the maximum number of decoys possible (200 decoys for RoseTTAFold, trRosetta, RaptorX, DMPfold and EVfold; and 10 decoys for SAINT2 and Rosetta).
Fig. 3.
Average pairwise Jaccard similarity between multistate folding trajectories across all proteins in the dataset, for the seven structure prediction programs. Most methods exhibit significant variability between independent trajectories.
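
The similarity measure named in Fig. 3 can be illustrated with a short sketch. It assumes, hypothetically, that each trajectory is summarized as the set of secondary-structure contact pairs formed in its intermediate; the caption does not specify this encoding, and the function names and labels below are illustrative only.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


def mean_pairwise_jaccard(trajectory_sets):
    """Average Jaccard similarity over all unordered pairs of trajectories."""
    pairs = list(combinations(trajectory_sets, 2))
    if not pairs:
        raise ValueError("need at least two trajectories")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical intermediates from three independent runs on one protein.
    runs = [{"H1-H2", "H2-S1"}, {"H2-S1", "S1-S2"}, {"H1-H2", "H2-S1"}]
    print(mean_pairwise_jaccard(runs))
```

A value near 1 would mean that independent runs pass through the same intermediate, while a value near 0 would mean the runs share almost no contacts.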
