Bioinformatics. 2023 Dec 1;39(12):btad712.
doi: 10.1093/bioinformatics/btad712.

DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function

Jae-Won Lee et al.

Abstract

Motivation: Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures.

Results: Building upon the successes of AlphaFold2, we changed the losses for side-chain torsion angles and frame aligned point error, added loss functions for side-chain confidence and secondary structure prediction, and replaced template feature generation with a new alignment method based on conditional random fields. We also re-optimized the predicted structures by conformational space annealing, using a molecular mechanics energy function that integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups, with improvements in structural detail in terms of backbone, side-chain, and Molprobity measures. In terms of protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64, compared with 85.88 for AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and Molprobity scores among the top-performing groups.
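For readers unfamiliar with the backbone metric quoted above, GDT-TS is commonly defined as the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the percentage of Cα atoms in the model that lie within the cutoff of their native positions. The following is a minimal sketch of that definition, not the CASP evaluation code: it assumes the model has already been superimposed on the native structure and that residues are paired one-to-one, whereas the official procedure searches over many superpositions for each cutoff. The function name `gdt_ts` and its arguments are illustrative only.

```python
import math


def gdt_ts(model_ca, native_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT-TS for two equally long lists of (x, y, z) C-alpha
    coordinates in angstroms.

    Assumption: the two structures are already superimposed. The real
    GDT procedure maximizes each fraction over many superpositions, so
    this value is a lower bound on the official score.
    """
    if len(model_ca) != len(native_ca) or len(native_ca) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")

    # Per-residue deviation between model and native C-alpha positions.
    distances = [math.dist(m, n) for m, n in zip(model_ca, native_ca)]

    # Fraction of residues within each cutoff, then the mean as a percentage.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Toy data: four residues displaced by 0.5, 1.5, 3.0, and 10.0 angstroms.
native = [(0.0, 0.0, 0.0)] * 4
model = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
score = gdt_ts(model, native)
```

In the toy data, one, two, three, and three of the four residues fall within the 1, 2, 4, and 8 Å cutoffs respectively, so the score is the mean of 25%, 50%, 75%, and 75%.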

Availability and implementation: DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.

Conflict of interest statement

None declared.

Figures

Figure 1.
Prediction flow diagram of DeepFold. Given an amino acid sequence, the method first generates input features from MSAs and templates; the MSAs are obtained from HHBlits, JackHMMER, and HHpred, and the templates and alignments are generated by CRFalign. Protein 3D structures are predicted by the DeepFold network, and the final structures are then re-optimized by conformational space annealing (CSA). (See the main text for details.)
Figure 2.
CASP15 rankings in terms of the sum of Z-scores (sum Z) for (a) the assessor’s formula [Equation (12)] and (b) GDT-TS for TBM-easy/hard targets (62 domains in total). In each panel, only the top 50 of 132 teams are shown.
Figure 3.
Comparison of the accuracies of DFolding (y-axis) and DFolding-server (without CSA re-optimization, x-axis), using the CASP15 official results for GDT-TS, LDDT, SC error, and Molprobity. Higher values are better for GDT-TS and LDDT, while lower values are better for SC error and Molprobity. All numbers within the figures are average values over 109 domains.
Figure 4.
Comparison of the accuracies of DFolding (y-axis) and DFolding-server (x-axis) for the 55 publicly available domains. All numbers within the figures are average values over these 55 domains. Lower values are better for Molprobity.
Figure 5.
Comparison of the templates of the 55 domains in CASP15. (a) TM-scores of the templates found by DeepFold (through CRFalign) versus those found by AF2. Each point represents the template with the highest TM-score obtained by the respective method for a given target. Orange points are domains with a lower TM-score for CRFalign than for AF2, and cyan points are domains for which both methods found the same template. The red point is target T1123. (b) Comparison of the alignment qualities of CRFalign and AF2. All numbers within the figures are average values over 55 domains. The dotted line indicates a difference > 0.15; by this criterion, we had significantly better template information for 18 domains.
Figure 6.
The CASP15 target T1123, for which DeepFold predictions showed the best GDT-TS scores. (a) DeepFold human and server models occupied the top three ranks consecutively. (b) The predicted 3D structures of T1123 and the native structure (colored green). Compared with AF2, the DeepFold prediction correctly predicts the helix and β-sheets near the C-terminal region, highlighted in blue. (c) The template 5EWO obtained by CRFalign shows high structural similarity to the native structure (TM = 0.69) compared with the template in AF2 (TM = 0.31).
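
The TM-scores quoted in the captions for Figures 5 and 6 follow the standard definition: each aligned residue pair at distance d contributes 1 / (1 + (d / d0)^2), the contributions are summed and divided by the target length L, and d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The sketch below evaluates that formula for one fixed superposition and alignment. It is an illustration under stated assumptions, not the tool used by the authors: real TM-score programs also search for the superposition that maximizes the score, and the fallback of d0 = 0.5 Å for very short targets is an assumption based on common implementations. The names `tm_d0` and `tm_score` are hypothetical.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (angstroms) of the TM-score."""
    if target_length <= 21:
        # Assumption: short targets fall back to a fixed 0.5 angstrom scale,
        # since the cube-root formula is not meaningful for L <= 15.
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(aligned_distances, target_length):
    """TM-score for one fixed superposition.

    aligned_distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target (native) structure,
    which normalizes the score so that unaligned residues count as zero.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(aligned_distances) > target_length:
        raise ValueError("cannot align more residues than the target has")

    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


# Toy data: a 100-residue target with only half of its residues aligned,
# all of them perfectly (distance 0), gives a score of 0.5.
half_covered = tm_score([0.0] * 50, 100)
```

Because the score is normalized by the target length rather than the number of aligned residues, a template that covers only part of the target is penalized even when the covered part fits well.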
