Proc Natl Acad Sci U S A. 2020 Jan 21;117(3):1496-1503.
doi: 10.1073/pnas.1914677117. Epub 2020 Jan 2.

Improved protein structure prediction using predicted interresidue orientations


Jianyi Yang et al. Proc Natl Acad Sci U S A. 2020.

Abstract

The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.

Keywords: deep learning; protein contact prediction; protein structure prediction.
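The abstract says the predicted distances and orientations are turned into restraints for Rosetta energy minimization but does not say how. One common way to do this, shown here only as a sketch and not as the paper's protocol, is to convert each distance bin's predicted probability into an energy via a negative log ratio against a background distribution. The bin layout (0.5 Å bins from 2 to 20 Å plus one "> 20 Å" bin), the background `ref_probs`, and the function names are all assumptions.

```python
import numpy as np

# Assumed binning (illustrative only): 36 bins of 0.5 Å covering 2-20 Å,
# followed by one catch-all bin for pairs predicted to be farther than 20 Å.
BIN_EDGES = np.arange(2.0, 20.5, 0.5)                 # 37 edges -> 36 bins
BIN_CENTERS = 0.5 * (BIN_EDGES[:-1] + BIN_EDGES[1:])  # 2.25, 2.75, ..., 19.75


def distance_restraint(probs, ref_probs, eps=1e-4):
    """Per-bin energy from a predicted distance histogram.

    probs:     predicted probabilities for the 36 finite bins plus the
               "> 20 Å" bin (length 37, summing to 1).
    ref_probs: hypothetical background distribution with the same layout.
    Returns 36 energies (lower = more favorable); the "> 20 Å" bin is
    dropped because it carries no specific distance to restrain toward.
    """
    probs = np.asarray(probs, dtype=float)
    ref_probs = np.asarray(ref_probs, dtype=float)
    # Negative log-likelihood ratio against the background; eps avoids log(0).
    return -np.log((probs[:-1] + eps) / (ref_probs[:-1] + eps))


def most_favorable_distance(probs, ref_probs):
    """Bin center (Å) where the restraint energy is lowest."""
    return float(BIN_CENTERS[np.argmin(distance_restraint(probs, ref_probs))])


# Usage: a distribution with half its mass in the 6.0-6.5 Å bin.
ref = np.full(37, 1 / 37)
p = np.full(37, 0.5 / 37)
p[8] += 0.5
print(most_favorable_distance(p, ref))  # 6.25
```

A minimizer would normally need a smooth function rather than per-bin values, so these energies would still have to be interpolated (for example with splines) before use; that step is omitted here.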


Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Predicting interresidue geometries and protein 3D structure from a multiple sequence alignment. (A) Representation of the rigid-body transform from one residue to another using angles and distances. (B) Architecture of the deep neural network with multiobjective training to predict interresidue geometries from an MSA. (C) Outline of the structure-modeling protocol based on the restraints derived from the predicted distance and orientation (see Methods for details).
Fig. 2.
Accuracy of predicted interresidue geometries. (A) Contribution of different factors to the increase in trRosetta performance on CASP13’s free modeling and CAMEO’s very hard targets. Incorporation of MSA subsampling, orientations, and MSA selection in the modeling pipeline increases precision of the top L long-range predicted contacts by 1.7% (red bar), 2.2% (yellow), and 3.1% (green), respectively, and increasing the depth of the network from 36 to 61 residual blocks boosts the performance by an additional 0.6% (orange bar). (B) Correlation between the predicted probability of the top L long- and medium-range contacts and their actual precision measured against the native structures. (C) Distribution of predicted probabilities for residue pairs to be within 20 Å in the native structure; populations in blue and red correspond to residue pairs with d ≤ 20 Å and d > 20 Å in experimental structures, respectively. Confident predictions are clustered at probability values P(d < 20 Å) > 92.5%; probabilities for unreliable background predictions are predominantly <15%. (D) Correlations between actual rigid-body transform parameters from the experimental structures and the modes of the predicted distributions for the most reliable long- and medium-range contacts (the top 7.5%); color coding indicates probability density.
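Panel A measures gains as the precision of the top L long-range predicted contacts. A minimal sketch of that metric follows, assuming the usual CASP conventions, which this page does not state: a contact is a Cβ-Cβ distance below 8 Å, long-range means a sequence separation of at least 24, and L is the sequence length. The function name and its inputs are hypothetical.

```python
import numpy as np


def top_l_precision(pred_prob, true_dist, min_sep=24, cutoff=8.0):
    """Fraction of the top-L long-range predictions that are true contacts.

    pred_prob: (L, L) matrix of predicted contact probabilities
               (only the upper triangle is read).
    true_dist: (L, L) matrix of native Cβ-Cβ distances in Å.
    """
    L = pred_prob.shape[0]
    i, j = np.triu_indices(L, k=min_sep)        # all pairs with j - i >= min_sep
    if i.size == 0:
        return float("nan")                     # chain too short to score
    order = np.argsort(-pred_prob[i, j], kind="stable")
    top = order[:L]                             # keep the L most confident pairs
    hits = true_dist[i[top], j[top]] < cutoff   # native contact?
    return float(hits.mean())


# Usage: 60 confident predictions on a 60-residue chain, 45 of them correct.
L = 60
ii, jj = np.triu_indices(L, k=24)
pred = np.zeros((L, L))
dist = np.full((L, L), 20.0)
pred[ii[:60], jj[:60]] = 0.9
dist[ii[:45], jj[:45]] = 5.0
print(top_l_precision(pred, dist))  # 0.75
```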
Fig. 3.
Comparison of model accuracy. (A) Average TM-score of all methods on the 31 FM targets of CASP13. The colored stacked bar indicates the contributions of different components to our method. A7D was the top human group in CASP13; Zhang-Server and RaptorX were the top 2 server groups. (B) Head-to-head comparison of TM-scores between our method and A7D over the 31 FM targets (blue points; red points are for 6 targets with extensive refinement). (C) Structures for the CASP13 target T0950; the native structure and the predicted model are shown in gray and rainbow cartoons, respectively. (D) Comparison between our method and the top servers from the CAMEO experiments. (E) Native structure (in gray) and the predicted model (in rainbow) for CAMEO target 5WB4_H. It should be emphasized that in all of these comparisons the CASP and CAMEO predictions, unlike ours, were made blindly.
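Fig. 3 ranks methods by TM-score, which this page does not define. For reference, the standard TM-score sums 1/(1 + (d_i/d0)^2) over aligned residue pairs, divides by the target length L, and takes the maximum over all superpositions, with d0 = 1.24·(L − 15)^(1/3) − 1.8 Å. The sketch below evaluates that sum for one superposition that has already been computed; it does not search over superpositions, and the 0.5 Å floor on d0 for very short chains is an assumption about how reference tools handle that case.

```python
import numpy as np


def tm_d0(l_target):
    """Length-dependent distance scale d0 (Å) used by TM-score."""
    d0 = 1.24 * np.cbrt(l_target - 15) - 1.8
    return max(float(d0), 0.5)  # assumed floor so short chains stay well defined


def tm_score_fixed(dists, l_target):
    """TM-score for ONE given superposition (not maximized over superpositions).

    dists: distances (Å) between aligned model/native residue pairs after
           superposition; unaligned residues are simply left out.
    """
    d0 = tm_d0(l_target)
    dists = np.asarray(dists, dtype=float)
    # Each aligned pair contributes between 0 (far apart) and 1 (superimposed).
    return float(np.sum(1.0 / (1.0 + (dists / d0) ** 2)) / l_target)


# Usage: half of a 100-residue target aligned perfectly, the rest unaligned.
print(tm_score_fixed(np.zeros(50), 100))  # 0.5
```

Normalizing by the target length rather than by the number of aligned pairs is what penalizes a model that only covers part of the native structure.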
Fig. 4.
trRosetta accurately predicts structures of de novo-designed proteins and captures effects of mutations. Differences in the accuracy of predicted contacts (A) and trRosetta models (B) for de novo-designed (blue) and naturally occurring (orange) proteins of similar size from single amino acid sequences. (C–E) Examples of trRosetta models for de novo designs of various topologies: β-barrel, PDB ID 6D0T (C); α-helical IL2-mimetic, PDB ID 6DG6 (D); and Foldit design with α/β topology, PDB ID 6MRS (E). Experimental structures are in gray, and models are in rainbow. Frames show experimental structures color-coded by estimated tolerance to single-site mutations (red, less tolerant; blue, more tolerant); the 8 residues least tolerant to mutation are in stick representation, and glycine residues are indicated by arrows. Heat maps on the right show the change in probability of the designed fold for substitutions of the same residue type (indicated at top) at different sequence positions (indicated at bottom).
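The heat maps and tolerance coloring in panels C to E rest on a single-site substitution scan: score every one-residue mutant of the design and compare it with the design itself. The sketch below shows only that bookkeeping. `score` is a hypothetical placeholder for the network-derived probability of the designed fold, and ranking positions by their mean score drop is an assumed way to pick the least tolerant residues, not necessarily the paper's.

```python
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitution_scan(seq: str, score: Callable[[str], float]) -> Dict[str, List[float]]:
    """Score change for every single-site substitution of `seq`.

    Returns {aa: [delta at position 0, 1, ...]}: one row per substituted
    residue type, i.e. one heat-map row. Identity substitutions give 0.
    """
    wt = score(seq)
    table: Dict[str, List[float]] = {}
    for aa in AMINO_ACIDS:
        table[aa] = [score(seq[:i] + aa + seq[i + 1:]) - wt for i in range(len(seq))]
    return table


def least_tolerant(table: Dict[str, List[float]], n: int = 8) -> List[int]:
    """Positions with the largest mean score drop across all substitutions."""
    length = len(next(iter(table.values())))
    mean_drop = [-sum(row[i] for row in table.values()) / len(table) for i in range(length)]
    return sorted(range(length), key=lambda i: mean_drop[i], reverse=True)[:n]


# Usage with a toy, hypothetical scorer that rewards matching the design
# at each position with a position-specific weight.
design = "MKWLV"
weights = [0.1, 0.5, 0.9, 0.2, 0.3]


def toy_score(s: str) -> float:
    return sum(w for w, a, b in zip(weights, s, design) if a == b)


print(least_tolerant(substitution_scan(design, toy_score), n=2))  # [2, 1]
```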

Comment in

  • Deep learning 3D structures.
    Singh A. Nat Methods. 2020 Mar;17(3):249. doi: 10.1038/s41592-020-0779-y. PMID: 32132733.
