Proteins. 2021 Dec;89(12):1734-1751.
doi: 10.1002/prot.26193. Epub 2021 Aug 7.

Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14

Wei Zheng et al. Proteins. 2021 Dec.

Abstract

In this article, we report the 3D structure prediction results of two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built on the D-I-TASSER and D-QUARK algorithms, which integrate four newly developed components into the classical protein folding pipelines I-TASSER and QUARK, respectively. The new components include: (a) DeepMSA2, a new multiple sequence alignment (MSA) collection tool extended from the DeepMSA program; (b) FUpred, a contact-based algorithm for detecting protein domain boundaries; (c) DeepPotential, a residual convolutional neural network-based method that predicts multiple spatial restraints from co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials that guide the structure assembly simulations. For 37 free-modeling (FM) targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those of the models constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements from each of the four new components, especially the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges remain in the current pipelines. These include difficulties in modeling multi-domain proteins, due to the low accuracy of inter-domain distance prediction, and in modeling protein domains from oligomer complexes, because co-evolutionary analysis cannot distinguish inter-chain from intra-chain distances. Tuning the deep learning-based predictors specifically for multi-domain targets and protein complexes may help address these issues.
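The abstract compares pipelines by TM-score. For reference, the standard TM-score of a model against a target structure of length $L_{\mathrm{target}}$ is given below, where $L_{\mathrm{ali}}$ is the number of aligned residue pairs, $d_i$ is the distance between the $i$-th aligned pair after superposition, and the maximum is taken over superpositions. The second line is a reading, under the assumption that the reported percentages compare average first-model TM-scores over the 37 FM targets; on that reading, a 96% gain means the D-I-TASSER average is about 1.96 times the I-TASSER average.

$$
\mathrm{TM\text{-}score} = \max\left[\frac{1}{L_{\mathrm{target}}}\sum_{i=1}^{L_{\mathrm{ali}}}\frac{1}{1+\left(d_i/d_0\right)^2}\right],
\qquad d_0 = 1.24\sqrt[3]{L_{\mathrm{target}}-15}-1.8
$$

$$
\Delta = \frac{\overline{\mathrm{TM}}_{\mathrm{new}} - \overline{\mathrm{TM}}_{\mathrm{base}}}{\overline{\mathrm{TM}}_{\mathrm{base}}}
\quad\Longrightarrow\quad
\Delta = 0.96 \;\Leftrightarrow\; \overline{\mathrm{TM}}_{\mathrm{new}} = 1.96\,\overline{\mathrm{TM}}_{\mathrm{base}}
$$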

Keywords: CASP14; ab initio folding; deep learning; domain partition; multiple sequence alignment; protein structure prediction; residue-residue distance prediction.

Figures

Figure 1.
An overview of the common procedures shared by the five automated pipelines of Zhang-Group servers in CASP14 (“Zhang-Server”, “QUARK”, “Zhang-CEthreder”, “Zhang-TBM” and “Zhang_Ab_Initio”) on target classification, domain splitting and multi-domain structure assembly.
Figure 2.
(A) DeepMSA2 pipeline for multiple sequence alignment (MSA) generation, which contains four components: dMSA, qMSA, mMSA and MSA selection. (B) DeepPotential pipeline for generating spatial geometric restraints, which include contact maps, distances, orientations and hydrogen-bond networks. (C) Illustration of the hydrogen-bond definition used in D-I-TASSER and D-QUARK.
Figure 3.
(A) D-I-TASSER pipeline, which is an extension of I-TASSER and C-I-TASSER that integrates deep learning-based distance and hydrogen-bond networks with iterative threading assembly simulations. (B) D-QUARK pipeline, which is an extension of QUARK and C-QUARK that integrates deep learning-based distance and orientation predictions with replica-exchange Monte Carlo fragment assembly simulations.
Figure 4.
Head-to-head comparisons between (A) D-I-TASSER and I-TASSER, (B) D-QUARK and QUARK, (C) D-I-TASSER and C-I-TASSER, (D) D-QUARK and C-QUARK. C-I-TASSER, I-TASSER, C-QUARK, and QUARK were run using the same domain partitions and the same set of templates used by D-I-TASSER and D-QUARK during CASP14.
Figure 5.
(A) The relationship between the model quality of D-I-TASSER/D-QUARK and MAEn, which represents the mean absolute error between distances derived from the experimental structures and predicted distances for the long-range top 5L distances (L is the length of the protein) from DeepPotential. (B) The relationship between the model quality of D-I-TASSER/D-QUARK and MAEm, which is defined as the mean absolute error between the distances calculated from the model and predicted distances for the long-range top 5L distances from DeepPotential. (C) The experimental structure of T1094-D2. (D) The superposition between the experimental structure and the best template (PDB ID: 4bj1A) identified by LOMETS3 for T1094-D2. (E) The predicted residue-residue distance map for T1094-D2, where the predicted distance map is shown in the upper triangle matrix and the distance map derived from the experimental structure is shown in the lower triangle matrix. The D-I-TASSER and D-QUARK models (F), the C-I-TASSER and C-QUARK models (G) and the I-TASSER and QUARK models (H) of T1094-D2 superposed with the experimental structure. (I) The experimental structure of T1026-D1. (J) The D-I-TASSER and D-QUARK models of T1026-D1 superposed with the experimental structure. (K) The residue-residue distance map for T1026-D1, where the predicted distance map is shown in the upper triangle matrix and the distance map calculated from the experimental structure is shown in the lower triangle matrix. (L) The superposition of the experimental structure and the high-quality templates identified by LOMETS3 for T1026-D1. (M) The D-I-TASSER and D-QUARK models for T1026-D1, built after excluding good templates, superposed with the experimental structure.
Figure 6.
(A) The head-to-head comparison of predicted distance errors (MAEn) between MSAs from DeepMSA2 and DeepMSA. (B) The head-to-head comparison of the model quality generated by D-I-TASSER between MSAs from DeepMSA2 and DeepMSA. (C) The head-to-head comparison of the model quality generated by D-QUARK between MSAs from DeepMSA2 and DeepMSA. (D) The 12 FM targets for which the TM-score differences of the D-I-TASSER (D-QUARK) models exceeded 0.05 when different MSA pipelines were used.
Figure 7.
(A) The experimental structure and domain partition for T1094. (B) The illustration of predicted domain boundaries by FUpred based on the DeepPotential contact map. (C) The D-I-TASSER and D-QUARK models for two domains of T1094 superposed with the experimental structures. (D) The residue-residue distance map predicted from DeepPotential (upper triangle) and the distance map calculated from the experimental structure (lower triangle) for T1094. (E) The D-I-TASSER and D-QUARK full-length models of T1094 superposed with the experimental structures.
Figure 8.
(A) Three copies (named here chains A, B and C) of the same monomer protein of T1070-D1 form a symmetric oligomer complex. (B) The D-I-TASSER model of T1070-D1 superposed with the experimental structure. (C) The local segments of β-strands S5 and S6 from the D-I-TASSER model and the T1070-D1 oligomer structure. (D) The distance map predicted by DeepPotential and the distance map calculated from the T1070-D1 oligomer complex. The bottom-left and upper-right matrices are the intra-chain distance maps of two T1070-D1 monomer copies, chains A and B, respectively; within each, the upper triangle is the predicted distance map and the lower triangle is derived from the experimental oligomer structure. The bottom-right matrix is the inter-chain distance map between chains A and B, calculated from the T1070-D1 oligomer complex. (E) Illustration of the intra-chain and inter-chain distances between residue 39 and residue 54 in the experimental structure and the D-I-TASSER model.
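The Figure 5 caption defines MAEn and MAEm only in words. The sketch below is one possible reading of that definition, not the authors' code. It assumes that "long-range" means a sequence separation |i − j| ≥ 24 (a common CASP convention, not stated here), that the "top 5L" pairs are ranked by some per-pair confidence score from the predictor (the caption does not say how pairs are ranked), and that distances are in Å between representative atoms such as C-beta. The function names `pairwise_distances` and `mae_top_long_range` are hypothetical. Passing experimental distances as the reference gives MAEn; passing distances from a predicted model gives MAEm.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """(L, 3) residue coordinates (e.g. C-beta atoms) -> (L, L) Euclidean distance map."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def mae_top_long_range(pred_dist, conf, ref_dist, min_sep=24, top_factor=5):
    """Mean absolute error over the top (top_factor * L) long-range residue pairs.

    pred_dist: (L, L) predicted distances in Angstrom.
    conf:      (L, L) score used to rank pairs (assumption: predictor confidence).
    ref_dist:  (L, L) reference distances (experimental structure -> MAEn,
               predicted model -> MAEm).
    min_sep:   minimum sequence separation |i - j| for a long-range pair (assumed 24).
    """
    L = pred_dist.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep, i.e. long-range pairs.
    i, j = np.triu_indices(L, k=min_sep)
    # Rank pairs by descending confidence; keep at most top_factor * L of them.
    order = np.argsort(-conf[i, j], kind="stable")[: top_factor * L]
    sel_i, sel_j = i[order], j[order]
    errors = np.abs(pred_dist[sel_i, sel_j] - ref_dist[sel_i, sel_j])
    if errors.size == 0:
        raise ValueError("protein too short to have any long-range pairs")
    return float(errors.mean())


if __name__ == "__main__":
    # Toy data only: random "native" coordinates and noisy predicted distances.
    rng = np.random.default_rng(0)
    native = rng.normal(size=(80, 3)) * 10.0
    ref = pairwise_distances(native)
    pred = ref + rng.normal(scale=2.0, size=ref.shape)
    conf = rng.random(ref.shape)
    print("MAE over top-5L long-range pairs:", mae_top_long_range(pred, conf, ref))
```

Restricting the error to high-confidence long-range pairs keeps the metric focused on the restraints most likely to drive the fold, rather than on trivially predictable local distances.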
