Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14

Proteins. 2021 Dec;89(12):1734-1751. doi: 10.1002/prot.26193. Epub 2021 Aug 7.

Wei Zheng¹, Yang Li¹,², Chengxin Zhang¹, Xiaogen Zhou¹, Robin Pearce¹, Eric W Bell¹, Xiaoqiang Huang¹, Yang Zhang¹,³

Affiliations

¹ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.
² School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
³ Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA.

PMID: 34331351
PMCID: PMC8616857
DOI: 10.1002/prot.26193

Wei Zheng et al. Proteins. 2021 Dec.

Abstract

In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins due to low accuracy in inter-domain distance prediction and modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.

Keywords: CASP14; ab initio folding; deep learning; domain partition; multiple sequence alignment; protein structure prediction; residue-residue distance prediction.
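The abstract reports accuracy as TM-scores. As a rough illustration of what that number measures, the sketch below evaluates the standard TM-score formula (Zhang and Skolnick) for one fixed superposition. The function names `tm_d0` and `tm_score_fixed` are hypothetical, the 0.5 Å floor on d0 is an assumption borrowed from common implementations, and the full TM-score additionally searches for the superposition that maximizes this sum, which is omitted here.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstrom) of the TM-score."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumption: short chains fall back to the floor value
    return max(d0, 0.5)  # assumption: floor of 0.5 Angstrom


def tm_score_fixed(distances, target_length):
    """TM-score term for one given superposition.

    distances: per-residue deviations (Angstrom) between aligned model and
    reference residues after superposition. Residues that are not aligned
    are simply left out, so they contribute zero.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Because the sum is divided by the target length rather than by the number of aligned residues, a model that covers only half of the target perfectly scores 0.5, not 1.0.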

Figures

**Figure 1.**
An overview of the common procedures shared by the five automated pipelines of Zhang-Group servers in CASP14 (“Zhang-Server”, “QUARK”, “Zhang-CEthreader”, “Zhang-TBM” and “Zhang_Ab_Initio”) for target classification, domain splitting and multi-domain structure assembly.

**Figure 2.**
(A) DeepMSA2 pipeline for multiple sequence alignment (MSA) generation, which contains four approaches, dMSA, qMSA, mMSA and MSA selection. (B) DeepPotential pipeline for generating spatial geometric restraints, which include contact maps, distances, orientations and hydrogen-bond networks. (C) The illustration of the hydrogen-bond definition used in D-I-TASSER and D-QUARK.

**Figure 3.**
(A) D-I-TASSER pipeline, which is an extension of I-TASSER and C-I-TASSER that integrates deep learning-based distance and hydrogen-bond networks with iterative threading assembly simulations. (B) D-QUARK pipeline, which is an extension of QUARK and C-QUARK that integrates deep learning-based distance and orientation predictions with replica-exchange Monte Carlo fragment assembly simulations.

**Figure 4.**
Head-to-head comparisons between (A) D-I-TASSER and I-TASSER, (B) D-QUARK and QUARK, (C) D-I-TASSER and C-I-TASSER, (D) D-QUARK and C-QUARK. C-I-TASSER, I-TASSER, C-QUARK, and QUARK were run using the same domain partitions and the same set of templates used by D-I-TASSER and D-QUARK during CASP14.

**Figure 5.**
(A) The relationship between the model quality of D-I-TASSER/D-QUARK and *MAE*_n, which represents the mean absolute error between distances derived from the experimental structures and predicted distances for the long-range top 5L distances (L is the length of the protein) from DeepPotential. (B) The relationship between the model quality of D-I-TASSER/D-QUARK and *MAE*_m, which is defined as the mean absolute error between the distances calculated from the model and predicted distances for the long-range top 5L distances from DeepPotential. (C) The experimental structure of T1094-D2. (D) The superposition between the experimental structure and the best template (PDB ID: 4bj1A) identified by LOMETS3 for T1094-D2. (E) The residue-residue distance map prediction for T1094-D2, where the predicted distance map is shown in the upper triangle matrix and the distance map derived from the experimental structure is shown in the lower triangle matrix. The D-I-TASSER and D-QUARK models (F), the C-I-TASSER and C-QUARK models (G) and the I-TASSER and QUARK models (H) of T1094-D2 superposed with the experimental structure. (I) The experimental structure of T1026-D1. (J) The D-I-TASSER and D-QUARK models of T1026-D1 superposed with the experimental structure. (K) The residue-residue distance map for T1026-D1, where the predicted distance map is shown in the upper triangle matrix and distance map calculated from the experimental structure is shown in the lower triangle matrix. (L) The superposition of the experimental structure and the high-quality templates identified by LOMETS3 for T1026-D1. (M) The D-I-TASSER and D-QUARK models for T1026-D1 after excluding good templates superposed with the experimental structure.
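The two error measures in panels (A) and (B) can be sketched as one function. This is a minimal sketch under stated assumptions, not the authors' code: the name `top_long_range_mae` is hypothetical, "long-range" is taken to mean a sequence separation of at least 24 residues, and the "top 5L" pairs are taken to be the ones with the highest prediction confidence. Neither convention is stated in the caption.

```python
def top_long_range_mae(pred, conf, ref, min_sep=24, top_k=None):
    """Mean absolute error over the most confident long-range pairs.

    pred: L x L predicted distances
    conf: L x L confidence of each prediction (higher is better)
    ref:  L x L reference distances (experimental structure or model)
    min_sep: minimum sequence separation |i - j| (assumed: 24)
    top_k: number of pairs to keep (default: 5 * L)
    """
    n = len(pred)
    if top_k is None:
        top_k = 5 * n
    # Collect every residue pair i < j that is far enough apart in sequence.
    pairs = [(i, j) for i in range(n) for j in range(i + min_sep, n)]
    # Rank by confidence, most confident first, and keep the top_k pairs.
    pairs.sort(key=lambda p: conf[p[0]][p[1]], reverse=True)
    chosen = pairs[:top_k]
    if not chosen:
        raise ValueError("no long-range pairs available")
    return sum(abs(pred[i][j] - ref[i][j]) for i, j in chosen) / len(chosen)
```

Passing distances from the experimental structure as `ref` corresponds to *MAE*_n, while passing distances measured in the final model corresponds to *MAE*_m.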

**Figure 6.**
(A) The head-to-head comparison of predicted distance errors (*MAE*_n) between MSAs from DeepMSA2 and DeepMSA. (B) The head-to-head comparison of the model quality generated by D-I-TASSER between MSAs from DeepMSA2 and DeepMSA. (C) The head-to-head comparison of the model quality generated by D-QUARK between MSAs from DeepMSA2 and DeepMSA. (D) The 12 FM targets for which the TM-scores of the D-I-TASSER (D-QUARK) models differed by more than 0.05 between the two MSA pipelines.

**Figure 7.**
(A) The experimental structure and domain partition for T1094. (B) The illustration of predicted domain boundaries by FUpred based on the DeepPotential contact map. (C) The D-I-TASSER and D-QUARK models for two domains of T1094 superposed with the experimental structures. (D) The residue-residue distance map predicted from DeepPotential (upper triangle) and the distance map calculated from the experimental structure (lower triangle) for T1094. (E) The D-I-TASSER and D-QUARK full-length models of T1094 superposed with the experimental structures.
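The idea of reading a domain boundary off a contact map, as in panel (B), can be illustrated with a deliberately simplified score. The sketch below is not FUpred's scoring function: the name `best_two_domain_cut` and the ratio it minimizes (contacts crossing the cut divided by the smaller of the two within-segment contact counts) are assumptions chosen only to show the principle that a good boundary leaves few contacts between the two sides.

```python
def best_two_domain_cut(contacts, min_len=2):
    """Find the cut that best separates a chain into two contiguous domains.

    contacts: symmetric L x L matrix of 0/1 contact flags.
    Returns (cut, score), where the domains are residues [0, cut) and
    [cut, L), or None when no cut leaves contacts on both sides.
    """
    n = len(contacts)
    best = None
    for cut in range(min_len, n - min_len + 1):
        # Contacts inside the left segment, inside the right segment,
        # and between the two segments.
        left = sum(contacts[i][j] for i in range(cut) for j in range(i + 1, cut))
        right = sum(contacts[i][j] for i in range(cut, n) for j in range(i + 1, n))
        cross = sum(contacts[i][j] for i in range(cut) for j in range(cut, n))
        if left == 0 or right == 0:
            continue  # a segment with no internal contacts is not a domain
        score = cross / min(left, right)  # lower means cleaner separation
        if best is None or score < best[1]:
            best = (cut, score)
    return best
```

A score of this kind only handles a single continuous split. Discontinuous domains and proteins with more than two domains would need additional handling that is left out of this sketch.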

**Figure 8.**
(A) Three copies (named here as chain A, B and C) of the same monomer protein of T1070-D1 form a symmetric oligomer complex. (B) The D-I-TASSER model of T1070-D1 superposed with the experimental structure. (C) The local segments of β-strands S5 and S6 from the D-I-TASSER model and the T1070-D1 oligomer structure. (D) The predicted distance map by DeepPotential and the distance map calculated from the T1070-D1 oligomer complex. The bottom left and upper right matrices are two intra-chain distance maps for two T1070-D1 monomer copies, chains A and B, respectively, where the two upper triangle matrices are the predicted distance maps and the lower triangle matrices are derived from the experimental oligomer structures. The bottom right matrix is the inter-chain distance map formed by chains A and B which was calculated from the T1070-D1 oligomer complex. (E) The illustration of the intra-chain and inter-chain distances between residue 39 and residue 54 in the experimental structure and the D-I-TASSER model.
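The ambiguity shown in panels (D) and (E) can be made concrete with a small check. This is a minimal sketch, assuming one representative atom per residue and two chains with identical residue numbering. The function name `classify_pair` and the nearest-distance rule are hypothetical and are not the analysis used by the authors.

```python
import math


def classify_pair(chain_a, chain_b, i, j, predicted):
    """Compare a predicted distance with intra- and inter-chain geometry.

    chain_a, chain_b: lists of (x, y, z) coordinates, one per residue,
    for two copies of the same monomer in the oligomer.
    Returns (intra, inter, label), where label names the observed
    distance that lies closer to the prediction.
    """
    intra = math.dist(chain_a[i], chain_a[j])
    # In a homo-oligomer the pair (i, j) can be realized across the
    # interface in two ways; take the shorter of the two.
    inter = min(math.dist(chain_a[i], chain_b[j]),
                math.dist(chain_a[j], chain_b[i]))
    if abs(predicted - inter) < abs(predicted - intra):
        label = "inter-chain"
    else:
        label = "intra-chain"
    return intra, inter, label
```

When a pair is labelled "inter-chain", applying the predicted distance as a restraint within a single monomer would pull together residues that are only close across the interface, which is the failure mode the caption describes for residues 39 and 54.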

Cited by

I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction.
Zhou X, Zheng W, Li Y, Pearce R, Zhang C, Bell EW, Zhang G, Zhang Y. Nat Protoc. 2022 Oct;17(10):2326-2353. doi: 10.1038/s41596-022-00728-0. Epub 2022 Aug 5. PMID: 35931779. Review.

Revolutionizing Medicinal Chemistry: The Application of Artificial Intelligence (AI) in Early Drug Discovery.
Han R, Yoon H, Kim G, Lee H, Lee Y. Pharmaceuticals (Basel). 2023 Sep 6;16(9):1259. doi: 10.3390/ph16091259. PMID: 37765069. Review.

Progressive assembly of multi-domain protein structures from cryo-EM density maps.
Zhou X, Li Y, Zhang C, Zheng W, Zhang G, Zhang Y. Nat Comput Sci. 2022 Apr;2(4):265-275. doi: 10.1038/s43588-022-00232-1. Epub 2022 Apr 28. PMID: 35844960.

Using multiple computer-predicted structures as molecular replacement models: application to the antiviral mini-protein LCB2.
Korban SA, Mikhailovskii O, Gurzhiy VV, Podkorytov IS, Skrynnikov NR. IUCrJ. 2025 Jul 1;12(Pt 4):488-501. doi: 10.1107/S2052252525005123. PMID: 40549150.

Protein Function Analysis through Machine Learning.
Avery C, Patterson J, Grear T, Frater T, Jacobs DJ. Biomolecules. 2022 Sep 6;12(9):1246. doi: 10.3390/biom12091246. PMID: 36139085. Review.

Grants and funding

T32 CA140044/CA/NCI NIH HHS/United States
