Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14

doi:10.1002/prot.26186

Proteins. 2022 Jan;90(1):58-72.

doi: 10.1002/prot.26186. Epub 2021 Jul 27.

Jian Liu¹, Tianqi Wu¹, Zhiye Guo¹, Jie Hou², Jianlin Cheng¹

Affiliations

¹ Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA.
² Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA.

PMID: 34291486
PMCID: PMC8671168
DOI: 10.1002/prot.26186

Jian Liu et al. Proteins. 2022 Jan.

Abstract

Substantial progress in protein structure prediction has been made since CASP13 by utilizing deep learning and residue-residue distance prediction. Inspired by these advances, we improved our MULTICOM protein structure prediction system for CASP14 by incorporating three new components: (a) a new deep learning-based inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter-domain structure prediction. The results demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and for a majority of hard targets (template-free targets or targets whose templates cannot be recognized), a significant improvement over the CASP13 MULTICOM predictor. Moreover, template-free modeling performs better than template-based modeling not only on hard targets but also on targets that have homologous templates. The performance of template-free modeling depends largely on the accuracy of distance prediction, which is closely related to the quality of the multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
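Component (c), distance-based quality assessment, rests on comparing a model's own inter-residue distances with a predicted distance map. Two of the distance scores named in the Figure 2 caption, Pearson correlation and RMSE, can be sketched as follows. This is a minimal illustration under stated assumptions, not the MULTICOM/DeepRank implementation: it assumes one representative atom per residue (e.g., Cβ) given as (x, y, z) tuples and a predicted distance map given as a square list of lists in ångströms; all function names are hypothetical.

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distances between per-residue coordinates."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def upper_triangle(matrix, min_sep=1):
    """Flatten the entries (i, j) with j >= i + min_sep."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + min_sep, n)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def rmse(xs, ys):
    """Root-mean-square difference between paired values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def distance_agreement(model_coords, predicted_map, min_sep=1):
    """Score a model by how well its distances match a predicted distance map."""
    observed = upper_triangle(distance_map(model_coords), min_sep)
    predicted = upper_triangle(predicted_map, min_sep)
    return {"pearson": pearson(observed, predicted), "rmse": rmse(observed, predicted)}
```

Higher correlation and lower RMSE indicate closer agreement with the predicted map; only the upper triangle is used because distance maps are symmetric with a zero diagonal.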

Keywords: inter-residue distance prediction; protein quality assessment; protein structure prediction.

Figures

**FIGURE 1**
The pipeline of MULTICOM human and server protein structure predictors [Color figure can be viewed at wileyonlinelibrary.com]

**FIGURE 2**
The average loss of 40 QA methods and features in MULTICOM. (A) The loss on 61 “all groups” full‐length targets. (B) The loss on 30 TBM‐easy or TBM‐hard full‐length targets. (C) The loss on 31 FM/TBM or FM full‐length targets. Red: three DeepRank methods (DeepRank, DeepRank_con, and DeepRank3_Cluster); Green: three multi‐model methods (APOLLO, Pcons, and ModFOLDcluster2); Blue: 17 single‐model methods (e.g., SBROD, RWplus, Voronota, Dope, OPUS_PSP, RF_CB_SRS_OD, DeepQA, ProQ2, and ProQ3⁴¹); Pink: six contact matching scores, including DeepDist/DNCON2 short‐range, medium‐range, and long‐range contact matching scores; Yellow: 11 distance scores, including SSIM, PSNR, GIST, RMSE, Recall, Precision, PHASH, Pearson correlation, and ORB [Color figure can be viewed at wileyonlinelibrary.com]
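The "loss" plotted in Figure 2 is assumed here to follow the usual CASP convention for evaluating model quality assessment: for each target, the GDT‐TS of the best model in the pool minus the GDT‐TS of the model that the QA method ranks first, averaged over targets. A minimal sketch, with hypothetical function names and dictionaries that map model IDs to scores:

```python
def target_loss(true_gdt, predicted):
    """GDT-TS loss for one target.

    true_gdt:  model ID -> true GDT-TS against the native structure
    predicted: model ID -> quality score assigned by a QA method (higher = better)
    """
    top1 = max(predicted, key=predicted.get)  # model the QA method would select
    return max(true_gdt.values()) - true_gdt[top1]


def average_loss(targets):
    """Mean loss over (true_gdt, predicted) pairs, one pair per target."""
    losses = [target_loss(true_gdt, predicted) for true_gdt, predicted in targets]
    return sum(losses) / len(losses)
```

A loss of 0 means the QA method selected the best available model for that target; larger values mean a worse model was chosen.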

**FIGURE 3**
Evaluation of four MULTICOM server predictors in terms of the TM‐scores of the first submitted models. (A) On 92 “all group” + 4 “server only” domains (left: TM‐scores of MULTICOM‐DEEP, MULTICOM‐HYBRID, and MULTICOM‐CONSTRUCT models versus TM‐scores of MULTICOM‐CLUSTER models; right: mean and variation of the TM‐scores of the models of the four methods). (B) On 58 template‐based (TBM‐easy, TBM‐hard) domains. (C) On 38 FM or TBM/FM domains [Color figure can be viewed at wileyonlinelibrary.com]

**FIGURE 4**
Predicted structures and distance maps compared with native structures and true distance maps for 20 FM or FM/TBM domains for which the first model predicted by MULTICOM‐DEEP has the correct topology (TM‐score > 0.5). For each domain, the left panel compares the distance maps (lower triangle: true distance map; upper triangle: predicted distance map), and the right panel compares the predicted and true structures (light yellow: native structure; light blue: first predicted structure). The TM‐score of the predicted structure and the precision of the top L/2 long‐range contact predictions for each domain are listed at the top of each sub‐figure [Color figure can be viewed at wileyonlinelibrary.com]
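The "precision of top L/2 long‐range contact predictions" reported for each domain can be sketched as below. The sketch assumes the common CASP definitions, which the caption does not restate: a contact is a residue pair whose Cβ atoms are closer than 8 Å, "long range" means a sequence separation of at least 24 residues, and predicted pairs are ranked by contact probability. If predictions are only available as a distance map, a contact probability would first have to be derived from it. Function and parameter names are hypothetical.

```python
def long_range_precision(contact_prob, true_dist, top_fraction=0.5,
                         min_sep=24, cutoff=8.0):
    """Precision of the top (top_fraction * L) long-range contact predictions.

    contact_prob: L x L predicted contact probabilities
    true_dist:    L x L native C-beta distances in angstroms
    """
    L = len(contact_prob)
    # Candidate long-range pairs (i < j, j - i >= min_sep), best-scored first.
    pairs = [(contact_prob[i][j], i, j)
             for i in range(L) for j in range(i + min_sep, L)]
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[:max(1, int(L * top_fraction))]
    if not top:
        return 0.0
    correct = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return correct / len(top)
```

For a 30-residue chain, for example, the top 15 long-range pairs are evaluated, and the precision is the fraction of them that are true contacts in the native structure.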

**FIGURE 5**
(A) Logarithm of Neff of MSAs versus the quality of MULTICOM‐DIST top‐1 models on the 38 CASP14 FM or FM/TBM domains. (B) The precision of top L/2 long‐range contact predictions versus the quality of MULTICOM‐DIST top‐1 models on the 38 FM or FM/TBM domains [Color figure can be viewed at wileyonlinelibrary.com]
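Neff in Figure 5 is the number of effective sequences in the multiple sequence alignment. The caption does not define it, so a common definition is assumed here: each sequence receives the weight 1/n, where n is the number of sequences (including itself) at or above a sequence‐identity threshold (80% below), and Neff is the sum of these weights. Identity is computed naively as the fraction of identical aligned columns, so two gaps in the same column count as a match. The function names and threshold are illustrative assumptions, not the paper's exact procedure.

```python
def identity(a, b):
    """Fraction of alignment columns where two aligned sequences are identical."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def neff(msa, threshold=0.8):
    """Number of effective sequences: sum of 1 / (size of each sequence's identity cluster)."""
    total = 0.0
    for seq in msa:
        # Count sequences (including seq itself) that are at least `threshold` identical.
        n_similar = sum(1 for other in msa if identity(seq, other) >= threshold)
        total += 1.0 / n_similar
    return total
```

Redundant sequences therefore add little: four identical sequences give Neff = 1, while three mutually dissimilar sequences give Neff = 3. The quantity plotted in panel A would correspond to `math.log(neff(msa))`.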

**FIGURE 6**
(A) The number of non‐gap residues in the multiple sequence alignment of T1036s1 at each residue position (x‐axis: residue position; y‐axis: number of non‐gap amino acids). (B) The true distance map of T1036s1‐D1 (lower triangle) versus the predicted distance map from MULTICOM‐DIST (upper triangle). (C) The true structure of target T1036s1‐D1 in rainbow colors, from the N‐terminus in blue to the C‐terminus in red [Color figure can be viewed at wileyonlinelibrary.com]
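The per‐position profile in panel A counts, for each alignment column, how many sequences contain an amino acid rather than a gap. A minimal sketch, assuming an aligned FASTA‐style MSA in which all sequences have the same length and `-` or `.` marks a gap; columns line up with residue positions only if the alignment is anchored on the query with insertion columns removed (as when lowercase A3M insertions are stripped). The function name is hypothetical.

```python
def non_gap_counts(msa, gap_chars="-."):
    """Number of sequences with an amino acid (not a gap) at each alignment column."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(1 for seq in msa if seq[pos] not in gap_chars) for pos in range(length)]
```

Positions covered by few sequences contribute little alignment information for that region of the chain.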

**FIGURE 7**
The GDT‐TS scores of original models versus the GDT‐TS scores of combined models (MULTICOM_TS1) [Color figure can be viewed at wileyonlinelibrary.com]

**FIGURE 8**
Good examples of model combination on targets T1034, T1046s1, and T1065s2 (light yellow: native structure; light blue: MULTICOM_TS1 [final combined model]; pink: original model) [Color figure can be viewed at wileyonlinelibrary.com]

**FIGURE 9**
(A) The percentage of good‐quality models (TM‐score > 0.5) versus GDT‐TS loss of DeepRank. (B) The distribution of TM‐scores of the models of T1031‐D1 (green), T1039‐D1 (red), and T1043‐D1 (blue); dots on the curves denote the top model selected for the targets. (C) The skewness of TM‐scores of the models versus GDT‐TS losses of DeepRank for all 61 targets [Color figure can be viewed at wileyonlinelibrary.com]
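The two statistics that Figure 9 relates to DeepRank's GDT‐TS loss, the percentage of good‐quality models (TM‐score > 0.5) and the skewness of a target's TM‐score distribution, can be sketched as below. The sketch assumes the population (Fisher‐Pearson) skewness coefficient g1 = m3 / m2^1.5; the paper may use a different estimator, and the function names are hypothetical.

```python
def skewness(values):
    """Fisher-Pearson (population) skewness coefficient g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def percent_good(tm_scores, cutoff=0.5):
    """Percentage of models with TM-score above the cutoff (0.5 = correct topology)."""
    return 100.0 * sum(1 for s in tm_scores if s > cutoff) / len(tm_scores)
```

A strongly positive skewness means most models cluster at low TM‐scores with a thin tail of good ones, which is the situation the abstract identifies as difficult for model quality assessment.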

Grants and funding

R01 GM093123/GM/NIGMS NIH HHS/United States
