Proteins. 2022 Jan;90(1):58-72.
doi: 10.1002/prot.26186. Epub 2021 Jul 27.

Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14


Jian Liu et al. Proteins. 2022 Jan.

Abstract

Substantial progress in protein structure prediction has been made since CASP13 by using deep learning and residue-residue distance prediction. Inspired by these advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter-domain structure prediction. The results demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and for a majority of hard targets (template-free targets or targets whose templates cannot be recognized), a significant improvement over the CASP13 MULTICOM predictor. Moreover, template-free modeling performs better than template-based modeling not only on hard targets but also on targets that have homologous templates. The performance of template-free modeling depends largely on the accuracy of distance prediction, which is closely related to the quality of the multiple sequence alignments. Structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
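Component (c) scores candidate models by how well their inter-residue distances agree with predicted distances. The abstract does not give the scoring formula, so the following is only a minimal Python sketch under stated assumptions: each model is reduced to one coordinate per residue (for example, its Cβ atoms), and models are ranked by the root-mean-square difference (RMSE, one of the distance scores listed for Figure 2) between their distance maps and the predicted map, over pairs predicted to lie within a hypothetical 20 Å cutoff. The names `distance_map`, `distance_rmse`, and `rank_models` are illustrative and are not part of MULTICOM.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_map(coords: Sequence[Point]) -> List[List[float]]:
    """Pairwise Euclidean distances between per-residue coordinates (e.g., C-beta atoms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def distance_rmse(pred: List[List[float]], model: List[List[float]],
                  max_dist: float = 20.0) -> float:
    """RMSE between predicted and model distances over residue pairs i < j.

    Hypothetical choice: only pairs predicted closer than max_dist (angstroms) are scored.
    """
    n = len(pred)
    squared_error, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if pred[i][j] < max_dist:
                squared_error += (pred[i][j] - model[i][j]) ** 2
                count += 1
    return math.sqrt(squared_error / count) if count else float("inf")


def rank_models(pred: List[List[float]],
                models: Dict[str, Sequence[Point]]) -> List[Tuple[str, float]]:
    """Hypothetical QA ranking: sort models by distance RMSE (lower is better)."""
    scores = {name: distance_rmse(pred, distance_map(coords))
              for name, coords in models.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy usage: the "predicted" map is taken from a 4-residue native-like chain.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
decoy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 7.6, 0.0)]
print(rank_models(distance_map(native), {"native_like": native, "decoy": decoy}))
```

In practice the predicted map would come from a distance predictor rather than from a known structure; the toy chain only shows that a model matching the predicted distances is ranked first.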

Keywords: inter-residue distance prediction; protein quality assessment; protein structure prediction.


Figures

FIGURE 1
The pipeline of MULTICOM human and server protein structure predictors [Color figure can be viewed at wileyonlinelibrary.com]
FIGURE 2
The average loss of 40 QA methods and features in MULTICOM. (A) The loss on 61 “all group” full‐length targets. (B) The loss on 30 TBM‐easy or TBM‐hard full‐length targets. (C) The loss on 31 FM/TBM or FM full‐length targets. Red: three DeepRank methods (DeepRank, DeepRank_con, and DeepRank3_Cluster); Green: three multi‐model methods (APOLLO, Pcons, and ModFOLDcluster2); Blue: 17 single‐model methods (e.g., SBROD, RWplus, Voronota, Dope, OPUS_PSP, RF_CB_SRS_OD, DeepQA, ProQ2, and ProQ3 [ref. 41]); Pink: six contact matching scores (DeepDist/DNCON2 short‐range, medium‐range, and long‐range contact matching scores); Yellow: 11 distance scores (e.g., SSIM, PSNR, GIST, RMSE, Recall, Precision, PHASH, Pearson correlation, and ORB) [Color figure can be viewed at wileyonlinelibrary.com]
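Figure 2 reports the average loss of each QA method. The caption does not define the loss; a common CASP convention, assumed here, is the GDT‐TS of the best model in a target's model pool minus the GDT‐TS of the model that the QA method ranks first, averaged over targets. A minimal Python sketch with hypothetical function names, additionally assuming that a higher QA score means a better model (energy-like scores would need to be negated first):

```python
from statistics import fmean
from typing import Dict, Iterable, Tuple


def gdt_ts_loss(true_gdt_ts: Dict[str, float], qa_scores: Dict[str, float]) -> float:
    """Best achievable GDT-TS minus the GDT-TS of the QA method's top-1 model."""
    # Assumption: a higher QA score means the method thinks the model is better.
    top1 = max(qa_scores, key=qa_scores.__getitem__)
    return max(true_gdt_ts.values()) - true_gdt_ts[top1]


def average_loss(targets: Iterable[Tuple[Dict[str, float], Dict[str, float]]]) -> float:
    """Mean per-target loss over (true GDT-TS, QA scores) pairs."""
    return fmean(gdt_ts_loss(gdt, qa) for gdt, qa in targets)


# Toy usage with two targets: the QA method picks the second-best model on the
# first target (loss 0.2) and the best model on the second (loss 0.0).
targets = [
    ({"m1": 0.80, "m2": 0.60}, {"m1": 0.5, "m2": 0.7}),
    ({"a": 0.50, "b": 0.40}, {"a": 0.9, "b": 0.1}),
]
print(average_loss(targets))
```

A loss of 0 on a target means the QA method selected the best available model for it, so lower bars in Figure 2 correspond to better model selection under this definition.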
FIGURE 3
Evaluation of four MULTICOM server predictors in terms of the TM‐scores of the first submitted models. (A) On 92 “all group” + 4 “server only” domains (left plot: TM‐scores of the MULTICOM‐DEEP, MULTICOM‐HYBRID, and MULTICOM‐CONSTRUCT models versus TM‐scores of the MULTICOM‐CLUSTER models; right plot: mean and variation of the TM‐scores of the models of the four methods). (B) On 58 template‐based (TBM‐easy, TBM‐hard) domains. (C) On 38 FM or TBM/FM domains [Color figure can be viewed at wileyonlinelibrary.com]
FIGURE 4
Predicted structures and distance maps compared with native structures and true distance maps for 20 FM or FM/TBM domains for which the first model predicted by MULTICOM‐DEEP has the correct topology (TM‐score > 0.5). For each domain, the left panel compares the distance maps (lower triangle: true distance map; upper triangle: predicted distance map), and the right panel compares the predicted and true structures (light yellow: native structure; light blue: first predicted structure). The TM‐score of the predicted structure and the precision of the top L/2 long‐range contact predictions for each domain are listed at the top of each subfigure [Color figure can be viewed at wileyonlinelibrary.com]
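Figures 4 and 5 use the precision of the top L/2 long‐range contact predictions, where L is the sequence length. The sketch below assumes the usual CASP conventions (two residues are in contact when their Cβ-Cβ distance is below 8 Å, and long‐range pairs are separated by at least 24 residues in sequence); the function name and the tie handling are illustrative, not taken from the paper.

```python
from typing import List


def top_l2_long_range_precision(contact_prob: List[List[float]],
                                true_dist: List[List[float]],
                                min_separation: int = 24,
                                cutoff: float = 8.0) -> float:
    """Fraction of the top L/2 predicted long-range pairs that are true contacts.

    contact_prob[i][j]: predicted contact probability for residues i and j.
    true_dist[i][j]: native C-beta distance in angstroms.
    """
    L = len(contact_prob)
    # Candidate pairs i < j with sequence separation >= min_separation.
    pairs = [(contact_prob[i][j], i, j)
             for i in range(L) for j in range(i + min_separation, L)]
    # Highest probability first (stable sort keeps input order for ties); keep L // 2.
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[: L // 2]
    if not top:
        return 0.0
    correct = sum(1 for _, i, j in top if true_dist[i][j] < cutoff)
    return correct / len(top)


# Toy usage: a 30-residue chain with two confidently predicted, truly formed contacts.
L = 30
prob = [[0.0] * L for _ in range(L)]
dist = [[10.0] * L for _ in range(L)]
prob[0][29], prob[1][28] = 0.9, 0.8
dist[0][29], dist[1][28] = 5.0, 5.0
print(top_l2_long_range_precision(prob, dist))  # 2 correct out of the top 15 pairs
```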
FIGURE 5
(A) Logarithm of the Neff (number of effective sequences) of the MSAs versus the quality of the MULTICOM‐DIST top‐1 models on the 38 CASP14 FM or FM/TBM domains. (B) Precision of the top L/2 long‐range contact predictions versus the quality of the MULTICOM‐DIST top‐1 models on the same 38 FM or FM/TBM domains [Color figure can be viewed at wileyonlinelibrary.com]
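Figure 5A plots the logarithm of Neff, the number of effective sequences in each MSA. The caption does not give the formula; one widely used definition, assumed here, down‐weights redundant sequences: each sequence receives weight 1/n, where n is the number of sequences (itself included) sharing at least 80% identity with it, and Neff is the sum of these weights. The identity measure, the 0.8 threshold, and the function names are assumptions; other tools compute or normalize Neff differently.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of alignment columns where both sequences have the same non-gap residue."""
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def neff(msa: List[str], threshold: float = 0.8) -> float:
    """Number of effective sequences: sum over sequences of 1 / (size of its identity cluster)."""
    total = 0.0
    for k, seq in enumerate(msa):
        # Count the sequence itself explicitly so gappy sequences never give n == 0.
        n_similar = sum(1 for m, other in enumerate(msa)
                        if m == k or identity(seq, other) >= threshold)
        total += 1.0 / n_similar
    return total


# Toy usage: two identical sequences count as one effective sequence, plus one unrelated.
msa = ["ACDEFGHIKL", "ACDEFGHIKL", "MNPQRSTVWY"]
print(neff(msa))
```

Taking `math.log` of the result gives the quantity plotted in Figure 5A, up to the unstated base of the logarithm. The pairwise loop is quadratic in the number of sequences, which is fine for a sketch but slow for deep alignments.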
FIGURE 6
(A) Number of non‐gap residues in the multiple sequence alignment of T1036s1 at each residue position (x‐axis: residue position; y‐axis: number of non‐gap amino acids). (B) The true distance map of T1036s1‐D1 (lower triangle) versus the distance map predicted by MULTICOM‐DIST (upper triangle). (C) The true structure of target T1036s1‐D1 in rainbow coloring, from the N‐terminus (blue) to the C‐terminus (red) [Color figure can be viewed at wileyonlinelibrary.com]
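Figure 6A counts, for each residue position of T1036s1, how many aligned sequences have a residue rather than a gap. A minimal sketch, assuming the MSA is already query‐anchored (every row has one column per query residue, as in an A3M alignment with its lowercase insertion characters removed) and that gaps are written as "-"; the function name is hypothetical.

```python
from typing import List


def non_gap_counts(msa: List[str], gap: str = "-") -> List[int]:
    """Number of non-gap residues at each query-anchored alignment column."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(1 for row in msa if row[pos] != gap) for pos in range(length)]


# Toy usage: columns covered by fewer sequences give lower counts.
print(non_gap_counts(["ACD", "A-D", "--D"]))
```

Positions with low counts are regions where the alignment carries little coevolutionary signal, which is the kind of coverage pattern Figure 6A visualizes.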
FIGURE 7
The GDT‐TS scores of original models versus the GDT‐TS scores of combined models (MULTICOM_TS1) [Color figure can be viewed at wileyonlinelibrary.com]
FIGURE 8
Examples of successful model combination on targets T1034, T1046s1, and T1065s2 (light yellow: native structure; light blue: MULTICOM_TS1 [final combined model]; pink: original model) [Color figure can be viewed at wileyonlinelibrary.com]
FIGURE 9
(A) The percentage of good‐quality models (TM‐score > 0.5) versus the GDT‐TS loss of DeepRank. (B) The distribution of TM‐scores of the models of T1031‐D1 (green), T1039‐D1 (red), and T1043‐D1 (blue); dots on the curves denote the top model selected for each target. (C) The skewness of the TM‐scores of the models versus the GDT‐TS loss of DeepRank for all 61 targets [Color figure can be viewed at wileyonlinelibrary.com]
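Figure 9 relates DeepRank's loss to the share of good models (TM‐score > 0.5) and to the skewness of each target's TM‐score distribution. The caption does not say which skewness estimator was used; the sketch below assumes the plain moment‐based (Fisher‐Pearson) coefficient g1 = m3 / m2^(3/2), and the function names are hypothetical. A model pool with many poor models and only a few good ones has a long right tail and therefore a positive g1.

```python
from statistics import fmean
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def percent_good(tm_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Percentage of models whose TM-score exceeds the threshold (correct topology)."""
    return 100.0 * sum(1 for s in tm_scores if s > threshold) / len(tm_scores)


# Toy usage: mostly poor models plus one good model gives a right-skewed pool.
pool = [0.1, 0.1, 0.1, 0.9]
print(skewness(pool), percent_good(pool))
```

If the bias‐corrected estimator was intended instead, `scipy.stats.skew(values, bias=False)` computes it.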


References

Moult J, Fidelis K, Kryshtafovych A, et al. Critical assessment of methods of protein structure prediction (CASP)—round XII. Proteins: Struct Funct Bioinform. 2018;86:7‐15.
Källberg M, Wang H, Wang S, et al. Template‐based protein structure modeling using the RaptorX web server. Nat Protoc. 2012;7:1511‐1522.
Li J, Cheng J. A stochastic point cloud sampling method for multi‐template protein comparative modeling. Sci Rep. 2016;6:25687.
Remmert M, Biegert A, Hauser A, et al. HHblits: lightning‐fast iterative protein sequence searching by HMM‐HMM alignment. Nat Methods. 2012;9:173‐175.
Rohl CA, Strauss CE, Misura KM, et al. Protein structure prediction using Rosetta. Methods Enzymol. 2004;383:66‐93.
