Proteins. 2019 Dec;87(12):1165-1178.

doi: 10.1002/prot.25697. Epub 2019 Apr 25.

Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13

Jie Hou¹, Tianqi Wu¹, Renzhi Cao², Jianlin Cheng¹

Affiliations

¹ Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri.
² Department of Computer Science, Pacific Lutheran University, Tacoma, Washington.

PMID: 30985027
PMCID: PMC6800999
DOI: 10.1002/prot.25697

Jie Hou et al. Proteins. 2019 Dec.


Abstract

Predicting residue-residue distance relationships (e.g., contacts) has become the key direction for advancing protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can use global information in pairwise residue-residue features, such as coevolution scores, to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated for the first time to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, challenges remain in accurately predicting protein contact distances when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.
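To make the notion of a residue-residue contact concrete, the following is a minimal sketch of how a binary contact map can be derived from residue coordinates. It is not the MULTICOM or DNCON2 code. The 8 Å cutoff follows the convention commonly used in CASP contact assessment, while the function name `contact_map`, the `min_sep` default of 6, and the use of one representative atom per residue are assumptions made for illustration; the abstract does not state these values.

```python
from math import dist


def contact_map(coords, threshold=8.0, min_sep=6):
    """Return a symmetric boolean contact map.

    coords    : list of (x, y, z) tuples, one representative atom per residue
                (assumed here; e.g. C-beta atoms), in angstroms.
    threshold : two residues are "in contact" if their distance is below this
                value (8 A is a common convention, assumed here).
    min_sep   : minimum sequence separation |i - j| for a pair to be
                considered (hypothetical default; use a value >= 1).
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        # Only look at pairs that are far enough apart in the sequence.
        for j in range(i + min_sep, n):
            if dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap


# Toy example: three points on a line, 5 A and 4 A apart.
toy = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
toy_map = contact_map(toy, threshold=8.0, min_sep=1)
```

In the toy example, residues 0 and 2 are 9 Å apart and are therefore not marked as a contact, while both neighboring pairs are.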

Keywords: contact prediction; deep learning; distance prediction; protein model quality assessment; protein structure prediction; template-based modeling; template-free modeling.


Figures

**Figure 1**
The pipeline of MULTICOM server and human prediction systems

**Figure 2**
The pipeline of DNCON2 for protein residue‐residue contact distance prediction. The input volume has 56 channels (matrices) containing various pairwise residue‐residue features

**Figure 3**
Automated contact distance‐based ab initio protein structure prediction by CONFOLD2

**Figure 4**
The workflow of deep learning‐based model quality assessment with contacts (DeepRank)

**Figure 5**
Evaluation of four MULTICOM predictors. The methods are ranked by average TM‐score of the first (ie, TS1) submitted models. A, On 104 domains (left plot: TM‐scores of MULTICOM, MULTICOM_CLUSTER, MULTICOM‐NOVEL models vs TM‐scores of MULTICOM‐CONSTRUCT models; right plot: mean and variation of the TM‐scores of the models of the four methods). B, On 40 template‐based (TBM‐easy) domains. C, On 31 template‐free (FM) domains

**Figure 6**
Comparison of DeepRank with individual QA methods used in MULTICOM predictors. A, The box plot of the loss of each method. Here the loss is measured on a 1‐point scale (ie, the highest/perfect GDT‐TS score = 1). B, The GDT‐TS score on the 100‐point scale of the top models selected by each individual QA method and DeepRank is plotted against the GDT‐TS score of MULTICOM's first submitted models for 74 “all group” full‐length targets. The curve for each method is fitted by a second‐degree polynomial regression function. The area under the curve for each method is calculated and shown on the top left. A larger area indicates a better capacity for model selection

**Figure 7**
Tertiary structure prediction for T0966. A, The distribution of GDT‐TS scores of 146 server models. B, The plot of the true GDT‐TS scores of models against their predicted ranking by MULTICOM. The point highlighted in red is the top model selected by DeepRank. C, The native structure of target T0966 (PDB code: 5w6l). D, The top selected model. E, The final first MULTICOM model (TS1)

**Figure 8**
The modeling performance of contact‐based ab initio modeling methods vs the predicted contact accuracy (L/5 contacts) in CASP13. Each point represents the modeling accuracy in terms of GDT‐TS score vs the accuracy of predicted contacts for a method. The colors represent different modeling methods. Rosetta without contacts (purple) was included for comparison. The average GDT‐TS score and TM‐score of the five methods on all CASP13 targets are summarized in the top‐right table

**Figure 9**
A successful ab initio modeling example (a domain of target T1000) for which no significant templates were identified. For the FM domain of T1000 (residues 282‐523), the accuracy of the top L/5 predicted contacts is 100%, top L 79%, and top 2L 50%. CONFOLD2 successfully built a complicated α‐helix + β‐sheet + α‐helix model for the domain with a TM‐score of 0.8 and GDT‐TS of 0.64, while RosettaCon failed to generate a correct topology (ie, TM‐score = 0.33 < 0.5 threshold). This example shows that a purely contact distance‐driven method such as CONFOLD2 can build high‐quality structural models of complicated topology for large domains if a sufficient number of accurate contact predictions are provided

**Figure 10**
The successful modeling of a large multidomain target T0996 and the contact‐based validation. The contacts (red) predicted by DNCON2 match with the contacts (blue) in the template‐based models domain by domain
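
The captions above quote two evaluation quantities: the accuracy of the top L/5, L, and 2L predicted contacts (Figures 8 and 9) and the loss of a model quality assessment method (Figure 6). The sketch below shows one way these could be computed. It is not taken from the paper: the function names, the dictionary-based inputs, and the `min_sep` default are hypothetical, and the loss definition (best available GDT‐TS minus the GDT‐TS of the top-ranked model) is the usual quality-assessment convention, assumed here because the caption does not spell it out.

```python
def top_contact_precision(predicted, native, seq_len, numer=1, denom=5, min_sep=6):
    """Precision of the top (numer/denom) * L predicted contacts.

    predicted : dict mapping (i, j) residue index pairs to a predicted score.
    native    : set of (i, j) pairs with i < j that are true contacts.
    seq_len   : sequence length L.
    numer/denom = 1/5 gives top L/5; 1/1 gives top L; 2/1 gives top 2L.
    min_sep   : minimum sequence separation (hypothetical default).
    """
    eligible = [(score, i, j) for (i, j), score in predicted.items()
                if abs(i - j) >= min_sep]
    eligible.sort(reverse=True)  # highest-scoring pairs first
    n_top = max(1, seq_len * numer // denom)
    top = eligible[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (min(i, j), max(i, j)) in native)
    return hits / len(top)


def selection_loss(true_scores, predicted_scores):
    """Assumed loss: best true score minus true score of the top-ranked model."""
    selected = max(predicted_scores, key=predicted_scores.get)
    return max(true_scores.values()) - true_scores[selected]


# Toy data, invented purely for illustration.
pred = {(0, 7): 0.9, (1, 9): 0.8, (2, 8): 0.3, (0, 1): 0.99}
true_contacts = {(0, 7), (2, 8)}
precision_l5 = top_contact_precision(pred, true_contacts, seq_len=10)

gdt = {"model_a": 0.5, "model_b": 0.75, "model_c": 0.25}
qa = {"model_a": 0.9, "model_b": 0.4, "model_c": 0.1}
loss = selection_loss(gdt, qa)
```

With these toy inputs, the pair (0, 1) is ignored because it is too close in sequence, and the quality-assessment scores pick `model_a` even though `model_b` has the higher true score.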




Grants and funding

R01 GM093123/GM/NIGMS NIH HHS/United States
