Proteins. 2019 Dec;87(12):1165-1178.
doi: 10.1002/prot.25697. Epub 2019 Apr 25.

Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13

Jie Hou et al. Proteins. 2019 Dec.

Abstract

Predicting residue-residue distance relationships (eg, contacts) has become the key direction for advancing protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated for the first time to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, challenges remain in accurately predicting protein contact distances when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.

Keywords: contact prediction; deep learning; distance prediction; protein model quality assessment; protein structure prediction; template-based modeling; template-free modeling.

Figures

Figure 1
The pipeline of MULTICOM server and human prediction systems
Figure 2
The pipeline of DNCON2 for protein residue‐residue contact distance prediction. The input volume has 56 channels (matrices) containing various pairwise residue‐residue features
Figure 3
Automated contact distance‐based ab initio protein structure prediction by CONFOLD2
Figure 4
The workflow of deep learning‐based model quality assessment with contacts (DeepRank)
Figure 5
Evaluation of four MULTICOM predictors. The methods are ranked by the average TM‐score of the first submitted models (ie, TS1). A, On 104 domains (left plot: TM‐scores of MULTICOM, MULTICOM_CLUSTER, and MULTICOM‐NOVEL models vs TM‐scores of MULTICOM‐CONSTRUCT models; right plot: mean and variation of the TM‐scores of the models of the four methods). B, On 40 template‐based (TBM‐easy) domains. C, On 31 template‐free (FM) domains
Figure 6
Comparison of DeepRank with individual QA methods used in MULTICOM predictors. A, The box plot of the loss of each method. Here the loss is measured on a 1‐point scale (ie, the highest/perfect GDT‐TS score = 1). B, The GDT‐TS score on the 100‐point scale of the top models selected by each individual QA method and DeepRank is plotted against the GDT‐TS score of MULTICOM's first submitted models for 74 “all group” full‐length targets. The curve for each method is fitted by a second‐degree polynomial regression function. The area under the curve for each method is calculated and shown on the top left. A larger area indicates a better capacity for model selection
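
The loss plotted in panel A can be read as the gap between the best model in a target's pool and the model a QA method ranks first. The following is a minimal Python sketch under that assumption; the caption does not spell out the definition, and the function names and toy scores are hypothetical illustrations, not part of MULTICOM or DeepRank.

```python
def selection_loss(true_gdt_ts, predicted_quality):
    """Loss of a QA method on one target, on the 1-point GDT-TS scale.

    Assumed definition (not stated in the caption): the best true GDT-TS
    in the model pool minus the true GDT-TS of the model that the QA
    method scores highest.
    """
    if not true_gdt_ts or len(true_gdt_ts) != len(predicted_quality):
        raise ValueError("need two non-empty score lists of equal length")
    # Index of the model the QA method would select as its top model.
    selected = max(range(len(predicted_quality)),
                   key=predicted_quality.__getitem__)
    return max(true_gdt_ts) - true_gdt_ts[selected]


def mean_loss(targets):
    """Average loss over (true_gdt_ts, predicted_quality) pairs, one per target."""
    losses = [selection_loss(true, pred) for true, pred in targets]
    return sum(losses) / len(losses)


# Toy data: three models for one target, two for another (made-up values).
toy_targets = [
    ([0.40, 0.55, 0.62], [0.30, 0.90, 0.50]),  # picks the 0.55 model
    ([0.50, 0.70], [0.10, 0.80]),              # picks the best model
]
average = mean_loss(toy_targets)
```

A loss of zero means the method selected the best model available for that target; smaller average loss over targets indicates better ranking.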
Figure 7
Tertiary structure prediction for T0966. A, The distribution of GDT‐TS scores of 146 server models. B, The plot of the true GDT‐TS scores of models against their predicted ranking by MULTICOM. The point highlighted in red is the top model selected by DeepRank. C, The native structure of target T0966 (PDB code: 5w6l). D, The top selected model. E, The final first MULTICOM model (TS1)
Figure 8
The modeling performance of contact‐based ab initio modeling methods vs the predicted contact accuracy (L/5 contacts) in CASP13. Each point represents the modeling accuracy in terms of GDT‐TS score vs the accuracy of predicted contacts for a method. The colors represent different modeling methods. Rosetta without contacts (purple) was included for comparison. The averaged GDT‐TS score and TM‐score of the five methods on all CASP13 targets are summarized in the top‐right table
Figure 9
A successful ab initio modeling example (a domain of target T1000) for which no significant templates were identified. For the FM domain of T1000 (residues 282‐523), the accuracy of the top L/5 predicted contacts is 100%, top L 79%, and top 2L 50%. CONFOLD2 successfully built a complicated α‐helix + β‐sheet + α‐helix model for the domain with a TM‐score of 0.8 and a GDT‐TS of 0.64, while RosettaCon failed to generate a correct topology (ie, TM‐score = 0.33 < 0.5 threshold). This example shows that a purely contact distance‐driven method such as CONFOLD2 can build high‐quality structural models of complicated topology for large domains if a sufficient number of accurate contact predictions are provided
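
The top L/5, top L, and top 2L accuracies quoted above are precisions over the highest-confidence predicted contacts, where L is the domain length. The sketch below shows one way to compute them in Python. The function name, the input layout, and the toy data are assumptions for illustration. Evaluations often restrict the list to long-range residue pairs, which this sketch leaves to the caller.

```python
def top_contact_precision(predicted, true_contacts, seq_len, fraction=0.2):
    """Precision of the highest-confidence predicted contacts.

    predicted:     list of (i, j, probability) residue pairs (hypothetical layout).
    true_contacts: set of (i, j) pairs with i < j that are in contact
                   in the native structure.
    seq_len:       sequence (domain) length L.
    fraction:      0.2 for top L/5, 1.0 for top L, 2.0 for top 2L.
    """
    n_top = max(1, int(seq_len * fraction))
    # Keep the n_top most confident predictions.
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j, _ in ranked
               if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(ranked)


# Toy example with made-up values: a 10-residue chain, three predictions.
toy_predicted = [(1, 5, 0.9), (2, 8, 0.8), (3, 9, 0.1)]
toy_true = {(1, 5), (3, 9)}
precision_l5 = top_contact_precision(toy_predicted, toy_true, seq_len=10, fraction=0.2)
precision_l = top_contact_precision(toy_predicted, toy_true, seq_len=10, fraction=1.0)
```

When fewer predictions are available than the requested list size, this sketch divides by the number of predictions actually ranked, which is one of several possible conventions.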
Figure 10
The successful modeling of a large multidomain target T0996 and the contact‐based validation. The contacts (red) predicted by DNCON2 match the contacts (blue) in the template‐based models domain by domain
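
Contact‐based validation of this kind can be sketched as extracting the contacts present in a 3D model and checking what fraction of the predicted contacts they satisfy. The Python sketch below is an illustration only: the 8 Å distance cutoff between representative atoms is the commonly used contact definition and is assumed here, the sequence separation filter is arbitrary, and the function names and coordinates are hypothetical rather than taken from DNCON2 or MULTICOM.

```python
import math


def model_contacts(coords, threshold=8.0, min_separation=6):
    """Residue pairs (i, j), i < j, whose representative atoms are closer
    than `threshold` angstroms in the model.

    coords: one (x, y, z) tuple per residue, eg, C-beta positions (assumed input).
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts


def satisfied_fraction(predicted_pairs, coords, threshold=8.0, min_separation=6):
    """Fraction of predicted contact pairs that are also contacts in the model."""
    in_model = model_contacts(coords, threshold, min_separation)
    pairs = [(min(i, j), max(i, j)) for i, j in predicted_pairs]
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in in_model) / len(pairs)


# Toy chain with made-up coordinates: three nearby residues and one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_fraction = satisfied_fraction([(0, 2), (0, 3)], toy_coords, min_separation=1)
```

Running the check separately on the residue range of each domain gives a domain‐by‐domain comparison; a high satisfied fraction suggests that the model agrees with the predicted contacts.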
