Proteins. 2019 Dec;87(12):1113-1127.
doi: 10.1002/prot.25800. Epub 2019 Aug 20.

Evaluation of template-based modeling in CASP13

Tristan I Croll et al. Proteins. 2019 Dec.

Abstract

Performance in the template-based modeling (TBM) category of CASP13 is assessed here, using a variety of metrics. Performance of the predictor groups that participated is ranked using the primary ranking score that was developed by the assessors for CASP12. This reveals that the best results are obtained by groups that include contact predictions or inter-residue distance predictions derived from deep multiple sequence alignments. In cases where there is a good homolog in the wwPDB (TBM-easy category), the best results are obtained by modifying a template. However, for cases with poorer homologs (TBM-hard), very good results can be obtained without using an explicit template, by deep learning algorithms trained on the wwPDB. Alternative metrics are introduced, to allow testing of aspects of structural models that are not addressed by traditional CASP metrics. These include comparisons to the main-chain and side-chain torsion angles of the target, and the utility of models for solving crystal structures by the molecular replacement method. The alternative metrics are poorly correlated with the traditional metrics, and it is proposed that modeling has reached a sufficient level of maturity that the best models should be expected to satisfy this wider range of criteria.

Keywords: CASP; molecular replacement; structure prediction; template-based modeling.

Conflict of interest statement

The authors have no conflict of interest to declare.

Figures

Figure 1
Overall trends in model difficulty and accuracy over time. A, The average difficulty of TBM targets in CASP13 was somewhat lower than in CASP12, with templates of both higher sequence identity and coverage available. B, The distribution of GDT_TS scores for TBM models has shifted toward higher values since the first four rounds of CASP, with a further substantial shift from values below 50 to very good values above 80 between CASP12 and CASP13. C, The accuracy of sequence alignments has improved significantly since CASP11, particularly for low homology templates. D, In keeping with (C), GDT_TS scores appear to still be improving for harder targets. T0999‐D2 is an outlier due to ambiguity in the definition of a “domain,” as discussed in the main text. In (C) and (D), individual data points are shown for CASP12 and −13, with only trend lines shown for earlier meetings. Each point represents the best model submitted by any group for a given target
Figure 2
A, T0999‐D2; B, wwPDB entry 5xwb (ligand‐free open conformation, ref. 43); and C, wwPDB entry 3nvs (ligand‐bound closed conformation; ligand shown in space‐filling representation). All three structures are aligned to superimpose the bottom domain. Only models based on an open conformation as in 5xwb will resemble the target. There may be additional flexibility in the open state, as the relative orientations of the domains in T0999‐D2 and 5xwb differ somewhat
Figure 3
A,B, Overview of TBM rankings for (A) all 99 groups and (B) top 20 groups. Rankings are based on the sum of S_CASP12 scores for all models designated “model 1” submitted in the TBM‐easy and TBM‐hard categories. C, Performance across difficulty categories for top four TBM groups. While template‐based methods performed best in the TBM‐easy category, the template‐free machine learning methods of the A7D group clearly outperformed them in categories where template homology was weak or nonexistent
Figure 4
Example summary chart from torsion‐space comparison of template and model to target for T0965‐D1 (TBM‐hard). Top panel: per‐residue backbone torsion deviations from the target (lower is better). Second panel: difference between template and model results from top panel—negative values indicate the model has improved agreement compared to the template. Background coloring indicates the residual differences in Cα positions between template (green) or model (purple) and target after rigid‐body alignment. Sites with potentially problematic peptide bonds (cis/trans disagreement or twisted more than 30° from planar) are indicated with crosses and triangles respectively. Third panel: sidechain dihedral errors, weighted for degree of burial and distance from backbone as described in the main text. Bottom panel: difference between template and model sidechain results—negative indicates improvement
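The torsion-space quantities named in this caption can be illustrated with a minimal Python sketch. It is not the assessors' code: the function names, the simple averaging of phi and psi deviations, and the 90° boundary used to separate cis from trans are assumptions made for illustration. Only the 30° planarity cutoff comes from the caption.

```python
def angle_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def peptide_twist(omega_deg):
    """Deviation of omega from the nearest planar value (0 = cis, 180 = trans)."""
    return min(angle_diff(omega_deg, 0.0), angle_diff(omega_deg, 180.0))


def is_cis(omega_deg):
    """Assumed classification: omega closer to 0 than to 180 counts as cis."""
    return angle_diff(omega_deg, 0.0) < 90.0


def compare_residue(model, target, twist_cutoff=30.0):
    """Compare one residue of a model (or template) with the target.

    `model` and `target` are dicts holding 'phi', 'psi' and 'omega' in degrees.
    The backbone deviation here is a plain mean of the phi and psi differences,
    which is an illustrative choice rather than the published formula.
    """
    backbone_dev = (
        angle_diff(model["phi"], target["phi"])
        + angle_diff(model["psi"], target["psi"])
    ) / 2.0
    return {
        "backbone_dev": backbone_dev,
        "cis_trans_mismatch": is_cis(model["omega"]) != is_cis(target["omega"]),
        "twisted": peptide_twist(model["omega"]) > twist_cutoff,
    }


if __name__ == "__main__":
    model = {"phi": -60.0, "psi": -45.0, "omega": 140.0}
    target = {"phi": -65.0, "psi": -40.0, "omega": 180.0}
    print(compare_residue(model, target))
```

The modular arithmetic in `angle_diff` matters because torsion angles wrap around: 179° and -179° differ by 2°, not 358°.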
Figure 5
Torsion‐based scoring metrics reveal issues not captured by standard scores. Horizontal axis: sum of all positive z‐scores by standard ranking formula. Vertical axis: sum of all positive z‐scores by torsion‐only formula. Each point represents the aggregate of all models submitted by a single group in the TBM‐easy and TBM‐hard categories. Points are colored according to change in ranking going from S_CASP12‐ASE to S_torsion. The top 10 groups in molecular replacement trials disregarding error estimates (see Figure 7A) are marked in red. The three points at lower‐right (each originating from I‐TASSER, ref. 15) demonstrate that it is possible to achieve excellent (indeed, field‐leading) scores by default metrics while still suffering from severe distortions at the local level
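Both axes of this plot are sums of positive z-scores per group. A minimal Python sketch of that aggregation follows, under stated assumptions: z-scores are computed per target over all submitted scores using the population standard deviation, and negative z-scores are clamped to zero before summing. The caption does not give these details, and the function name and input layout are hypothetical.

```python
from statistics import mean, pstdev


def positive_z_sums(scores):
    """Sum the positive per-target z-scores for each group.

    `scores` maps target id -> {group name: raw score}. The normalisation
    (mean and population standard deviation over all groups on a target)
    is an assumption made for illustration.
    """
    totals = {}
    for by_group in scores.values():
        values = list(by_group.values())
        mu = mean(values)
        sigma = pstdev(values)
        for group, raw in by_group.items():
            z = (raw - mu) / sigma if sigma > 0 else 0.0
            # Only above-average performance contributes to the total.
            totals[group] = totals.get(group, 0.0) + max(z, 0.0)
    return totals


if __name__ == "__main__":
    example = {
        "T1": {"A": 3.0, "B": 1.0},
        "T2": {"A": 0.0, "B": 4.0},
    }
    print(positive_z_sums(example))
```

Clamping at zero means a group is not penalised for a poor model on one target, so the total rewards how often and by how much a group beats the field average.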
Figure 6
Target T0981‐D5 (TBM‐hard) presents a particularly stark example of the importance of carefully considering model stereochemistry. A, The two leading models by S_CASP12‐ASE (horizontal axis) (i: A7D; ii: Zhang‐Server) appear at opposite extremes according to S_torsion (vertical axis). Note: the corresponding S_CASP12 scores (including the ASE measure) for these two models are 1.79 and 1.55, respectively. B, The Cα correspondence to the target is quite similar in both cases: close in the core fold while deviating substantially on the two extended hairpins at right. Gray = target; cyan = A7D; green = Zhang‐Server. C, Summary of markup used in panels D through F. (i) Severe sidechain outlier (P < .05%). Less severe outliers appear as smaller, yellow‐orange versions of the same motif. (ii, iii) Ramachandran outlier (P < .05%) and marginal (P < 2%), respectively. (iv) Peptide bond twisted more than 30° out of plane. D‐F, While the A7D model (E) contains a similar number of Ramachandran outliers to the target (D), more than half of all residues in the model from Zhang‐Server (F) contain Ramachandran, sidechain and/or peptide bond planarity outliers
Figure 7
A, Top 10 groups ranked by mean z‐scores for LLG calculations. Groups are sorted by the maximum of the mean z‐score computed using the calculations where the B‐factor column is interpreted as an RMS coordinate error estimate for each atom (blue bars) or where constant B‐factors are used (orange bars). B,C, Effect of B‐factor weighting on MR utility for the BAKER‐ROSETTASERVER model of T1002‐D3. Both panels show the experimental structure of T1002‐D3 in blue. Panel (B) shows the best model (number 3) submitted by BAKER‐ROSETTASERVER in gold. Panel (C) shows the same model in salmon, but only including the residues for which the estimated coordinate error was less than 2 Å
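The trimming shown in panel (C), and the idea of reading the B-factor column as an error estimate, can be sketched in Python as below. The caption gives only the 2 Å cutoff. The conversion B = 8π²·(RMS error)²/3 is the usual relation between an isotropic B-factor and a three-dimensional RMS displacement and is assumed here, not quoted from the paper. The function names and the `(residue_id, error)` input layout are hypothetical.

```python
import math


def rms_error_to_b(rms_error):
    """Convert an estimated RMS coordinate error (in Å) to an isotropic B-factor (in Å^2).

    Assumes B = 8 * pi^2 * <u^2>, with <u^2> = rms_error^2 / 3 for an
    isotropic three-dimensional error.
    """
    return 8.0 * math.pi ** 2 * rms_error ** 2 / 3.0


def trim_by_error(residues, max_error=2.0):
    """Keep residues whose estimated error is below `max_error` (in Å).

    `residues` is a list of (residue_id, estimated_rms_error) pairs; the
    result pairs each retained residue id with its equivalent B-factor.
    """
    return [
        (res_id, rms_error_to_b(err))
        for res_id, err in residues
        if err < max_error
    ]


if __name__ == "__main__":
    model = [(1, 0.5), (2, 2.5), (3, 1.9)]
    for res_id, b in trim_by_error(model):
        print(res_id, round(b, 1))
```

Down-weighting or removing residues with large predicted errors lets a molecular replacement search rely on the parts of the model most likely to match the crystal structure.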
Figure 8
Value added for utility in MR. For 26 of 27 evaluation units, the best model is better than the best template previously available from the PDB
Figure 9
Rankings by geometric quality for (A,B) TBM‐easy and (C,D) TBM‐hard categories. A,C, Scores for top 20 groups in each category. B,D, Comparison of S_geom vs the standard S_CASP12. It is particularly notable that A7D, the top group in TBM‐hard by either metric, did not in fact use a template‐based method
Figure 10
Importance of considering model quality when selecting templates. A, Scatter plot of resolution vs MolProbity score for all PDB entries identified as templates used in this CASP round (excluding models with resolutions below 10 Å). Red lines connect selected templates to similar models with significantly better resolution and/or MolProbity score. Alternative models were selected from those with better than 90% (solid lines), 70% (dashed line), or 50% (dotted lines) sequence identity to the template chain. B, Representative fragment of chain F from 5mqf (5.9 Å resolution cryo‐EM model used as a template for T0954 by six groups). The density is uninterpretable on the atomic scale: this chain is a homology model, truncated to poly‐Ala and rigid‐body docked into patchy density. C, Equivalent region from the 100% sequence‐identical 5xjc (3.6 Å cryo‐EM model, used by only two groups). All sidechains are present and for the most part modeled into strong, convincing density. The lower MolProbity score for 5mqf arises simply because truncated sidechains do not contribute to clashscore nor count as rotamer outliers

References

    1. Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res. 2007;35(suppl 1):D301‐D303. doi: 10.1093/nar/gkl971.
    2. Berman HM, Coimbatore Narayanan B, Di Costanzo L, et al. Trendspotting in the Protein Data Bank. FEBS Lett. 2013;587(8):1036‐1045. doi: 10.1016/j.febslet.2012.12.029.
    3. Stephens ZD, Lee SY, Faghri F, et al. Big data: astronomical or genomical? PLoS Biol. 2015;13(7). doi: 10.1371/journal.pbio.1002195.
    4. Perez A, Morrone JA, Simmerling C, Dill KA. Advances in free‐energy‐based simulations of protein folding and ligand binding. Curr Opin Struct Biol. 2016;36:25‐31. doi: 10.1016/j.sbi.2015.12.002.
    5. Kryshtafovych A, Monastyrskyy B, Fidelis K, Moult J, Schwede T, Tramontano A. Evaluation of the template‐based modeling in CASP12. Proteins Struct Funct Bioinform. 2018;86:321‐334. doi: 10.1002/prot.25425.
