Proteins. 2021 Dec;89(12):1888-1900.
doi: 10.1002/prot.26248. Epub 2021 Oct 3.

Assessing the accuracy of contact and distance predictions in CASP14

Victoria Ruiz-Serra et al. Proteins. 2021 Dec.

Abstract

We present the results of the assessment of the intramolecular residue-residue contact and distance predictions from groups participating in the 14th round of the CASP experiment. The performance of contact prediction methods was evaluated with the measures used in previous CASPs, while distance predictions were assessed with a new protocol that considers individual distance pairs as well as the whole predicted distance matrix, using a graph-based framework. The results of the evaluation indicate that predictions by the tFold framework, TripletRes and DeepPotential were the most accurate in both categories. With regard to progress in method performance, the assessment of contact prediction did not reveal any discernible difference when compared to CASP13. Arguably, this could be due to CASP14 FM targets being more challenging than ever before.
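
As an illustration of the contact measures mentioned above, the precision over the top L/5 long-range contacts (the quantity reported in the figures) can be sketched as follows. This is a minimal sketch, not the assessors' code: the function name, the input layout, and the sequence-separation threshold of 24 residues for "long-range" are assumptions made for the example.

```python
def top_contact_precision(predictions, native_contacts, seq_len,
                          fraction=5, min_separation=24):
    """Precision over the top L/fraction predicted long-range contacts.

    predictions     -- iterable of (i, j, probability) residue pairs
    native_contacts -- set of frozenset({i, j}) pairs in contact in the
                       experimental structure
    seq_len         -- target length L
    min_separation  -- assumed sequence separation for "long-range" pairs
    """
    # Keep only pairs that are far enough apart in sequence.
    long_range = [(i, j, p) for i, j, p in predictions
                  if abs(i - j) >= min_separation]
    # Rank by predicted probability, most confident first.
    long_range.sort(key=lambda pair: pair[2], reverse=True)
    top = long_range[:max(1, seq_len // fraction)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if frozenset((i, j)) in native_contacts)
    return hits / len(top)


# Toy data; a small separation threshold is used so the example stays short.
toy_predictions = [(1, 8, 0.9), (2, 9, 0.8), (1, 2, 0.99), (3, 7, 0.4)]
toy_native = {frozenset((1, 8)), frozenset((3, 7))}
precision = top_contact_precision(toy_predictions, toy_native,
                                  seq_len=10, min_separation=3)
```

With L = 10 the top L/5 list holds two pairs, one of which is a native contact, so the toy precision is 0.5.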

Keywords: CASP14; benchmarking; community-wide experiment; numerical evaluation measures; prediction of residue-residue contact and distance.

Figures

Figure 1.
Improvement of contact prediction over CASP11-CASP14 meetings. Participating groups (X-axis) are ranked according to average precision (Y-axis) for the top L/5 contacts.
Figure 2.
Analysis of target difficulty in CASP14. (A) Scatter plot representing the average sequence identity vs. average coverage of the best structural templates from CASP12 to CASP14. Cyan circles correspond to FM targets while blue triangles represent FM + FM/TBM targets. Red to green gradient of the plot box reflects predictive difficulty from harder to easier, with the lower bottom corner hosting the most difficult target sets. (B) Proportion of FM targets with low alignment depth (Neff<0.2) in CASP13 vs CASP14. Data are only shown for targets containing long-range contacts. (C) Contact prediction accuracy as a function of connected secondary structure elements (x-axis). Results are shown in terms of average F1-score (green bars) for long-range L/5 contacts (all participating groups). Orange bars indicate the overall frequency of long-range contacts as a function of connected secondary structure elements in target structures.
Figure 3.
Cumulative z-score ranking of participating groups on FM targets. Performance is shown for the top L/5 long-range contacts. Group names are labeled as h and s to denote human-expert and server methods, respectively.
Figure 4.
Results of the paired Student’s t-test computed on the top-10 performing groups according to cumulative z-score ranking. Red cells correspond to p-value < 0.05.
Figure 5.
Contact precision of the best prediction method as a function of alignment depth. Data are shown for the top L long-range contacts and FM targets in CASP13 and CASP14. R refers to the Pearson coefficient and p to the p-value.
Figure 6.
Heatmap clusters of group performances and assessment metrics. Columns include four bin-level assessment scores (average bin precision, F1, MDD, and MBN) and four graph-based scores (Diversity, Strength, Clustering Coefficient, and Shortest Path). Rows represent participant groups. Rows and columns were clustered using Euclidean distance with complete linkage. The heatmap is coloured according to the per-group sum of z-score values >0 for each metric, and ranges from yellow (low) to blue (high).
Figure 7.
Cumulative z-score ranking of participating groups in the distance prediction category. Performance is shown according to the linear combination of 5 non-redundant metrics (z-MDD + z-F1 + z-clustering + z-diversity + z-shortest_path). Group names are labeled as h and s to denote human-expert and server methods, respectively.
Figure 8.
Results of the paired Student’s t-test computed on the top-10 performing groups according to the metascore-based ranking. Red cells correspond to p-value < 0.05.
Figure 9.
Target T1093-D1, log(Neff/len) = 0.11. (A-C) Contact maps. (D-F) Distance maps. (G-I) Graph-based representation of predicted and native distance maps. Graph nodes represent amino acid residues and are colored based on per-residue strength values. (J-L) 3D structure of the T1093-D1 target colored according to per-residue strength values. Color ranges from red (high strength) to blue (low strength). Highlighted regions correspond to amino-acid residues showing the highest strength values in the native graph.
Figure 10.
(A) 3D structure of target T1080-D1, colored in a blue-to-red gradient from the N-terminus to the C-terminus. (B) Diagram of secondary structure content. β-strand elements are indicated by green rectangles while turns and loops are denoted by the black line. (C) Normalized mean difference (NMD) between observed and predicted per-residue strength computed over all predictions. (D) Heatmap showing the normalized absolute difference in per-residue strength between predicted and native graphs for each participant group (y-axis). (E) Absolute difference between predicted and native per-residue shortest path (Δs_path, yellow line) and RMSD (black line) for predictors G009, G304, G488 and G319. Values of Δs_path and RMSD are shown as normalized values between 0 and 1.
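
The cumulative z-score rankings referred to in the captions for Figures 3 and 7 can be illustrated with a short sketch. It rests on several assumptions that are not stated in this record: scores are standardized per target across groups, only z-scores above 0 are summed (as the Figure 6 caption suggests for the heatmap), and the data layout and function names are hypothetical. For the distance category, the same routine could be applied to each of the five metrics and the per-metric totals added together.

```python
from statistics import mean, pstdev


def cumulative_z_scores(scores_by_target):
    """Sum per-target z-scores for each group.

    scores_by_target -- {target_id: {group_id: score}}
    Returns {group_id: cumulative z-score}. Negative z-scores are
    clipped to 0 before summing (an assumption, see lead-in).
    """
    totals = {}
    for group_scores in scores_by_target.values():
        values = list(group_scores.values())
        mu = mean(values)
        sigma = pstdev(values)
        for group, score in group_scores.items():
            # A target on which all groups tie carries no ranking signal.
            z = (score - mu) / sigma if sigma > 0 else 0.0
            totals[group] = totals.get(group, 0.0) + max(z, 0.0)
    return totals


def rank_groups(scores_by_target):
    """Return group ids ordered from highest to lowest cumulative z-score."""
    totals = cumulative_z_scores(scores_by_target)
    return sorted(totals, key=totals.get, reverse=True)


# Toy scores for three hypothetical groups on two hypothetical targets.
toy_scores = {
    "T1": {"A": 0.9, "B": 0.5, "C": 0.1},
    "T2": {"A": 0.6, "B": 0.6, "C": 0.0},
}
toy_totals = cumulative_z_scores(toy_scores)
toy_ranking = rank_groups(toy_scores)
```

Clipping at 0 means a group is not penalized for a single poor target; whether the published rankings use exactly this rule is not specified here.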
