Proteins. 2021 Dec;89(12):1800-1823.
doi: 10.1002/prot.26222. Epub 2021 Sep 13.

Prediction of protein assemblies, the next frontier: The CASP14-CAPRI experiment

Marc F Lensink, Guillaume Brysbaert, Théo Mauri, Nurul Nadzirin, Sameer Velankar, Raphael A G Chaleil, Tereza Clarence, Paul A Bates, Ren Kong, Bin Liu, Guangbo Yang, Ming Liu, Hang Shi, Xufeng Lu, Shan Chang, Raj S Roy, Farhan Quadir, Jian Liu, Jianlin Cheng, Anna Antoniak, Cezary Czaplewski, Artur Giełdoń, Mateusz Kogut, Agnieszka G Lipska, Adam Liwo, Emilia A Lubecka, Martyna Maszota-Zieleniak, Adam K Sieradzan, Rafał Ślusarz, Patryk A Wesołowski, Karolina Zięba, Carlos A Del Carpio Muñoz, Eiichiro Ichiishi, Ameya Harmalkar, Jeffrey J Gray, Alexandre M J J Bonvin, Francesco Ambrosetti, Rodrigo Vargas Honorato, Zuzana Jandova, Brian Jiménez-García, Panagiotis I Koukos, Siri Van Keulen, Charlotte W Van Noort, Manon Réau, Jorge Roel-Touris, Sergei Kotelnikov, Dzmitry Padhorny, Kathryn A Porter, Andrey Alekseenko, Mikhail Ignatov, Israel Desta, Ryota Ashizawa, Zhuyezi Sun, Usman Ghani, Nasser Hashemi, Sandor Vajda, Dima Kozakov, Mireia Rosell, Luis A Rodríguez-Lumbreras, Juan Fernandez-Recio, Agnieszka Karczynska, Sergei Grudinin, Yumeng Yan, Hao Li, Peicong Lin, Sheng-You Huang, Charles Christoffer, Genki Terashi, Jacob Verburgt, Daipayan Sarkar, Tunde Aderinwale, Xiao Wang, Daisuke Kihara, Tsukasa Nakamura, Yuya Hanazono, Ragul Gowthaman, Johnathan D Guest, Rui Yin, Ghazaleh Taherzadeh, Brian G Pierce, Didier Barradas-Bautista, Zhen Cao, Luigi Cavallo, Romina Oliva, Yuanfei Sun, Shaowen Zhu, Yang Shen, Taeyong Park, Hyeonuk Woo, Jinsol Yang, Sohee Kwon, Jonghun Won, Chaok Seok, Yasuomi Kiyota, Shinpei Kobayashi, Yoshiki Harada, Mayuko Takeda-Shitaka, Petras J Kundrotas, Amar Singh, Ilya A Vakser, Justas Dapkūnas, Kliment Olechnovič, Česlovas Venclovas, Rui Duan, Liming Qiu, Xianjin Xu, Shuang Zhang, Xiaoqin Zou, Shoshana J Wodak

Marc F Lensink et al. Proteins. 2021 Dec.

Abstract

We present the results for CAPRI Round 50, the fourth joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of twelve targets, including six dimers, three trimers, and three higher-order oligomers. Four of these were easy targets, for which good structural templates were available either for the full assembly, or for the main interfaces (of the higher-order oligomers). Eight were difficult targets for which only distantly related templates were found for the individual subunits. Twenty-five CAPRI groups, including eight automatic servers, submitted ~1250 models per target. Twenty groups, including six servers, participated in the CAPRI scoring challenge and submitted ~190 models per target. The accuracy of the predicted models was evaluated using the classical CAPRI criteria. The prediction performance was measured by a weighted scoring scheme that takes into account the number of models of acceptable quality or higher submitted by each group as part of their five top-ranking models. Compared to the previous CASP-CAPRI challenge, top-performing groups submitted such models for a larger fraction (70-75%) of the targets in this Round, but fewer of these models were of high accuracy. Scorer groups achieved stronger performance, with more groups submitting correct models for 70-80% of the targets or achieving high-accuracy predictions. Servers performed less well in general, except for the MDOCKPP and LZERD servers, which performed on par with human groups. In addition to these results, major advances in methodology are discussed, providing an informative overview of where the prediction of protein assemblies currently stands.

Keywords: CAPRI; CASP; blind prediction; docking; oligomeric state; protein assemblies; protein complexes; protein docking; protein-protein interaction; template-based modeling.
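The weighted scoring scheme mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the weights or the exact formula, so the values below (1, 2 and 3 for acceptable, medium and high quality) and the rule of counting only the best of a group's five top-ranking models per target are assumptions made for illustration; the function names and target labels in the example are likewise hypothetical.

```python
# Assumed weights: the abstract does not state them.
WEIGHTS = {"acceptable": 1, "medium": 2, "high": 3}
# Ordering of the four CAPRI quality categories, worst to best.
RANK = {"incorrect": 0, "acceptable": 1, "medium": 2, "high": 3}


def best_category(top5):
    """Return the best quality category among a group's five top-ranking models."""
    return max(top5[:5], key=lambda c: RANK[c], default="incorrect")


def group_score(models_by_target):
    """Sum, over targets, the assumed weight of the best top-five model."""
    score = 0
    for top5 in models_by_target.values():
        score += WEIGHTS.get(best_category(top5), 0)
    return score


# Hypothetical submissions for three targets.
example = {
    "target_1": ["incorrect", "medium", "acceptable", "incorrect", "incorrect"],
    "target_2": ["incorrect"] * 5,
    "target_3": ["high", "medium", "medium", "acceptable", "incorrect"],
}
print(group_score(example))  # 2 + 0 + 3 = 5
```

Under these assumptions a group is rewarded both for covering more targets and for reaching higher accuracy, which is the trade-off the abstract describes when comparing this Round with the previous one.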


Figures

Figure 1. The Targets of Round 50.
(a) Dimeric targets, (b) trimeric targets, (c) large assemblies. The dimeric targets are divided into Easy (T164/T1032, T166/H1045) and Difficult (T169/T1054, T176/T1078, T178/T1083, T179/T1087) targets. The trimeric targets T165/H1036 and T174/T1070 were Difficult, whereas T168/T1052 was Easy. The large assembly target T177/T1081 was an Easy target. The remaining targets T170/H1060 and T180/T1099 featured both Easy and Difficult interfaces.
Figure 2. Evaluated interfaces of the bacterial arginine decarboxylase (T177/T1081).
The two primary interfaces are within each decameric ring; the third interface lies between the two rings. Individual subunits illustrating the intra- and inter-decamer interfaces are colored.
Figure 3. Subunit arrangement and interfaces of the T5 phage tail distal complex (T170/H1060).
(a) The rings A and B (rings are underlined) consist of 3 identical copies of protein A (proteins are not underlined); ring C contains an inner Ci (3 copies of B) and outer Co (12 copies of C) ring; ring D contains 6 copies of protein D. The best templates for each protein are shown in the image. (b) Shows the organization of the 5 rings in the larger assembly as it was resolved by cryo-EM. To the right of the rings are listed the chain identifiers, with the number of residues in each chain in parentheses. (c) Shows the 9 different interfaces, the rings in or between which they occur, two exemplary chains of the interface and the buried area between the two chains.
Figure 4. Subunit interactions and quasi symmetry of the duck hepatitis B virus capsid (T180/T1099).
(a) shows the entire capsid, highlighting the five-fold and three-fold symmetry also shown in (b) that is exhibited by the assembly. The capsid contains 60 copies of the four-chain asymmetric unit shown in (c), in which the chain pairs A:B and C:D form the tight, primary interface. The secondary interface, shown in (b), is formed by interactions between chains A (green, forming the pentagon) and chains C (magenta, forming the triangle) of neighboring units. (d) A difference in backbone conformation of chains A/C vs B/D (backbone rmsd 0.6 Å) results in a quasi-identical interface connecting the pentagon and triangle together through interface [2’] of (b). (e) shows the overlap of chain A of the target to its analogue in the template 3j2v, highlighting the regions that needed to be modeled correctly for an accurate prediction of both interfaces.
Figure 5. Global landscape of the interface prediction performance.
Scatter plot showing the average Recall and Precision values (see main text for definition) of the interfaces in models submitted by all predictors (a) and scorers (b) for the 12 targets of Round 50. Each point represents the average Recall and Precision values for the interfaces of the individual protein components (i.e. the receptor and ligand proteins, respectively) in the 5 models submitted by each participant for one binary association mode. Averaging was performed separately over models in the 4 CAPRI accuracy categories (incorrect, acceptable, medium, and high). For example, for a participant submitting 5 models of which 2 were incorrect, 2 of medium quality and 1 of high quality, average Recall and Precision values were computed for the 2 incorrect models and for the 2 medium-quality ones, respectively, whereas those for the single high-quality model were used as such. Individual points are color-coded by the CAPRI model quality category (as indicated in the legend displayed in the upper left corner of each graph). The upper right-hand quadrant of the graph, with Recall and Precision values above 0.5, contains all points corresponding to “correct” interface predictions. The 2 salient outlier green points in (a) correspond to the medium accuracy models with high f(non-nat) values submitted by Kozakov/CLUSPRO for the T170.5 interface. The 2 salient outlier red points in (b) correspond to the high accuracy models that nevertheless have high f(non-nat) values, submitted by the group of Zou for the T177.2 interface.
Figure 6. Global overview of the prediction performance for targets of Round 50.
Shown are the distributions of the DockQ values computed for the top-five models submitted by all predictor and scorer groups for individual targets of Round 50. (a) Scatter plots of DockQ values for individual models submitted by predictors (left column) and scorers (right column) for individual targets. The targets are labeled by their CAPRI target number and interface rank. Individual points are color-coded according to the CAPRI model quality category; yellow: incorrect; blue: acceptable; green: medium; red: high. For each target, a baseline-level prediction, represented by the best model of the top-performing automatic server (MDOCKPP; see Table 2), is represented by black triangles. (b) The same information presented as boxplot distributions (whiskers at 9th and 91st percentiles) of models submitted for each target and prediction category; color coding is as for the upper panel, but with a lighter shade of blue for better visibility.
Figure 7. f1 as a function of S-rms.
Each point in the figure represents the best model of a predictor group for each of the 23 interfaces. Individual points are color-coded following the CAPRI model quality as in Figure 6. The results for the best predictors (Baker, Seok, Venclovas) and servers (LZERD, MDOCKPP) are highlighted. See main text for definition of f1 and S-rms. The upper left quadrant features the best models, with S-rms values below 3.5 Å and f1 values above 0.3, corresponding to mostly medium and high-quality models.
Figure 8. Gauging progress.
Panel (a) shows the performance score of the top 29 ranking predictor and server groups (both CAPRI and CASP-only groups; server groups are listed in capital letters). The height of the bar is the ScoreG value of Eq. (2), with individual contributions from high, medium, or acceptable-quality models indicated. The total number of targets for which at least an acceptable quality model was produced is indicated in the graph by a diamond. Panel (b) shows the same data from the previous CASP13-CAPRI Round.
Figure 9. Model quality of individual protein subunits in assembly models of the 12 targets of Round 50.
Shown are whisker plots (displaying the median, 1st and 3rd quartile, and 9th and 91st percentile) representing the distributions of M-rms values of individual protein subunits in models submitted for each of the targets of Round 50. Targets are labeled by their CAPRI target number; chain identifiers (A, B, etc) are used for the different proteins in the hetero-complexes.
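
The per-category averaging described in the caption of Figure 5 can be sketched as follows. The caption defers the definitions of Recall and Precision to the main text, so the set-based definitions used here (fraction of native interface residues recovered, and fraction of predicted interface residues that are native) are an assumption, as are the function names and the example numbers.

```python
from collections import defaultdict


def recall_precision(predicted, native):
    """Assumed definitions: overlap of predicted and native interface residues."""
    predicted, native = set(predicted), set(native)
    hits = len(predicted & native)
    recall = hits / len(native) if native else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


def average_by_category(models):
    """Average Recall and Precision separately within each quality category.

    models: iterable of (category, recall, precision), one entry per model.
    Returns {category: (mean recall, mean precision)}, one plotted point each.
    """
    grouped = defaultdict(list)
    for category, recall, precision in models:
        grouped[category].append((recall, precision))
    return {
        cat: (
            sum(r for r, _ in vals) / len(vals),
            sum(p for _, p in vals) / len(vals),
        )
        for cat, vals in grouped.items()
    }


# The caption's example: 2 incorrect, 2 medium and 1 high-quality model
# (the Recall and Precision values themselves are made up).
models = [
    ("incorrect", 0.2, 0.1), ("incorrect", 0.4, 0.3),
    ("medium", 0.6, 0.7), ("medium", 0.8, 0.9),
    ("high", 0.9, 0.95),
]
points = average_by_category(models)
```

With this input the participant contributes three points to the scatter plot: one averaged over the two incorrect models, one averaged over the two medium-quality models, and the single high-quality model used as such.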

