Modeling SARS-CoV-2 proteins in the CASP-commons experiment

Affiliations

¹ Genome Center, University of California, Davis, Davis, California, USA.
² Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA.
³ Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA.
⁴ Department of Chemistry, Seoul National University, Seoul, South Korea.
⁵ Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania.

PMID: 34462960
PMCID: PMC8616790
DOI: 10.1002/prot.26231

Andriy Kryshtafovych et al. Proteins. 2021 Dec;89(12):1987-1996. doi: 10.1002/prot.26231. Epub 2021 Oct 5.

Abstract

Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).

Keywords: CASP; COVID; EMA; SARS-CoV-2; model accuracy; protein structure prediction.
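
The GDT_TS values quoted in the abstract can be read more easily with a small worked example. The sketch below is a simplified illustration, not the CASP evaluation code: it assumes the model and the experimental structure have already been superimposed once and that per-residue Cα distances (in angstroms) are available, whereas the full GDT procedure searches over many superpositions to maximize each fraction. The function name `gdt_ts` and the example distances are hypothetical.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS on a 0-100 scale.

    `distances` holds one C-alpha deviation per residue (angstroms),
    measured under a single fixed superposition (an assumption of this
    sketch; real GDT optimizes the superposition for each cutoff).
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues lying within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # GDT_TS is the average of the four fractions, expressed in percent.
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical five-residue example: one residue within 1 A, two within
# 2 A, three within 4 A, four within 8 A, giving a score close to 50.
example_score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
```

Under this definition, a score near 63 means that, averaged over the four cutoffs, roughly 63% of residues fall within the cutoff distances, which is consistent with the abstract's description of a correct topology.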

Figures

**FIGURE 1**
Screenshot of the model consensus table (https://predictioncenter.org/caspcommons/models_consensus2.cgi) for the SARS‐CoV‐2 M‐protein (target C1906), showing local structural agreement along the sequence between the selected model (second column) and the remaining models. The black box marks a region where many models agree, suggesting a domain that is relatively easy to model.

**FIGURE 2**
Maximum consensus scores on CASP‐COVID targets (EMA‐jury: gray bars; overall consensus: black bars). Targets are ordered by increasing EMA‐jury values. The gray bars are always longer than the black ones, indicating that the EMA‐jury method successfully selects subsets of models that are more structurally consistent. The vertical dashed line corresponds to the consensus level of 0.6, which represents the 100th percentile of overall consensus scores for all models (Figure SFQA4). CASP, Critical Assessment of Structure Prediction; CASP‐COVID, CASP community‐wide experiment on modeling SARS‐CoV‐2 proteins causing the coronavirus disease; EMA, estimates of model accuracy.

**FIGURE 3**
Selection of the top model by the estimates of model accuracy (EMA)‐jury (top panel) and simple structural consensus (bottom panel) on 80 CASP13 targets. Maximum per‐target CAD‐scores are shown as upward‐pointing triangles; the CAD‐scores of models selected by the EMA‐jury approach (top) and the simple structural consensus method (bottom) are shown as downward‐pointing triangles. The hardest‐to‐predict targets (FM) are in red, the others in green. Vertical lines between the corresponding triangles represent the error in the selection process. Comparison of the top and bottom panels demonstrates that the EMA‐jury method selects models closer to the best absolute value more often than the simple consensus method.
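
The selection error described in the Figure 3 caption (the gap between the best available score for a target and the score of the model actually selected) can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the method used in the paper: it implements only a generic "simple consensus" selector that picks the model with the highest mean similarity to all other models, and it does not attempt to reproduce the EMA‐jury approach. The function names, the pairwise similarity matrix, and the reference scores are all hypothetical.

```python
from typing import List


def consensus_scores(similarity: List[List[float]]) -> List[float]:
    """Mean pairwise similarity of each model to every other model.

    `similarity[i][j]` is assumed to be a symmetric structural
    similarity between models i and j (higher means more similar).
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models")
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def select_by_consensus(similarity: List[List[float]]) -> int:
    """Index of the model with the highest consensus score."""
    scores = consensus_scores(similarity)
    return max(range(len(scores)), key=scores.__getitem__)


def selection_error(reference_scores: List[float], selected: int) -> float:
    """Best achievable score for the target minus the selected model's score."""
    return max(reference_scores) - reference_scores[selected]


# Hypothetical target with three models.
similarity = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
# Hypothetical scores against the experimental structure (CAD-score-like).
reference_scores = [0.62, 0.58, 0.40]

chosen = select_by_consensus(similarity)             # model index 1
error = selection_error(reference_scores, chosen)    # about 0.04
```

In this toy case the consensus pick is not the best model, so the error is small but nonzero, which corresponds to a short vertical line between the two triangles in the figure.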

**FIGURE 4**
Round 1 three‐dimensional (3D) and accuracy estimation results for SARS‐CoV‐2 ORF3a (C1905). (A) Each green cross represents a 3D model, black squares indicate models selected as high accuracy by accuracy estimation methods, and orange circles indicate models selected by the estimates of model accuracy (EMA)‐Jury method. 3D model accuracy is shown in terms of LDDT (y‐axis) and GDT_TS (x‐axis). Only one accuracy estimation method selected a higher accuracy model. (B) Locally inaccurate regions of the highest‐scoring model, AF‐COV_2, according to the ULR definition (left) and as predicted for the same model by the BAKER EMA method (right). The superpositions are identical; the crystal structure is in yellow, ULRs and predicted inaccurate regions are in red, and the rest of the model is in green.

**FIGURE 5**
Round 2 3D and accuracy estimation results for two domains of the SARS‐CoV‐2 ORF3a protein: (A) C1905‐D1 and (B) C1905‐D2. 3D model accuracy is shown in terms of LDDT (y‐axis) and GDT_TS (x‐axis) (green crosses). The panels show both models from CASP‐COVID and AF‐COV models added in the post‐CASP EMA experiment (pink stars). The models selected by EMA methods as top1 during CASP‐COVID are shown as black hollow squares; models selected in the post‐CASP experiment are shown as pink hollow squares. For Domain 1, three out of four EMA groups selected one of the higher accuracy AlphaFold models, with many low accuracy models also selected. There is a similar pattern for Domain 2, where two of four methods picked two different AlphaFold models.

**FIGURE 6**
Round 1 3D modeling and accuracy estimation (EMA) results for SARS‐CoV‐2 protein ORF8 (C1908). 3D model accuracy for submissions is shown in terms of LDDT (y‐axis) and GDT_TS (x‐axis) (green crosses), together with EMA selections (black squares for CASP‐COVID, pink squares for the post‐CASP experiment, orange circles for EMA‐Jury). Five AF2 models added in the post‐CASP experiment are shown as pink stars. Two of the AF2 models are impressively accurate. Two post‐CASP EMA methods succeeded in selecting those models as best.
