Proteins. 2019 Dec;87(12):1378-1387.

doi: 10.1002/prot.25815. Epub 2019 Oct 16.

Introducing "best single template" models as reference baseline for the Continuous Automated Model Evaluation (CAMEO)

Juergen Haas¹, Rafal Gumienny², Alessandro Barbato³, Flavio Ackermann¹, Gerardo Tauriello¹, Martino Bertoni³, Gabriel Studer¹, Anna Smolinski¹, Torsten Schwede¹

Affiliations

¹ Computational Structural Biology, University of Basel, Switzerland.
² Computational Structural Biology, Swiss Institute of Bioinformatics, Switzerland.
³ Computational Structural Biology, Universität Basel Department Biozentrum, Switzerland.

PMID: 31571280
PMCID: PMC8196401
DOI: 10.1002/prot.25815

Juergen Haas et al. Proteins. 2019 Dec.

Abstract

Critical blind assessment of structure prediction techniques is crucial for the scientific community to establish the state of the art, identify bottlenecks, and guide future developments. In the Critical Assessment of Techniques in Structure Prediction (CASP), human experts assess the performance of participating methods in relation to the difficulty of the prediction task in a biennial experiment on approximately 100 targets. Yet the development of automated computational modeling methods requires more frequent evaluation cycles and larger data sets. The "Continuous Automated Model EvaluatiOn (CAMEO)" platform complements CASP by conducting fully automated blind prediction evaluations based on the weekly pre-release of sequences of structures that are going to be published in the next release of the Protein Data Bank (PDB). Each week, CAMEO publishes benchmarking results for predictions on a set of about 20 targets collected during a 4-day prediction window. CAMEO benchmarking data are generated consistently for all methods at the same point in time, enabling developers to cross-validate their method's performance and to refer to their results in publications. Many successful participants of CASP have used CAMEO, either by directly benchmarking their methods within the system or by comparing their own performance to CAMEO reference data. CAMEO offers a variety of scores reflecting different aspects of structure modeling, for example, binding site accuracy, homo-oligomer interface quality, or accuracy of local model confidence estimates. By introducing the "bestSingleTemplate" method, based on structure superpositions, as a reference for the accuracy of 3D modeling predictions, CAMEO facilitates objective comparison of techniques and fosters the development of advanced methods.

Keywords: CAMEO; CASP; benchmarking; continuous evaluation; homo-oligomer interface accuracy; ligand binding-site accuracy; model confidence; model quality assessment; oligomeric assessment; protein structure modeling; protein structure prediction.

Conflict of interest statement

The authors declare that they have no conflicts of interest with the contents of this article.

Figures

**FIGURE 1**
Server performance compared with the NaiveBLAST server, in units of lDDT. The medians are depicted by the horizontal bars in the boxes, and servers are sorted by decreasing median. The number of targets used in the comparison for each server is indicated at the top of each column, with a maximum of 228 out of a total of 248 targets returned by NaiveBLAST. The data set covers the period from 1 May 2018 to 28 July 2018.

**FIGURE 2**
Server performance compared with the “bestSingleTemplate” method, in units of lDDT. The medians are depicted by the horizontal bars in the boxes, and servers are sorted by decreasing median. The number of targets returned by each server is indicated; the total number of targets is 248. The data set covers the period from 1 May 2018 to 28 July 2018.

**FIGURE 3**
A, Partial precision-recall AUC; the blue vertical line depicts the threshold of 0.2 FPR. B, Partial ROC AUC; the blue vertical line indicates the threshold of 80% recall. C, pROC AUC vs pPR AUC, applying an lDDT threshold of 60. The dashed lines represent the AUC of a random predictor in the ROC domain and the expected precision at 100% recall in the PR domain. Areas in grey lie below these thresholds and would be considered to perform worse than random. D, Model quality distribution of the QE target set in units of lDDT. The data set covers the period from 1 May 2018 to 28 July 2018. AUC, area under the curve; ROC, receiver operating characteristic.

**FIGURE 4**
Historic development of quality estimation tools. The improvements over the last 22 years are impressive, spanning early developments and recent approaches, from well-known tools such as PROSA, Verify3D, and DFIRE to the latest contestants such as ProQ3, ModFOLD7_lDDT, and QMEANDisCo3. The years are assigned roughly to the best server of a particular year. The black empty circle illustrates the estimated performance of QMEAN (Version 7.11) based on earlier CAMEO data. The blue star depicts the estimated performance of ProQ3 based on three months (17 May 2019 to 10 August 2019) of CAMEO data.
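
The caption of Figure 3 refers to a partial ROC AUC, a cutoff at 0.2 FPR, and an lDDT threshold of 60. As a minimal sketch of how such a partial area could be computed, the Python below labels each residue as "good" when its observed local lDDT is at least 60 and integrates the ROC curve up to a maximum FPR. This is not CAMEO's implementation, which is not described in this record: the function names `labels_from_lddt`, `roc_points`, and `partial_auc` are hypothetical, the choice of "lDDT >= 60" as the positive class is an assumption, and the example values are invented for illustration.

```python
from typing import List, Sequence, Tuple


def labels_from_lddt(lddt: Sequence[float], threshold: float = 60.0) -> List[bool]:
    """Assumed labeling: a residue is 'good' if its observed lDDT >= threshold."""
    return [value >= threshold for value in lddt]


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the score threshold from high to low."""
    pos = sum(1 for label in labels if label)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume all examples sharing the same score so ties form one point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points: Sequence[Tuple[float, float]], max_fpr: float = 0.2) -> float:
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate linearly to cut the segment at the FPR limit.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented example: observed per-residue lDDT and predicted confidence.
observed_lddt = [85.0, 72.0, 40.0, 55.0]
predicted_confidence = [0.9, 0.8, 0.3, 0.2]
labels = labels_from_lddt(observed_lddt)
print(partial_auc(roc_points(predicted_confidence, labels), max_fpr=0.2))
```

For orientation, a predictor that follows the ROC diagonal would reach a partial area of 0.2 × 0.2 / 2 = 0.02 under this 0.2 FPR cutoff, while a perfect ranking reaches 0.2.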

