Comparative Study

BMC Struct Biol. 2009 Apr 17:9:23.

doi: 10.1186/1472-6807-9-23.

Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis

Gergely Csaba¹, Fabian Birzele, Ralf Zimmer

PMID: 19374763
PMCID: PMC2678134
DOI: 10.1186/1472-6807-9-23

Gergely Csaba et al. BMC Struct Biol. 2009.

Affiliation

¹ Department of Informatics, Ludwig-Maximilians-Universität München, Munich, Germany. gergely.csaba@bio.ifi.lmu.de

Abstract

Background: SCOP and CATH are widely used as gold standards to benchmark novel protein structure comparison methods and to train machine learning approaches for protein structure classification and prediction. The two hierarchies are built with different protocols, which may lead to differing classifications of the same protein. Ignoring such differences causes problems when the hierarchies are used to train or benchmark automatic structure classification methods. Here, we propose a method to compare SCOP and CATH in detail and discuss possible applications of this analysis.

Results: We create a new mapping between SCOP and CATH and define a consistent benchmark set, which is shown to substantially reduce the errors made by structure comparison methods such as TM-align and has further applications, e.g. for training machine learning methods for protein structure classification. In addition, we extract further connections in the topology of the protein fold space from the orthogonal features contained in SCOP and CATH.
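The idea of a consistent benchmark set can be illustrated with a minimal sketch. It assumes that domains have already been matched one-to-one between the two databases (a simplification, since the abstract notes that domain definitions themselves differ) and treats a pair as consistent when SCOP and CATH agree on whether the two domains belong together. The function name, the input dictionaries, and all labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def consensus_pairs(scop_fold, cath_topology):
    """Split domain pairs into consistent and inconsistent sets.

    scop_fold / cath_topology: dicts mapping a domain id to its SCOP fold
    and CATH topology label (hypothetical, simplified inputs).
    """
    # Only domains classified in both hierarchies can be compared.
    shared = sorted(set(scop_fold) & set(cath_topology))
    consistent, inconsistent = [], []
    for a, b in combinations(shared, 2):
        same_scop = scop_fold[a] == scop_fold[b]
        same_cath = cath_topology[a] == cath_topology[b]
        if same_scop == same_cath:
            # Both hierarchies agree: similar in both, or dissimilar in both.
            consistent.append((a, b, same_scop))
        else:
            inconsistent.append((a, b))
    return consistent, inconsistent


# Toy labels, purely illustrative.
scop = {"d1": "scop_A", "d2": "scop_A", "d3": "scop_B", "d4": "scop_C"}
cath = {"d1": "cath_X", "d2": "cath_X", "d3": "cath_X", "d4": "cath_Y"}
consistent, inconsistent = consensus_pairs(scop, cath)
print(consistent)
print(inconsistent)
```

In this toy input, d3 shares a CATH label with d1 and d2 but not a SCOP label, so the pairs involving d3 and either of them are the ones a consensus benchmark of this kind would leave out.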

Conclusion: Via an all-to-all comparison, we find large and unexpected differences between SCOP and CATH with respect to their domain definitions as well as their hierarchical partitioning of the fold space at every level of the two classifications. A consistent mapping of SCOP and CATH can be exploited for automated structure comparison and classification.

Availability: Benchmark sets and an interactive SCOP-CATH browser are available at http://www.bio.ifi.lmu.de/SCOPCath.

Figures

**Figure 1**
**Detailed comparison of protein structure benchmark sets**. The figure compares the performance of TM-align on the complete set of similarity relationships defined by SCOP (left column) with its performance on the novel SCOP-CATH consensus benchmark set proposed in this study (right column). The TM-align performance is visualized in several plots that break down the classification errors. Panels (a) and (b) show the distribution of scores for the various levels of the classifications. Although the fold scores are somewhat shifted to the right, the score distributions overlap significantly, so no clear thresholds for safe classification of structure pairs can be set. Panels (c)-(f) compare the errors for the comprehensive and consensus benchmark sets. An error is counted whenever a wrong domain scores better than a correct domain. The errors are significantly reduced on the consensus set, panels (d) and (f). Finally, panels (g)-(h) summarize the errors (the number of wrong folds scored better than certain correct folds) as boxplots. Again, fewer errors are observed in the consensus set: in both sets, few wrong folds score better than the best-scored correct domains, whereas many wrong folds score better than the correct members with low scores. See the main text for a more detailed description. Overall, the number of errors is reduced disproportionately (about 50% error reduction) relative to the reduction in pairs in the consensus benchmark (about 16% pair reduction).
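The error count described in the caption (wrong domains scored better than correct domains) can be sketched as follows. This is an illustrative reading, not the authors' evaluation code: the function name and the input format are assumptions, the scores are made up rather than TM-align output, and "better" is taken to mean a strictly higher score.

```python
from bisect import bisect_right


def errors_per_correct_hit(hits):
    """For each correct hit, count the wrong hits with a strictly higher score.

    hits: list of (score, is_correct) tuples for one query domain, where
    is_correct states whether the hit belongs to the query's class in the
    benchmark (hypothetical input format).
    Returns (score, n_wrong_better) for every correct hit, best score first.
    """
    wrong = sorted(score for score, ok in hits if not ok)
    correct = sorted((score for score, ok in hits if ok), reverse=True)
    # bisect_right gives the number of wrong scores <= s, so the remainder
    # of the sorted list holds the wrong scores strictly above s.
    return [(s, len(wrong) - bisect_right(wrong, s)) for s in correct]


# Made-up scores for a single query.
hits = [(0.9, True), (0.8, False), (0.7, True),
        (0.65, False), (0.6, False), (0.4, True)]
print(errors_per_correct_hit(hits))
# → [(0.9, 0), (0.7, 1), (0.4, 3)]
```

The toy output mirrors the pattern the caption describes: the best-scored correct hit has no wrong hit above it, while the low-scoring correct hit is outranked by several wrong ones.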

**Figure 2**
**Linking different folds via consistency checks**. a) Shows the method of connecting different folds in, e.g., SCOP via a link proposed by the mapping of SCOP and CATH. Nodes in the graph represent SCOP folds; edges connect two nodes *iff* at least 5 members of the SCOP fold are mapped to the same CATH topology. b) Shows the interfold similarity of α-hairpin proteins in SCOP which are clustered in the same fold according to CATH (1.10.287). c) Shows a more complicated fold graph clustering proteins of immunoglobulin (CATH 2.60.40) and jelly-roll topologies (CATH 2.60.120) in a non-clique subgraph. All fold graphs may be explored interactively online.
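The edge rule in panel (a) can be sketched in a few lines. The caption leaves some room for interpretation; this sketch assumes that two SCOP folds are linked when each of them has at least 5 member domains mapped to the same CATH topology. The function name, the input format, and the labels are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def fold_graph_edges(domain_map, min_members=5):
    """Link SCOP folds that share a CATH topology.

    domain_map: list of (scop_fold, cath_topology) tuples, one per mapped
    domain (hypothetical input format).
    Returns a set of (fold_1, fold_2, topology) edges.
    """
    # Number of domains for every (fold, topology) combination.
    counts = Counter(domain_map)
    folds_by_topology = defaultdict(set)
    for (fold, topology), n in counts.items():
        if n >= min_members:
            folds_by_topology[topology].add(fold)
    edges = set()
    for topology, folds in folds_by_topology.items():
        # Every pair of sufficiently represented folds gets an edge.
        for f1, f2 in combinations(sorted(folds), 2):
            edges.add((f1, f2, topology))
    return edges


# Toy mapping, purely illustrative.
mapping = ([("fold_A", "topo_X")] * 6 + [("fold_B", "topo_X")] * 5
           + [("fold_C", "topo_X")] * 4 + [("fold_C", "topo_Y")] * 7)
print(fold_graph_edges(mapping))
# → {('fold_A', 'fold_B', 'topo_X')}
```

Here fold_C stays unconnected because only 4 of its domains map to topo_X, one short of the threshold.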

Cited by

Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths.
Nepomnyachiy S, Ben-Tal N, Kolodny R. Proc Natl Acad Sci U S A. 2017 Oct 31;114(44):11703-11708. doi: 10.1073/pnas.1707642114. Epub 2017 Oct 19. PMID: 29078314. Free PMC article.
A review of visualisations of protein fold networks and their relationship with sequence and function.
Sykes J, Holland BR, Charleston MA. Biol Rev Camb Philos Soc. 2023 Feb;98(1):243-262. doi: 10.1111/brv.12905. Epub 2022 Oct 9. PMID: 36210328. Free PMC article. Review.
Multi-criteria protein structure comparison and structural similarities analysis using pyMCPSC.
Sharma A, Manolakos ES. PLoS One. 2018 Oct 17;13(10):e0204587. doi: 10.1371/journal.pone.0204587. eCollection 2018. PMID: 30332415. Free PMC article.
Benchmarking the next generation of homology inference tools.
Saripella GV, Sonnhammer EL, Forslund K. Bioinformatics. 2016 Sep 1;32(17):2636-41. doi: 10.1093/bioinformatics/btw305. Epub 2016 Jun 1. PMID: 27256311. Free PMC article.
DomHR: accurately identifying domain boundaries in proteins using a hinge region strategy.
Zhang XY, Lu LJ, Song Q, Yang QQ, Li DP, Sun JM, Li TH, Cong PS. PLoS One. 2013 Apr 11;8(4):e60559. doi: 10.1371/journal.pone.0060559. Print 2013. PMID: 23593247. Free PMC article.
