Systematic comparison of SCOP and CATH: a new gold standard for protein structure analysis
- PMID: 19374763
- PMCID: PMC2678134
- DOI: 10.1186/1472-6807-9-23
Abstract
Background: SCOP and CATH are widely used as gold standards to benchmark novel protein structure comparison methods and to train machine learning approaches for protein structure classification and prediction. The two hierarchies are built with different protocols, which can lead to differing classifications of the same protein. Ignoring such differences causes problems when the hierarchies are used to train or benchmark automatic structure classification methods. Here, we propose a method to compare SCOP and CATH in detail and discuss possible applications of this analysis.
Results: We create a new mapping between SCOP and CATH and define a consistent benchmark set, which is shown to substantially reduce the errors made by structure comparison methods such as TM-Align and has further applications, e.g. for training machine learning methods for protein structure classification. Additionally, we extract further connections in the topology of the protein fold space from the orthogonal features contained in SCOP and CATH.
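The abstract does not spell out how the consistent benchmark set is built. As a minimal sketch, assuming each mapped domain carries one SCOP fold label and one CATH topology label, a pair of domains could be kept only when both hierarchies agree on whether the two domains share a fold. The function name `consistent_pairs`, the input layout and the example labels are hypothetical illustrations, not the authors' actual procedure.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical input: for each domain mapped between the two classifications,
# its SCOP fold label and its CATH topology label.
DomainLabels = Dict[str, Tuple[str, str]]  # domain id -> (scop_fold, cath_topology)


def consistent_pairs(labels: DomainLabels) -> List[Tuple[str, str, bool]]:
    """Return the domain pairs on which SCOP and CATH agree.

    A pair is kept only if both hierarchies call it 'same fold' or both call
    it 'different fold'; the boolean records which of the two it is.
    """
    kept = []
    for a, b in combinations(sorted(labels), 2):
        scop_same = labels[a][0] == labels[b][0]
        cath_same = labels[a][1] == labels[b][1]
        if scop_same == cath_same:
            kept.append((a, b, scop_same))
    return kept


if __name__ == "__main__":
    # Illustrative labels only; d4 is a domain on which the two hierarchies
    # disagree about its structural neighbours.
    labels = {
        "d1": ("a.1", "1.10.10"),
        "d2": ("a.1", "1.10.10"),
        "d3": ("b.1", "2.60.40"),
        "d4": ("b.1", "1.10.10"),
    }
    print(consistent_pairs(labels))
    # → [('d1', 'd2', True), ('d1', 'd3', False), ('d2', 'd3', False)]
```

Every pair involving d4 is discarded, because SCOP groups it with d3 while CATH groups it with d1 and d2; a benchmark built only from the remaining pairs cannot penalise a comparison method for following either hierarchy.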
Conclusion: Via an all-to-all comparison, we find large and unexpected differences between SCOP and CATH with respect to their domain definitions as well as their hierarchical partitioning of the fold space on every level of the two classifications. A consistent mapping of SCOP and CATH can be exploited for automated structure comparison and classification.
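Differences in domain definitions show up when one tries to match a SCOP domain to a CATH domain on the same chain. The sketch below assumes domains are given as sets of residue numbers and matches them by Jaccard overlap with a hypothetical cut-off of 0.8; the overlap measure, the threshold and the names `jaccard` and `map_domains` are illustrative assumptions, not the criterion used in the paper.

```python
from typing import Dict, Optional, Set, Tuple

Domains = Dict[str, Set[int]]  # domain id -> residue numbers within one chain


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Shared residues divided by all residues covered by either domain."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_domains(scop: Domains, cath: Domains, min_overlap: float = 0.8
                ) -> Dict[str, Optional[Tuple[str, float]]]:
    """Map each SCOP domain to the best-overlapping CATH domain of the chain.

    A SCOP domain maps to None when no CATH domain reaches min_overlap,
    i.e. when the two classifications cut the chain into domains differently.
    """
    mapping: Dict[str, Optional[Tuple[str, float]]] = {}
    for s_id, s_res in scop.items():
        best = max(((c_id, jaccard(s_res, c_res)) for c_id, c_res in cath.items()),
                   key=lambda item: item[1], default=None)
        mapping[s_id] = best if best is not None and best[1] >= min_overlap else None
    return mapping


if __name__ == "__main__":
    # SCOP treats residues 1-200 as one domain; CATH splits them in two.
    print(map_domains({"s1": set(range(1, 201))},
                      {"c1": set(range(1, 101)), "c2": set(range(101, 201))}))
    # → {'s1': None}
```

A SCOP domain that maps to None marks a chain where the two resources disagree on the domain boundaries themselves, before any question of fold or topology assignment arises.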
Availability: Benchmark sets and an interactive SCOP-CATH browser are available at http://www.bio.ifi.lmu.de/SCOPCath.