The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis

PMID: 15608188
PMCID: PMC539978
DOI: 10.1093/nar/gki024

Frances Pearl et al. Nucleic Acids Res. 2005.

Nucleic Acids Res. 2005 Jan 1;33(Database issue):D247-51.

Affiliation

Biochemistry and Molecular Biology Department, University College London, University of London, Gower Street, London WC1E 6BT, UK.

Abstract

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43,229 domains classified into 1,467 superfamilies and 5,107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616,470 domain sequences classified into 23,876 sequence families. This results in a significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous Superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/), containing specific sequence, structural and functional information for each superfamily in CATH, considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH-associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl), providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.

Figures

**Figure 1**
The proportion (%) of structures from the PDB that have been classified in CATH over the last two years using different sequence comparison or structure comparison methods. Blue segment: PDB sequences with 95% sequence identity or more to existing CATH domains, recognized using SSEARCH. Magenta segment: PDB sequences with 30% sequence identity or more to existing CATH domains, recognized using SSEARCH. Yellow segment: PDB entries that can be assigned to existing CATH superfamilies by scanning the HMM library. Green segment: PDB entries that can be assigned to CATH superfamilies by structure comparisons against CATH representatives using SSAP. Purple segment: PDB entries that can be assigned to CATH fold groups by structure comparisons against CATH representatives using SSAP. Orange segment: PDB entries that do not match any CATH structure and represent novel folds.
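
The six segments in Figure 1 can be read as a tiered assignment, where each PDB entry is labelled by the first kind of evidence that places it in CATH. The following is a minimal sketch of that reading, not the CATH pipeline itself: the `Evidence` fields, the function name `assign_tier`, and the order in which the checks run are assumptions made for illustration. Only the 95% and 30% identity thresholds and the method names come from the caption.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Best-match evidence for one PDB entry (all field names are hypothetical)."""
    seq_identity: float = 0.0       # best % identity to a CATH domain (e.g. from SSEARCH)
    hmm_hit: bool = False           # significant match when scanning the HMM library
    ssap_superfamily: bool = False  # SSAP structure match at superfamily level
    ssap_fold: bool = False         # SSAP structure match at fold-group level only


def assign_tier(ev: Evidence) -> str:
    """Return the Figure 1 segment an entry would fall into.

    Checks run from the closest relationship to the most remote one;
    this ordering is an assumption, not something the caption states.
    """
    if ev.seq_identity >= 95:
        return "sequence identity >= 95%"
    if ev.seq_identity >= 30:
        return "sequence identity >= 30%"
    if ev.hmm_hit:
        return "HMM superfamily match"
    if ev.ssap_superfamily:
        return "SSAP superfamily match"
    if ev.ssap_fold:
        return "SSAP fold-group match"
    return "novel fold"


if __name__ == "__main__":
    examples = [
        Evidence(seq_identity=97.0),
        Evidence(seq_identity=12.0, hmm_hit=True),
        Evidence(ssap_fold=True),
        Evidence(),
    ]
    for ev in examples:
        print(assign_tier(ev))
```

Counting the labels returned over a set of entries would give segment proportions of the kind plotted in the figure.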

**Figure 2**
CATHerine wheels (a) illustrating the distribution of domain structures from the PDB among the different levels in the CATH hierarchy. The three classes are illustrated in colour, mainly α pink, mainly β yellow and α−β green. The inner wheel corresponds to different architectures in the classification and the outer wheel to different fold groups. Each fold group has been subdivided according to the numbers and populations of different homologous superfamilies adopting that fold. (b) Illustrating the distribution of CATH domains among the sequences from 150 completed genomes, in Gene3D. In this case, the fold groups labelled in the outer circle have been divided according to the number and size of close sequence families within each fold group.
