CATH: expanding the horizons of structure-based functional annotations for genome sequences

Affiliations

¹ Structural and Molecular Biology, University College London WC1E 6BT, UK.
² European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

PMID: 30398663
PMCID: PMC6323983
DOI: 10.1093/nar/gky1097

Ian Sillitoe et al. Nucleic Acids Res. 2019 Jan 8;47(D1):D280-D284. doi: 10.1093/nar/gky1097.

Abstract

This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) provides a huge update in the coverage of structural data. This release increases the number of fully classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The coverage of high-resolution protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: users can now view an alignment between their query sequence and a representative FunFam structure, and new tools make it easier to view the full structural context (multi-domain architecture) of domains and chains.
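
The growth figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `percent_increase` and `fold_change` are hypothetical, and the predicted-domain count for v4.2 is taken as exactly 95 million even though the abstract says "over 95 million", so the fold change shown is a lower bound.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


def fold_change(old: float, new: float) -> float:
    """Ratio of the new value to the old value."""
    return new / old


# Structural domains in CATH+ v4.1 and v4.2 (figures from the abstract).
structural_growth = percent_increase(308_999, 434_857)

# Predicted domains; 95 million is a lower bound ("over 95 million").
predicted_fold = fold_change(53e6, 95e6)

print(f"Structural domains: +{structural_growth:.1f}%")   # +40.7%
print(f"Predicted domains: {predicted_fold:.2f}-fold")    # 1.79-fold
```

Both results agree with the abstract's wording: "over 40%" for structural domains and "almost two-fold" for predicted domains.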

Figures

**Figure 1.**
Comparison of the structural domains and predicted (sequence) domains between CATH+ releases 4.1 and 4.2.

**Figure 2.**
Superfamilies in CATH v4.1 and v4.2 highlighting the number of structural domains and predicted domains (shown with a linear and logarithmic scale). Each dot represents a superfamily and the largest 100 superfamilies (according to number of predicted domains in CATH v4.1) have been highlighted in red. These superfamilies contain more than half of all known protein domains. To help illustrate the growth of data, an example superfamily (3.60.20.10) has been circled in each plot. The number of structural domains in this superfamily increased 2.7-fold from v4.1 to v4.2 (3074 to 8328 domains); however, this corresponded to only a small increase in the number of predicted domains. On investigation, this superfamily contains domains from a number of large proteasomes, which contain many copies of identical (or very similar) structural domains.

**Figure 3.**
Screenshot showing a query sequence aligned to a matching CATH FunFam (following a sequence search). Sequence conservation is calculated for each position in the alignment (blue is low conservation, red is high conservation) and these colours are mapped to a representative structure (if one is available).
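
The idea of scoring conservation per alignment position and mapping it to a blue-to-red colour can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used by the CATH web pages: the caption does not say how conservation is calculated, so normalised Shannon entropy is swapped in as a stand-in, and the function names, the linear RGB gradient and the toy alignment are all hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def column_conservation(column: str) -> float:
    """Return 1 - normalised Shannon entropy for one column (gaps ignored).

    Assumption: entropy is a stand-in score; the source does not name its method.
    """
    residues = [c for c in column.upper() if c not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    max_entropy = math.log2(20)  # 20 standard amino acids
    return min(1.0, max(0.0, 1.0 - entropy / max_entropy))


def conservation_to_rgb(score: float) -> Tuple[int, int, int]:
    """Map a score in [0, 1] onto a linear gradient: blue (low) to red (high)."""
    score = min(1.0, max(0.0, score))
    return (round(255 * score), 0, round(255 * (1.0 - score)))


def alignment_colours(sequences: List[str]) -> List[Tuple[int, int, int]]:
    """Colour every column of an alignment whose rows have equal length."""
    return [
        conservation_to_rgb(column_conservation("".join(col)))
        for col in zip(*sequences)
    ]


# Hypothetical toy alignment: the first column is fully conserved.
toy_alignment = ["MKVLA", "MKILS", "MRVLT", "MKV-G"]
colours = alignment_colours(toy_alignment)
print(colours[0])  # fully conserved column maps to pure red: (255, 0, 0)
```

In a viewer, the per-position colours would then be applied to the matching residues of the representative structure, which requires a mapping from alignment columns to residue numbers that is not shown here.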

**Figure 4.**
Interactive links between 3D structure (3DMol.js) and multi-domain architecture (MDA) allow the user to view each domain in the context of the full chain.
