Nucleic Acids Res. 2019 Jan 8;47(D1):D280-D284.
doi: 10.1093/nar/gky1097.

CATH: expanding the horizons of structure-based functional annotations for genome sequences

Ian Sillitoe et al. Nucleic Acids Res. 2019.

Abstract

This article provides an update of the latest data and developments within the CATH protein structure classification database (http://www.cathdb.info). The resource provides two levels of release: CATH-B, a daily snapshot of the latest structural domain boundaries and superfamily assignments, and CATH+, which adds layers of derived data, such as predicted sequence domains, functional annotations and functional clustering (known as Functional Families or FunFams). The most recent CATH+ release (version 4.2) substantially extends the coverage of structural data. This release increases the number of fully classified domains by over 40% (from 308 999 to 434 857 structural domains), corresponding to an almost two-fold increase in sequence data (from 53 million to over 95 million predicted domains) organised into 6119 superfamilies. The proportion of high-resolution protein PDB chains that contain at least one assigned CATH domain is now 90.2% (increased from 82.3% in the previous release). A number of highly requested features have also been implemented in our web pages: the user can now view an alignment between their query sequence and a representative FunFam structure, and new tools make it easier to view the full structural context (multi-domain architecture) of domains and chains.
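The growth figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `percent_increase` and `fold_change` are hypothetical and are not part of any CATH tool, and the sequence-domain counts use the rounded values from the abstract (53 million and 95 million).

```python
def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0


def fold_change(old: float, new: float) -> float:
    """Ratio of the new value to the old value."""
    return new / old


# Structural domains: CATH v4.1 -> v4.2 (counts quoted in the abstract).
structural_growth = percent_increase(308_999, 434_857)

# Predicted sequence domains: rounded counts quoted in the abstract.
sequence_fold = fold_change(53_000_000, 95_000_000)

# PDB chain coverage: change in percentage points, not relative growth.
coverage_gain_points = 90.2 - 82.3

print(f"Structural domains grew by {structural_growth:.1f}%")
print(f"Predicted domains grew {sequence_fold:.2f}-fold")
print(f"Chain coverage rose by {coverage_gain_points:.1f} percentage points")
```

With the rounded inputs, the structural-domain growth comes out just above 40% and the sequence-domain growth just under two-fold, consistent with the wording "over 40%" and "almost two-fold".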

Figures

Figure 1.
Comparison of the structural domains and predicted (sequence) domains between CATH+ releases 4.1 and 4.2.
Figure 2.
Superfamilies in CATH v4.1 and v4.2, highlighting the number of structural domains and predicted domains (shown with a linear and a logarithmic scale). Each dot represents a superfamily, and the largest 100 superfamilies (according to the number of predicted domains in CATH v4.1) have been highlighted in red. These superfamilies contain more than half of all known protein domains. To help illustrate the growth of data, an example superfamily (3.60.20.10) has been circled in each plot. The number of structural domains in this superfamily increased 2.7-fold from v4.1 to v4.2 (3074 to 8328 domains); however, this corresponded to only a small increase in the number of predicted domains. On investigation, this superfamily contains domains from a number of large proteasomes, which contain many copies of identical (or very similar) structural domains.
Figure 3.
Screenshot showing a query sequence aligned to a matching CATH FunFam (following a sequence search). Sequence conservation is calculated for each position in the alignment (blue is low conservation, red is high conservation) and these colours are mapped to a representative structure (if one is available).
Figure 4.
Interactive links between 3D structure (3DMol.js) and multi-domain architecture (MDA) allow the user to view each domain in the context of the full chain.
