Nucleic Acids Res. 2021 Jan 8;49(D1):D266-D273.
doi: 10.1093/nar/gkaa1079.

CATH: increased structural coverage of functional space


Ian Sillitoe et al. Nucleic Acids Res.

Abstract

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65 351 fully classified domain structures (+15%), providing 500 238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH, including SCOP, InterPro, Aquaria and 2DProt.
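The growth figures in the abstract can be sanity-checked with a little arithmetic. The sketch below assumes each percentage is measured against the previous release's total, which the abstract does not state explicitly. The function name is illustrative, and the implied previous sequence-domain count is derived here from rounded figures, not quoted from the source.

```python
def percent_increase(added: int, new_total: int) -> float:
    """Percentage growth relative to the previous total (new_total - added)."""
    previous_total = new_total - added
    return 100.0 * added / previous_total


# Structural domains: 65 351 added, 500 238 in total (figures from the abstract).
structural_growth = percent_increase(added=65_351, new_total=500_238)
print(f"Structural domain growth: {structural_growth:.1f}%")

# Predicted sequence domains: 151 million after a 59% increase.
# The previous total is inferred, assuming the 59% is relative to the prior release.
implied_previous_sequence_domains = 151e6 / 1.59
print(f"Implied previous total: {implied_previous_sequence_domains / 1e6:.0f} million")
```

Under that assumption, the structural figure rounds to the reported +15%.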


Figures

Figure 1. Number of structural domains classified in CATH releases over time.
Figure 2. FunFam annotations of SARS-CoV-2 Spike protein as shown in Aquaria (https://aquaria.ws/P0DTC2/6zxn/A). Each FunFam domain in the sequence viewer matches the same domain in the 3D representation.
Figure 3. EC code purity histograms for CATH 4.2 and 4.3 FunFams. The plot shows the number of FunFams whose sequences share the same EC4 term across the FunFam alignment. Only experimentally characterised EC terms were used in the validation. Pure FunFams have one associated EC4 term; two or more may indicate functional pollution. The overall EC purity in FunFams increased between releases.
Figure 4. 2DProts diagrams in the new CATH v4.3 pages provide a simplified view of the consensus topology for the domains within a given superfamily (shown here: superfamily 2.140.10.30, which adopts a beta-propeller arrangement).
Figure 5. CATH FunVar web interface, highlighting all putative cancer mutations identified in CATH FunFams (top). The bottom shows mutations in one example FunFam: the left-hand panel shows the degree of chemical change for each mutation, measured by the Grantham score (25), while the right-hand panel shows a 3D representative, highlighting the locations of the mutations.
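The EC purity measure described in the Figure 3 caption can be illustrated with a minimal sketch. It assumes purity is judged by counting the distinct EC4 terms attached to the sequences of each FunFam. The function name, the input layout and the toy FunFam identifiers are hypothetical and are not taken from the CATH pipeline.

```python
from collections import Counter


def ec_purity_histogram(funfam_ec_terms):
    """Count FunFams by their number of distinct EC4 terms.

    funfam_ec_terms maps a FunFam identifier to the list of experimentally
    characterised EC4 terms found on its sequences (hypothetical layout).
    """
    histogram = Counter()
    for ec_terms in funfam_ec_terms.values():
        distinct_terms = set(ec_terms)
        if distinct_terms:  # FunFams with no EC annotation are not counted
            histogram[len(distinct_terms)] += 1
    return dict(histogram)


# Toy input: the identifiers are made up for illustration only.
toy_funfams = {
    "funfam_a": ["1.1.1.1", "1.1.1.1"],     # one distinct term: pure
    "funfam_b": ["2.7.11.1", "2.7.11.24"],  # two distinct terms: possibly polluted
    "funfam_c": ["3.4.21.4"],               # one distinct term: pure
    "funfam_d": [],                         # no experimental EC term
}

histogram = ec_purity_histogram(toy_funfams)
print(histogram)
```

In this toy input, two FunFams fall in the "1 EC4 term" bin and one falls in the "2 EC4 terms" bin. A release with higher purity would shift more FunFams into the first bin.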

