Nucleic Acids Res. 2017 Jan 4;45(D1):D289-D295.
doi: 10.1093/nar/gkw1098. Epub 2016 Nov 28.

CATH: an expanded resource to predict protein function through structure and sequence

Natalie L Dawson et al. Nucleic Acids Res.

Abstract

The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site.

Figures

Figure 1.
Visualisation of functional and structural diversity in the HUP superfamily using Cytoscape (25). The nodes in the network represent FunFams and the edges represent sequence similarities between the FunFam HMMs calculated using Profile Comparer (PRC) (16). The size of the nodes (FunFams) reflects their number of sequences and the nodes are linked by edges if the similarity of their HMMs is above a PRC score of 10. (A) This network highlights the functional diversity of the HUP superfamily where all nodes are coloured according to the EC numbers of their constituent sequences and grey nodes indicate those without any EC annotation (including non-enzymes). (B) This network shows the available structure data among the FunFams with high information content in the HUP superfamily. The purple coloured nodes indicate FunFams with known structure and the grey nodes indicate FunFams without any known structure. Structural representatives of selected FunFams (encircled and numbered in red) are shown at the bottom of the figure to highlight the structural diversity of the superfamily.
Figure 2.
The precision and recall results from the first benchmark, where equal sets of carefully chosen homologous and non-homologous domain pairs were used for the training and testing of various scoring algorithms.
Figure 3.
Redesigned home page for CATH-Gene3D.
Figure 4.
Section of the CATH web pages displaying the multiple sequence alignment for a CATH FunFam (3.40.50.620/FF/89168) underneath the structural domain chosen to represent the cluster (1ct9C02). The degree of sequence conservation is highlighted on a sliding colour scale on both the alignment and the structure (blue to red signifying low to high conservation). Clicking on positions in the alignment highlights the corresponding residues in the structure. The sequence alignment and 3D structure are displayed using open source tools: MSAViewer and 3DMol.js, respectively.
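
The network construction rule described in the Figure 1 caption (FunFams as nodes sized by sequence count, with an edge wherever the PRC score between two FunFam HMMs is above 10) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the input dictionaries and the toy FunFam identifiers and scores are hypothetical, and the PRC scores are assumed to have been computed beforehand. It is not the authors' pipeline.

```python
from itertools import combinations


def build_funfam_network(sizes, prc_scores, threshold=10.0):
    """Build a node/edge list for a FunFam similarity network.

    sizes:      dict mapping FunFam id -> number of sequences (node size).
    prc_scores: dict mapping frozenset({id_a, id_b}) -> precomputed PRC score.
    threshold:  an edge is kept only if the score is strictly above this value
                (10 in the Figure 1 caption).
    """
    nodes = {funfam: {"size": count} for funfam, count in sizes.items()}
    edges = []
    for a, b in combinations(sorted(sizes), 2):
        score = prc_scores.get(frozenset((a, b)))
        if score is not None and score > threshold:
            edges.append((a, b, score))
    return nodes, edges


# Hypothetical toy data, invented purely for illustration.
sizes = {"FF1": 120, "FF2": 45, "FF3": 8}
prc_scores = {
    frozenset(("FF1", "FF2")): 23.5,
    frozenset(("FF1", "FF3")): 4.2,
    frozenset(("FF2", "FF3")): 10.0,  # exactly 10 is not "above 10"
}

nodes, edges = build_funfam_network(sizes, prc_scores)
print(edges)
```

The resulting node and edge lists could then be exported to a network viewer such as Cytoscape, which the caption names as the visualisation tool.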

References

    1. Sillitoe I., Lewis T.E., Cuff A., Das S., Ashford P., Dawson N.L., Furnham N., Laskowski R.A., Lee D., Lees J.G., et al. CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res. 2015;43:D376–D381.
    2. Rose P.W., Prlić A., Bi C., Bluhm W.F., Christie C.H., Dutta S., Green R.K., Goodsell D.S., Westbrook J.D., Woo J., et al. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 2015;43:D345–D356.
    3. The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2014;43:D204–D212.
    4. Aken B.L., Ayling S., Barrell D., Clarke L., Curwen V., Fairley S., Fernandez Banet J., Billis K., García Girón C., Hourlier T. The Ensembl gene annotation system. Database (Oxford). 2016;44:D710–D716.
    5. Lam S.D., Dawson N.L., Das S., Sillitoe I., Ashford P., Lee D., Lehtinen S., Orengo C.A., Lees J.G. Gene3D: expanding the utility of domain assignments. Nucleic Acids Res. 2016;44:D404–D409.
