Nucleic Acids Res. 2003 Jan 1;31(1):452-5.
doi: 10.1093/nar/gkg062.

The CATH database: an extended protein family resource for structural and functional genomics

F M G Pearl et al. Nucleic Acids Res. 2003.

Abstract

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath_new) currently contains 34 287 domain structures classified into 1383 superfamilies and 3285 sequence families. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of efficient sequence search protocols and reliable thresholds. This extended resource, known as the CATH-protein family database (CATH-PFDB), contains a total of 310 000 domain sequences classified into 26 812 sequence families. New sequence search protocols have been designed, based on these intermediate sequence libraries, to allow more regular updating of the classification. Further developments include the adaptation of a recently developed method for rapid structure comparison, based on secondary structure matching, for domain boundary assignment. The philosophy behind this adaptation, CATHEDRAL, is the recognition of recurrent folds already classified in CATH. Benchmarking of CATHEDRAL, using manually validated domain assignments, demonstrated that 43% of domain boundaries could be assigned completely automatically. This is an improvement on a previous consensus approach, for which only 10-20% of domains could be reliably processed in a completely automated fashion. Since domain boundary assignment is a significant bottleneck in the classification of new structures, CATHEDRAL will also help to increase the frequency of CATH updates.

Figures

Figure 1
Flowchart of the new CATH protocol which uses intermediate sequence searching to classify newly determined structures. Sequences are BLASTed against an intermediate sequence library (CATH-PFDB) and potential matches are structurally validated before clustering into sequence families.
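The flow in this caption (sequence search against the intermediate library, structural validation of candidate matches, then assignment to a sequence family) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Hit` record, the `classify` function, the `validate` callback (a stand-in for a structure comparison step) and the `max_e_value` cutoff are all hypothetical names and values introduced here, and appending to an existing family is a simplification of the clustering step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Hit:
    """One match from a sequence search (hypothetical record, not a CATH format)."""
    family_id: str   # sequence family of the matched library entry
    e_value: float   # E-value reported by the sequence search


def classify(
    query_id: str,
    hits: List[Hit],
    validate: Callable[[str, str], bool],
    families: Dict[str, List[str]],
    max_e_value: float = 1e-3,  # assumed cutoff, not taken from the paper
) -> Optional[str]:
    """Assign a query to the best-scoring family that passes structural validation."""
    # Keep only hits under the cutoff, most significant first.
    candidates = sorted(
        (h for h in hits if h.e_value <= max_e_value),
        key=lambda h: h.e_value,
    )
    for hit in candidates:
        # The sequence match is only a candidate until the structure check agrees.
        if validate(query_id, hit.family_id):
            families.setdefault(hit.family_id, []).append(query_id)
            return hit.family_id
    return None  # no validated match: left for further analysis


# Toy usage: famB scores best but fails validation, so famA is chosen.
families: Dict[str, List[str]] = {}
toy_hits = [Hit("famA", 1e-10), Hit("famB", 1e-30), Hit("famC", 0.5)]
toy_validate = lambda query, family: family == "famA"
assigned = classify("dom1", toy_hits, toy_validate, families)
```

Trying candidates in order of significance and stopping at the first one that validates keeps the structure comparison, which is the expensive step, to as few calls as possible.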
Figure 2
Coverage plots showing the proportion of structures which are assigned to the correct fold group, using the GRATH algorithm, within the top N ranked matches returned from a database search of non-identical representatives from CATH.
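The quantity plotted in such a coverage curve can be computed as in the sketch below. It assumes each query comes with a ranked list of fold labels returned by the search and a known correct fold; the function name `top_n_coverage` and the toy data are illustrative only and are not taken from the paper.

```python
from typing import Dict, List


def top_n_coverage(
    ranked_folds: Dict[str, List[str]],
    true_fold: Dict[str, str],
    max_n: int,
) -> List[float]:
    """Return a list where entry n-1 is the fraction of queries whose
    correct fold appears among the top n ranked matches."""
    if not ranked_folds:
        return [0.0] * max_n
    coverage = []
    for n in range(1, max_n + 1):
        found = sum(
            1
            for query, folds in ranked_folds.items()
            if true_fold[query] in folds[:n]
        )
        coverage.append(found / len(ranked_folds))
    return coverage


# Toy data: fold labels are illustrative placeholders.
ranked = {
    "q1": ["3.40.50", "2.60.40", "1.10.10"],  # correct at rank 1
    "q2": ["2.60.40", "3.40.50", "1.10.10"],  # correct at rank 2
    "q3": ["1.10.10", "2.60.40", "3.40.50"],  # correct at rank 3
    "q4": ["2.60.40", "1.10.10", "3.30.70"],  # correct fold never returned
}
truth = {q: "3.40.50" for q in ranked}

print(top_n_coverage(ranked, truth, 3))  # [0.25, 0.5, 0.75]
```

The curve is non-decreasing in N by construction, so it levels off at the fraction of queries for which the correct fold is returned at any rank.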
