PLoS Comput Biol. 2007 Nov;3(11):e232.

doi: 10.1371/journal.pcbi.0030232.

CATHEDRAL: a fast and effective algorithm to predict folds and domain boundaries from multidomain protein structures

Oliver C Redfern¹, Andrew Harrison, Tim Dallman, Frances M G Pearl, Christine A Orengo

PMID: 18052539
PMCID: PMC2098860
DOI: 10.1371/journal.pcbi.0030232

Oliver C Redfern et al. PLoS Comput Biol. 2007 Nov.

Affiliation

¹ Department of Biochemistry and Molecular Biology, University College London, London, United Kingdom. ollie@biochem.ucl.ac.uk

Abstract

We present CATHEDRAL, an iterative protocol for determining the location of previously observed protein folds in novel multidomain protein structures. CATHEDRAL combines a fast secondary-structure-based method, which uses graph theory to locate known folds within a multidomain context, with a residue-based double-dynamic programming algorithm, which aligns members of the target fold groups against the query protein structure to identify the closest relative and assign domain boundaries. To increase the fidelity of the assignments, a support vector machine is used to provide an optimal scoring scheme. Once a domain is verified, it is excised, and the search protocol is repeated iteratively until all recognisable domains have been identified. We have performed an initial benchmark of CATHEDRAL against other publicly available structure comparison methods using a consensus dataset of domains derived from the CATH and SCOP domain classifications. CATHEDRAL shows superior performance in fold recognition and alignment accuracy compared with many equivalent methods. If a novel multidomain structure contains a known fold, CATHEDRAL will locate it in 90% of cases, with <1% false positives. For nearly 80% of assigned domains in a manually validated test set, the boundaries were correctly delineated within a tolerance of ten residues. In the remaining cases, the previously classified domains were so remotely related to the query chain that embellishments to the core of the fold caused significant differences in domain size, and manual refinement of the boundaries was necessary. To put this performance in context, a well-established sequence method based on hidden Markov models detected only 65% of domains, with 33% of the subsequent boundaries assigned within ten residues.
Since, on average, 50% of newly determined protein structures contain more than one domain unit, and typically 90% or more of these domains are already classified in CATH, CATHEDRAL will considerably facilitate the automation of protein structure classification.
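The iterative protocol summarised in the abstract (find a known fold, score the match, excise the verified domain, repeat) can be sketched in Python. This is a minimal illustration under stated assumptions, not CATHEDRAL's implementation: `DomainHit`, `SearchFn`, `assign_domains`, `boundaries_match`, the `svm_cutoff` threshold and the `min_domain_size` stopping rule are all hypothetical, and the search callable merely stands in for the graph-theory scan plus double-dynamic-programming alignment. The `boundaries_match` helper shows one plausible reading of the "within ten residues" criterion, applied segment by segment so that discontiguous domains are handled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class DomainHit:
    """Hypothetical record for a known fold matched to part of the query chain."""
    fold_id: str
    residues: Set[int]  # query residue numbers covered by the matched domain
    svm_score: float    # combined match score; higher means more confident


# Hypothetical search step: given the residues not yet assigned, return the
# best-scoring known-fold match, or None. It stands in for the graph-theory
# fold scan followed by double-dynamic-programming alignment.
SearchFn = Callable[[Set[int]], Optional[DomainHit]]


def assign_domains(chain: Set[int], search: SearchFn, svm_cutoff: float,
                   min_domain_size: int = 40) -> List[DomainHit]:
    """Repeatedly find a known fold, accept it if it clears the cutoff, excise it."""
    remaining = set(chain)
    assigned: List[DomainHit] = []
    # min_domain_size is an assumed stopping rule, not taken from the paper.
    while len(remaining) >= min_domain_size:
        hit = search(remaining)
        if hit is None or hit.svm_score < svm_cutoff:
            break  # no further recognisable fold in what is left
        if not hit.residues & remaining:
            break  # guard: a hit that excises nothing would loop forever
        assigned.append(hit)
        remaining -= hit.residues  # excise the verified domain and search again
    return assigned


def boundaries_match(predicted: List[Tuple[int, int]],
                     reference: List[Tuple[int, int]], tol: int = 10) -> bool:
    """True if every segment start/end lies within `tol` residues of the reference.

    Segments are (start, end) pairs, so a discontiguous domain has several.
    """
    if len(predicted) != len(reference):
        return False
    return all(abs(p_start - r_start) <= tol and abs(p_end - r_end) <= tol
               for (p_start, p_end), (r_start, r_end)
               in zip(sorted(predicted), sorted(reference)))
```

For example, with a toy search function that returns the highest-scoring candidate fully contained in the remaining residues, a chain carrying two folds above the cutoff and one below it yields two assignments, after which the loop stops.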

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

**Figure 1. Percentage of Multidomain Chains with a Given Number of Component Domains**

**Figure 2. Example of a Multidomain Protein (PDB: 1cg2) Chain Containing a Discontiguous Domain**
Domain two (blue) is inserted between two segments of domain one (red).

**Figure 3. ROC (True Positive Rate Versus False Positive Rate) Curve Plotted for Different Structural Comparison Methods Based on the SAS, Where a Positive Match Represents a True CATH–SCOP Fold Match**
TPR, true positive rate; FPR, false positive rate.

**Figure 4. Graph of the Percentage of Correct Folds Matched Against the Ranked Native Score for the CATH–SCOP Dataset**

**Figure 5. Comparison of Alignment Quality of Domains Adopting the Same CATH Fold Using Two Geometric Scoring Schemes**
(A) Percentage of correct fold pairs for a given SAS threshold. (B) Percentage of correct fold pairs for a given SiMax threshold.

**Figure 6. Average Number of Aligned Residues per SAS**

**Figure 7. Graph Showing How the Alignments of Each Method Compared with Manually Validated BAliBASE Alignments**
The highest curve (or the curve with the greatest area underneath it) represents the method that agrees most closely with the manually curated BAliBASE alignments.

**Figure 8. Comparison of GT and DDP Scores with SVM Score for Assigning Domains to Multidomain Chains**

**Figure 9. Percentage of Domains Assigned (Blue) and Percentage of Domain Boundaries within Ten Residues of Verified Boundaries (Pink) at a Range of SVM Score Cutoffs**

**Figure 10. Domain Coverage Versus Quality of Domain Boundaries**

**Figure 11. Percentage of Domains with Correct Domain Boundaries (within Ten Residues) When Varying the Number of Representatives Taken from Each Superfamily in the Targeted Fold Groups**

**Figure 12. Graph of the Percentage of Correct (within Ten Residues) Domain Boundaries against the Sequence Identity between the Assigned Region and the Matched Domain**

**Figure 13. Superposition of the Catalase HPII (PDB 1iph; First Domain of Chain A) as It Is Classified in the CATH Database and Its Match to Bovine Beta-Lactoglobulin, Coloured Red, (PDB 1beb; Chain A), the Closest Relative Identified by CATHEDRAL**

**Figure 14. Flowchart of CATHEDRAL Algorithm for Assigning Folds and Domain Boundaries to Protein Chains**
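Figures 3 to 6 compare methods using the SAS. As a sketch of how such an evaluation can be computed, the code below assumes SAS is the structural alignment score as commonly defined in the structure-comparison literature (RMSD × 100 divided by the number of aligned residues, lower is better), and derives ROC points of the kind plotted in Figure 3 by sweeping an SAS threshold over domain pairs labelled as true or false CATH–SCOP fold matches. The function names `sas`, `roc_points` and `auc` and the (score, label) data layout are hypothetical; this is not the paper's evaluation code.

```python
from typing import List, Tuple


def sas(rmsd: float, n_aligned: int) -> float:
    """Structural alignment score, assumed here to be RMSD * 100 / aligned residues.

    Lower values indicate a better (tighter, longer) structural match.
    """
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return rmsd * 100.0 / n_aligned


def roc_points(pairs: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from (SAS, is_true_fold_match) pairs.

    Pairs are ranked from lowest (best) to highest SAS, and the acceptance
    threshold is swept upward one distinct score at a time.
    """
    n_pos = sum(1 for _, is_match in pairs if is_match)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false fold match")
    ranked = sorted(pairs, key=lambda p: p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, is_match) in enumerate(ranked):
        if is_match:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last of any tied scores.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

Grouping tied scores before emitting a point keeps the curve independent of the arbitrary order in which equally scored pairs happen to be sorted.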
