Bioinformatics. 2015 Nov 1;31(21):3460-7.
doi: 10.1093/bioinformatics/btv398. Epub 2015 Jul 2.

Functional classification of CATH superfamilies: a domain-based approach for protein function annotation

Sayoni Das et al. Bioinformatics.

Erratum in

Abstract

Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap, especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer.

Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than those of other domain-based resources. FunFHMMer currently identifies 110,439 FunFams in 2735 superfamilies, which can be used to functionally annotate more than 16 million domain sequences.

Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam.

Contact: sayoni.das.12@ucl.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Use of specificity-determining positions (SDPs) by FunFHMMer to infer functional coherence of cluster alignments. The coloured circles represent the node sequence clusters and each colour denotes a unique function. The schematic representation of the multiple sequence alignment (MSA) of the parent node and of the child nodes is shown along with the phylogenetic tree. Child nodes are separated by a dashed line. Conserved positions in the MSA are shown in red and the SDPs are shown in green or yellow for different child nodes.
Fig. 2. Function prediction using CATH FunFams. Workflow for making function predictions using CATH Functional Families.
Fig. 3. EC number variation across protein classifications. Percentage of families or superfamilies having a certain number of EC terms for each of the domain-based protein classifications.
Fig. 4. UniProt rollback assessment. Performance of the FunFHMMer protocol on the UniProtKB/Swiss-Prot rollback assessment dataset compared with functional annotations predicted by the DFX protocol, Pfam (native) family and CDD family assignments.
Fig. 5. Network representation of the HUP Superfamily (CATH 3.40.50.620) showing available functional annotations in FunFams. The coloured nodes indicate FunFams annotated with different Enzyme Commission (EC) numbers and the grey nodes indicate FunFams without any EC annotation, which include non-enzymes.
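
The distinction the Fig. 1 caption draws between conserved positions and SDPs can be illustrated with a small sketch. This is not the FunFHMMer implementation: the function names, the toy alignments and the 0.8 dominance threshold are all assumptions chosen for illustration. A column is labelled "conserved" when both child clusters are dominated by the same residue, "sdp" when each child is dominated by a different residue, and "variable" otherwise.

```python
from collections import Counter


def column_residues(alignment, col):
    """Residues in one alignment column, ignoring gap characters."""
    return [seq[col] for seq in alignment if seq[col] != "-"]


def dominant(residues, threshold):
    """Return the most common residue if its frequency reaches the threshold, else None."""
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    return residue if count / len(residues) >= threshold else None


def classify_columns(child_a, child_b, threshold=0.8):
    """Label each column of two equal-length child alignments.

    The threshold is a hypothetical cut-off, not a value taken from the paper.
    """
    labels = []
    for col in range(len(child_a[0])):
        a = dominant(column_residues(child_a, col), threshold)
        b = dominant(column_residues(child_b, col), threshold)
        if a is not None and a == b:
            labels.append("conserved")  # same dominant residue in both children
        elif a is not None and b is not None:
            labels.append("sdp")        # each child conserved, but differently
        else:
            labels.append("variable")   # at least one child has no dominant residue
    return labels


# Toy child-node alignments (made up for this example).
child_a = ["MKHDL", "MKHEL", "MKHDV"]
child_b = ["MRHDL", "MRHSI", "MRHTL"]
labels = classify_columns(child_a, child_b)
print(labels)
```

In this toy case the second column (K in one child, R in the other) is the only one labelled "sdp". A clustering procedure could, in principle, use the presence of such columns as evidence that two child clusters should not be merged, which is the role the caption attributes to SDPs.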
