Nucleic Acids Res. 2014 Jan;42(Database issue):D240-5.
doi: 10.1093/nar/gkt1205. Epub 2013 Nov 21.

Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis

Jonathan G Lees et al. Nucleic Acids Res. 2014 Jan.

Abstract

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.

Figures

Figure 1.
Workflow of the Gene3D pipeline from data input from external resources (Parallelogram shaped boxes) to useful functions for the user (Diamond shaped boxes). Rectangular boxes represent data processing steps.
Figure 2.
The multi-domain architecture (MDA) alignment method in Gene3D uses the Needleman–Wunsch (NW) algorithm. (A) Domain matches in the substitution matrix for the NW algorithm can take place at multiple levels. The highest scoring match is at the FunFam level (FunFam Match). The next highest scoring match is between different FunFams from the same superfamily, scored by their similarity in a hierarchical tree of FunFams built from profile–profile comparisons (FunFam-Tree Match). The next highest scoring match is at the homologous superfamily level (Superfamily Match). Finally, domains with the same fold can also contribute a positive similarity score in the domain alignment (Fold Match). (B) Domain alignments can be used to find functionally similar proteins by identifying proteins with a similar MDA. (C) All-versus-all MDA alignments have been carried out to identify those proteins with distinctive domain combinations in a genome.
Figure 3.
Individual domain summary page (the example is the C-terminal domain of human siah2, SIAH2_HUMAN) showing a modelled structure along with the Ramachandran plot from the Rampage software package (25), which is used as part of the quality control step. Residues in the sequence and structure are coloured by conservation across the FunFam (blue to red indicates increasing conservation).
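
The tiered scoring described in the Figure 2 caption can be illustrated with a minimal sketch of a Needleman–Wunsch alignment over domains rather than residues. This is not the Gene3D implementation: the `Domain` class, the score values, the gap penalty and the optional `tree_similarity` callback are all assumptions chosen for illustration. Only the ordering of the match levels (FunFam, then FunFam-tree, then superfamily, then fold) is taken from the caption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Domain:
    superfamily: str  # CATH superfamily ID, e.g. "3.40.50.300"
    funfam: str       # hypothetical FunFam label within the superfamily

    @property
    def fold(self) -> str:
        # The first three CATH fields (Class.Architecture.Topology) give the fold.
        return ".".join(self.superfamily.split(".")[:3])


# Illustrative values only; the real substitution scores are not given in the source.
FUNFAM, SUPERFAMILY, FOLD, MISMATCH, GAP = 4.0, 2.0, 1.0, -1.0, -1.0
TREE_BONUS = 1.5  # scales a FunFam-tree similarity assumed to lie in [0, 1]

TreeSim = Optional[Callable[[str, str], float]]


def domain_score(a: Domain, b: Domain, tree_similarity: TreeSim = None) -> float:
    """Score one domain pair at the most specific level at which they match."""
    if a.superfamily == b.superfamily:
        if a.funfam == b.funfam:
            return FUNFAM
        sim = tree_similarity(a.funfam, b.funfam) if tree_similarity else 0.0
        return SUPERFAMILY + TREE_BONUS * sim
    if a.fold == b.fold:
        return FOLD
    return MISMATCH


def align_mdas(
    x: List[Domain], y: List[Domain], tree_similarity: TreeSim = None
) -> Tuple[float, List[Tuple[Optional[Domain], Optional[Domain]]]]:
    """Global (Needleman-Wunsch) alignment of two multi-domain architectures."""
    n, m = len(x), len(y)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + GAP
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + domain_score(x[i - 1], y[j - 1], tree_similarity),
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
    # Trace back from the bottom-right cell; None marks a gap.
    pairs: List[Tuple[Optional[Domain], Optional[Domain]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + domain_score(
            x[i - 1], y[j - 1], tree_similarity
        ):
            pairs.append((x[i - 1], y[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((x[i - 1], None))
            i -= 1
        else:
            pairs.append((None, y[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


if __name__ == "__main__":
    a = Domain("3.40.50.300", "FF1")
    b = Domain("1.10.10.10", "FF2")
    c = Domain("1.10.10.20", "FF9")  # same fold as b, different superfamily
    total, alignment = align_mdas([a, b], [a, c])
    print(total, alignment)
```

Under these assumed scores, two proteins that share a FunFam domain and a same-fold domain still align end to end, which is how a similar MDA can be detected even when the domains are not identical at the family level.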

References

    1. Cuff AL, Sillitoe I, Lewis T, Clegg AB, Rentzsch R, Furnham N, Pellegrini-Calace M, Jones D, Thornton J, Orengo CA. Extending CATH: increasing coverage of the protein structure universe and linking structure with function. Nucleic Acids Res. 2011;39:D420–D426.
    2. Sillitoe I, Cuff AL, Dessailly BH, Dawson NL, Furnham N, Lee D, Lees JG, Lewis TE, Studer RA, Rentzsch R, et al. New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures. Nucleic Acids Res. 2013;41:D490–D498.
    3. Rentzsch R, Orengo CA. Protein function prediction using domain families. BMC Bioinformatics. 2013;14(Suppl 3):S5.
    4. Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, et al. A large-scale evaluation of computational protein function prediction. Nat. Methods. 2013;10:221–227.
    5. Yeats C, Redfern OC, Orengo C. A fast and automated solution for accurately resolving protein domain architectures. Bioinformatics. 2010;26:745–751.
