Nucleic Acids Res. 2007 Jan;35(Database issue):D237-40.
doi: 10.1093/nar/gkl951. Epub 2006 Nov 29.

CDD: a conserved domain database for interactive domain family analysis

Aron Marchler-Bauer et al. Nucleic Acids Res. 2007 Jan.

Abstract

The conserved domain database (CDD) is part of NCBI's Entrez database system and serves as a primary resource for the annotation of conserved domain footprints on protein sequences in Entrez. Entrez's global query interface can be accessed at http://www.ncbi.nlm.nih.gov/Entrez and will search CDD and many other databases. Domain annotation for proteins in Entrez has been pre-computed and is readily available in the form of 'Conserved Domain' links. Novel protein sequences can be scanned against CDD using the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi; this service searches databases of CDD-derived profile models with protein sequence queries using BLAST heuristics. Protein query sequences submitted to NCBI's protein BLAST search service are scanned for conserved domain signatures by default. The CDD collection contains models imported from Pfam, SMART and COG, as well as domain models curated at NCBI. NCBI-curated models are organized into hierarchies of domains related by common descent. Here we report on the status of the curation effort and present a novel helper application, CDTree, which enables users of the CDD resource to examine curated hierarchies. More importantly, CDD and CDTree, used in concert, serve as a powerful tool in protein classification, as they allow users to analyze protein sequences in the context of domain family hierarchies.

Figures

Figure 1
(a) Pre-computed domain annotation is retrieved for a protein sequence in the Entrez database, gi|47123187 from Xenopus laevis. The graphical annotation can be obtained by following the ‘Conserved Domains’ link on the Entrez document summary for gi|47123187. By default, graphical domain summaries hide redundant information from the user. By clicking on the red balloon (b), representing a conserved domain footprint for the model cd02907, the user launches a summary view of that domain model, which also preserves information about the (query) sequence of interest (c). The CD summary page displays details about the actual model and its hierarchy. A section labeled ‘Links’ (data not shown), for example, provides links to all protein sequences in Entrez that match the current domain model, to references in PubMed and Entrez Books, and to the original source of the curated family, which may be a model imported from outside databases such as Pfam. Clicking on the button labeled ‘Interactive Display with CDTree’ (d) launches CDTree on the user's computer as a local application, which retrieves its data via the web-browser. The CDTree view corresponding to this example is shown in Figure 2. CDTree launching is not enabled for alignment models imported from outside sources.
Figure 2
CDTree default display, as launched from a web browser, showing a curated domain hierarchy with an embedded user query sequence. A protein sequence found in NCBI's Entrez database was inspected for the presence of conserved domains. The user has followed links from one of the domain footprints annotated on the model, and inspected a particular domain model, cd02907. From the CD summary page, as shown in Figure 1, CDTree was launched as a helper application. The main window (a) presents the organization of the conserved domain hierarchy, as already visible on the CD summary page (Figure 1). In this case, cd02749 or ‘Macro’ is a generic ‘parent’ model, which has been split up into several more specific ‘children’. The sequence tree shown in panel (c) provides evidence for this particular subfamily structure. Groups of branches rendered in the same color correspond to alignment rows that have been assigned to a particular subgroup. Sequence trees are always calculated from the curated CD alignments; in this particular example, the distance data have been obtained from pair-wise alignment scores. Aligned residue pairs are scored with the BLOSUM62 matrix. Pair-wise scores are subtracted from the highest observed pair-wise score to yield distances, and the distance units plotted here correspond to BLOSUM62 scores. By default, a taxonomy viewer window is opened as well (b). Users may select and highlight whole branches in the sequence tree view and examine corresponding highlights in the taxonomy viewer, to understand the taxonomic scope of particular subfamilies, or select/highlight taxa in the taxonomy viewer and examine their distribution in the sequence tree. In this example, a user query sequence has been added to one of the models, cd02907. cd02907 gave the best scoring hit in a database search for a particular region of the user's query sequence. In the sequence tree display, the user query is highlighted by default (d). It appears that the user query is a typical member of this particular subfamily, as it clusters tightly with all the other members, and therefore transfer of annotation from the model to the sequence - or functional inference - may be appropriate.
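
The score-to-distance conversion described in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not CDTree's implementation. The names `TOY_SCORES`, `pair_score`, `alignment_score` and `score_distances` are hypothetical, the small score table is an illustrative stand-in for the full BLOSUM62 matrix, and skipping columns that contain a gap is an assumption about what "aligned residue pairs" means. The alignment rows are made-up data.

```python
from itertools import combinations

# Toy substitution scores for four residues. This is an illustrative
# stand-in; a real calculation would use the full 20x20 BLOSUM62 matrix.
TOY_SCORES = {
    ("A", "A"): 4, ("G", "G"): 6, ("L", "L"): 4, ("W", "W"): 11,
    ("A", "G"): 0, ("A", "L"): -1, ("A", "W"): -3,
    ("G", "L"): -4, ("G", "W"): -2, ("L", "W"): -2,
}


def pair_score(a, b, table):
    """Look up a symmetric substitution score for residues a and b."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def alignment_score(row_a, row_b, table, gap="-"):
    """Sum scores over columns where both rows carry a residue.

    Assumption: columns with a gap in either row are skipped.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    return sum(
        pair_score(x, y, table)
        for x, y in zip(row_a, row_b)
        if x != gap and y != gap
    )


def score_distances(rows, table):
    """Convert pair-wise scores to distances: highest score minus score."""
    scores = {
        (p, q): alignment_score(rows[p], rows[q], table)
        for p, q in combinations(rows, 2)
    }
    highest = max(scores.values())
    return {pair: highest - score for pair, score in scores.items()}


# Made-up alignment rows, all of the same length.
rows = {"s1": "ALWG", "s2": "ALWG", "s3": "AL-A", "s4": "GWLA"}
distances = score_distances(rows, TOY_SCORES)
for pair, dist in distances.items():
    print(pair, dist)
```

Under this scheme the best-scoring pair always receives distance 0 and every other pair receives a non-negative distance expressed in score units, which matches the caption's statement that the plotted distance units correspond to BLOSUM62 scores.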
