Comparative Study
Genome Res. 2002 Oct;12(10):1619-23.
doi: 10.1101/gr.278202.

CDART: protein homology by domain architecture

Lewis Y Geer et al. Genome Res. 2002 Oct.

Abstract

The Conserved Domain Architecture Retrieval Tool (CDART) performs similarity searches of the NCBI Entrez Protein Database based on domain architecture, defined as the sequential order of conserved domains in proteins. The algorithm finds protein similarities across significant evolutionary distances by using sensitive protein domain profiles rather than direct sequence similarity. Proteins similar to a query protein are grouped and scored by architecture. Relying on domain profiles makes CDART fast, and because the profiles represent annotated functional domains, the results are also informative. Domain profiles are derived from several collections of domain definitions that include functional annotation. Searches can be further refined by taxonomy and by selecting domains of interest. CDART is available at http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi.
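
The grouping and scoring idea in the abstract can be illustrated with a minimal sketch. It is not CDART's implementation: the protein names, the domain labels, and the scoring rule (the count of distinct query domains that also occur in an architecture) are all assumptions made for illustration, since the abstract does not state how scores are computed.

```python
from collections import defaultdict

# Hypothetical toy data: accession -> domains in sequence order (N- to C-terminus).
PROTEINS = {
    "query": ["ZnF", "BRCT", "BRCT"],
    "prot_a": ["ZnF", "BRCT", "BRCT"],
    "prot_b": ["BRCT", "BRCT"],
    "prot_c": ["DNA_ligase", "BRCT"],
    "prot_d": ["Kinase"],
}


def group_by_architecture(proteins):
    """Group accessions that share the same ordered tuple of domains."""
    groups = defaultdict(list)
    for accession, domains in proteins.items():
        groups[tuple(domains)].append(accession)
    return dict(groups)


def score(query_domains, architecture):
    """Assumed score: number of distinct query domains found in the architecture."""
    return len(set(query_domains) & set(architecture))


def similar_architectures(query_accession, proteins):
    """Return (score, architecture, members) for architectures sharing a domain
    with the query, best score first."""
    query_domains = proteins[query_accession]
    others = {acc: doms for acc, doms in proteins.items() if acc != query_accession}
    ranked = []
    for architecture, members in group_by_architecture(others).items():
        s = score(query_domains, architecture)
        if s > 0:  # drop architectures with no domain in common
            ranked.append((s, architecture, sorted(members)))
    # Sort by descending score, then by architecture for a stable order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


if __name__ == "__main__":
    for s, architecture, members in similar_architectures("query", PROTEINS):
        print(s, "-".join(architecture), members)
```

Because the comparison operates on short lists of domain labels rather than on residue-level alignments, each comparison is cheap, which is consistent with the speed argument made in the abstract.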

Figures

Figure 1
The growth over time of the number of proteins known versus the growth in the number of unique domains. The left axis and red line are the cumulative sum of proteins in the NCBI Entrez Protein Database discovered in that year and in all previous years. The right axis and blue line are the total number of unique domains from the Conserved Domain Database that can be found in the cumulative set of proteins for each year, found by running RPS-BLAST on the set of proteins. Note that the zero slope in recent years may be caused by the need to accumulate multiple sequences to create a domain profile. However, this does not explain the inflection point in the curve beginning in 1990.
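
The two curves described in this caption are cumulative counts, which the following sketch reproduces on toy data. The records and the function name are hypothetical; in the paper the per-protein domain sets come from running RPS-BLAST against the Conserved Domain Database, a step that is not reproduced here.

```python
def cumulative_growth(records):
    """Given (year, domains) pairs, one per protein, return a list of
    (year, cumulative_proteins, cumulative_unique_domains) in year order."""
    by_year = {}
    for year, domains in records:
        by_year.setdefault(year, []).append(domains)

    seen_domains = set()
    protein_count = 0
    rows = []
    for year in sorted(by_year):
        for domains in by_year[year]:
            protein_count += 1
            seen_domains.update(domains)
        rows.append((year, protein_count, len(seen_domains)))
    return rows


# Hypothetical records: year a protein entered the database, and its domains.
TOY_RECORDS = [
    (1988, {"A"}),
    (1989, {"A", "B"}),
    (1989, {"B"}),
    (1990, {"A"}),
]

if __name__ == "__main__":
    for row in cumulative_growth(TOY_RECORDS):
        print(row)
```

In the toy data the protein count keeps rising in 1990 while the unique-domain count stays flat, which is the kind of divergence between the two lines that the caption describes.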
Figure 2
CDART results page for the tumor suppressor protein BRCA1 (accession NP_009225). Domains found in BRCA1 are shown in beads-on-a-string style at the top of the page and include zinc fingers and BRCT protein–protein interaction domains. Similar domain architectures are listed below using the same style. If an architecture contains more than one protein, it is preceded by a graphical icon, and clicking on the icon gives the full list of proteins with that architecture. At the bottom of the page are controls to subset the list of architectures by taxonomy and by domain.
Figure 3
The top left window allows users to select taxonomic groups of interest to subset CDART results. The lower right window is the results page for the search using BRCA1 subset by taxonomy to include only bacterial sequences. Note the large family of DNA ligases that contain the BRCT domain.
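
Subsetting by taxonomy and by domain, as shown in Figures 2 and 3, amounts to filtering the hit list on two optional criteria. The sketch below assumes each hit carries a taxonomic lineage and an ordered domain list; the `Hit` type, the accessions, and the lineages are hypothetical and do not reflect CDART's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    accession: str
    lineage: tuple   # taxonomic lineage, broadest group first
    domains: tuple   # domains in sequence order


def subset(hits, taxon=None, required_domain=None):
    """Keep hits whose lineage includes `taxon` and whose architecture
    contains `required_domain`; a criterion set to None is ignored."""
    kept = []
    for hit in hits:
        if taxon is not None and taxon not in hit.lineage:
            continue
        if required_domain is not None and required_domain not in hit.domains:
            continue
        kept.append(hit)
    return kept


# Hypothetical hits for a BRCA1-like query.
HITS = [
    Hit("brca1_like", ("Eukaryota", "Metazoa"), ("ZnF", "BRCT", "BRCT")),
    Hit("ligase_1", ("Bacteria", "Proteobacteria"), ("DNA_ligase", "BRCT")),
    Hit("ligase_2", ("Bacteria", "Firmicutes"), ("DNA_ligase", "BRCT")),
    Hit("kinase_1", ("Bacteria", "Firmicutes"), ("Kinase",)),
]

if __name__ == "__main__":
    for hit in subset(HITS, taxon="Bacteria", required_domain="BRCT"):
        print(hit.accession, "-".join(hit.domains))
```

Restricting the toy hits to bacterial sequences that contain BRCT leaves only the two ligase entries, mirroring the DNA ligase family noted in the caption.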


Publication types