Nucleic Acids Res. 2005 Jan 1;33(Database issue):D188-91.
doi: 10.1093/nar/gki096.

ADDA: a domain database with global coverage of the protein universe

Andreas Heger et al. Nucleic Acids Res. 2005.

Abstract

We used the Automatic Domain Decomposition Algorithm (ADDA) to generate a database of protein domain families with complete coverage of all protein sequences. Sequences are split into domains and domains are grouped into protein domain families in a completely automated process. The current database contains domains for more than 1.5 million sequences in more than 40,000 domain families. In particular, there are 3828 novel domain families that do not overlap with the curated domain databases Pfam, SCOP and InterPro. The data are freely available for downloading and querying via a web interface (http://ekhidna.biocenter.helsinki.fi:9801/sqgraph/pairsdb).
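The notion of a "novel" family can be illustrated with a minimal sketch. The abstract does not state the exact overlap criterion, so the code below assumes that a family counts as annotated as soon as one of its domains shares at least one residue with a curated domain on the same sequence. The `Domain` record, the `overlaps` and `novel_families` helpers, and the half-open coordinate convention are hypothetical and are not taken from ADDA itself.

```python
from typing import Iterable, NamedTuple, Set


class Domain(NamedTuple):
    """A domain assignment on a sequence (hypothetical record layout)."""
    sequence_id: str
    start: int   # first residue, inclusive
    end: int     # last residue, exclusive
    family: str


def overlaps(a: Domain, b: Domain) -> bool:
    """True if both domains lie on the same sequence and share a residue."""
    return a.sequence_id == b.sequence_id and a.start < b.end and b.start < a.end


def novel_families(adda: Iterable[Domain], curated: Iterable[Domain]) -> Set[str]:
    """Return families with no domain overlapping any curated domain.

    Assumption: a single shared residue is enough to count as overlap.
    """
    adda = list(adda)

    # Index curated domains by sequence so each lookup stays small.
    curated_by_seq = {}
    for c in curated:
        curated_by_seq.setdefault(c.sequence_id, []).append(c)

    all_families = {d.family for d in adda}
    annotated = {
        d.family
        for d in adda
        if any(overlaps(d, c) for c in curated_by_seq.get(d.sequence_id, []))
    }
    return all_families - annotated
```

A stricter criterion, such as a minimum fraction of overlapping residues, could be substituted in `overlaps` without changing the rest of the sketch.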

Figures

Figure 1
Overview of domain families in ADDA. The number of families is given in the last row of each category label. (A) Mobile modules, domain families that co-occur with a variety of different domain families, constitute only a fraction of all domain families. Many domains only occur in single-domain proteins or are always associated with the same domain family (associated families). The majority of domain families contain only a single representative sequence on the 40% similarity level (singletons). (B) Taxonomic distribution of domain families over the three superkingdoms (Archaea, Bacteria and Eukaryota). Left: only associated domain families excluding singletons. Right: only mobile modules. Mobile modules tend to be more widely distributed than associated domains. (C) Annotation of domain families. Left: only associated domain families excluding singletons. Right: only mobile modules. Novel domain families do not overlap with domain families from Pfam, SCOP and InterPro. Mobile modules are well known to curated domain databases, but there are many novel domain families left to be explored.
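The three categories in panel (A) can be sketched as a simple classification over domain architectures. This is an illustration under stated assumptions, not the procedure used to produce the figure: the caption gives no numeric threshold for "a variety of different domain families", so the code assumes that two or more distinct partner families make a mobile module, and it assumes that singletons take precedence over the other two labels. The function name `classify_families` and both input dictionaries are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def classify_families(
    proteins: Dict[str, List[str]],
    representatives: Dict[str, int],
) -> Dict[str, str]:
    """Label each family as 'singleton', 'mobile module' or 'associated'.

    proteins: protein id -> family ids of its domains.
    representatives: family id -> number of representative sequences
        at the 40% similarity level (missing entries are treated as 1).
    """
    # Collect, for every family, the set of other families it co-occurs with.
    partners = defaultdict(set)
    for families in proteins.values():
        distinct = set(families)
        for fam in distinct:
            partners[fam] |= distinct - {fam}

    labels = {}
    for fam, others in partners.items():
        if representatives.get(fam, 1) == 1:
            # Assumption: singletons are labeled first.
            labels[fam] = "singleton"
        elif len(others) >= 2:
            # Assumption: "variety" means at least two distinct partners.
            labels[fam] = "mobile module"
        else:
            # Single-domain proteins only, or always the same partner.
            labels[fam] = "associated"
    return labels
```

For example, a family seen next to two different partner families in different proteins would be labeled a mobile module, while one that always appears alone or with the same partner would be labeled associated.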
