Genome Biol. 2004;5(5):107.
doi: 10.1186/gb-2004-5-5-107. Epub 2004 Apr 29.

Progress towards mapping the universe of protein folds

Alastair Grant et al. Genome Biol. 2004.

Abstract

Although the precise aims differ between the various international structural genomics initiatives currently aiming to illuminate the universe of protein folds, many selectively target protein families for which the fold is unknown. How well can the current set of known protein families and folds be used to estimate the total number of folds in nature, and will structural genomics initiatives yield representatives for all the major protein families within a reasonable time scale?

Figures

Figure 1
The proportion of domain families represented by CATH fold groups. Within the CATH database [19,20], structures are grouped into fold groups on the basis of both overall shape and connectivity of their secondary structures. Domain families are related at the 35% sequence identity level by complete linkage clustering. The number of domain families within each fold group gives a measure of the sequence diversity of that fold group. A group of 54 CATH fold groups (only 6.6% of the cumulative total of CATH fold groups) accounts for 76% of domain families, as shown by the dotted lines.
Figure 2
Log-log plots of the sizes of (a) CATH, (b) Pfam and (c) NewFam (uncharacterized) families show power-law-like behavior. (d) Fitted power law functions and their exponents are shown for comparison. Most NewFam families have relatively few members. See text for further details.
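A power law of the form count = c * size^(-alpha) appears as a straight line on a log-log plot, so its exponent can be estimated from the slope of a least-squares line fitted to the logged values. The sketch below illustrates that idea only; it is not the fitting procedure used for Figure 2, which the caption does not describe. The function name `fit_power_law` and the data are hypothetical and synthetic, not taken from CATH, Pfam or NewFam.

```python
import math

def fit_power_law(sizes, counts):
    """Fit count = c * size**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration. Returns (alpha, c).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(n) for n in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of log(count) vs log(size).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The power-law exponent is the negated slope of the log-log line.
    return -slope, math.exp(intercept)

# Synthetic example: counts generated from an exact power law (alpha = 1.5).
sizes = [1, 2, 4, 8, 16, 32]
counts = [1000.0 * s ** -1.5 for s in sizes]
alpha, c = fit_power_law(sizes, counts)
print(f"alpha = {alpha:.3f}, c = {c:.1f}")
```

On real family-size data the points scatter around the line, and a simple log-log regression can give a biased exponent, so a fit of this kind is best read as a rough comparison between datasets.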
Figure 3
Gene coverage in Gene3D. The chart indicates the percentage of genes in the indicated genome that have at least one non-overlapping assignment from CATH or Pfam. Three representative genomes from each kingdom of life show low, average and high coverage, respectively. The species shown are Pyrobaculum aerophilum, Methanococcus jannaschii, Thermoplasma acidophilum, Helicobacter pylori, Escherichia coli K12, Wigglesworthia glossinidia brevipalpis, Plasmodium falciparum, Encephalitozoon cuniculi and Schizosaccharomyces pombe.
Figure 4
The cumulative number of domains within domain superfamilies (ranked by decreasing size). The 1,000 largest domain superfamilies account for nearly 60% of all domain sequences (see dotted lines). The figure excludes singleton domain families, and is derived from our own unpublished work.
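The cumulative curves in Figures 1 and 4 rest on the same calculation: rank groups by decreasing size, accumulate their sizes, and read off how many of the largest groups are needed to reach a given share of the total. A minimal sketch of that calculation follows. The function names and the toy sizes are hypothetical, chosen only to illustrate the arithmetic; they are not data from the article.

```python
from itertools import accumulate

def cumulative_coverage(family_sizes):
    """Return the cumulative fraction of members covered by the
    largest 1, 2, 3, ... families (families ranked by decreasing size)."""
    ranked = sorted(family_sizes, reverse=True)
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]

def families_needed(family_sizes, target_fraction):
    """Smallest number of top-ranked families whose members make up
    at least target_fraction of all members."""
    for rank, fraction in enumerate(cumulative_coverage(family_sizes), start=1):
        if fraction >= target_fraction:
            return rank
    return len(family_sizes)

# Toy data: six families with 100 members in total (illustrative only).
toy_sizes = [50, 5, 20, 10, 10, 5]
print(cumulative_coverage(toy_sizes))
print(families_needed(toy_sizes, 0.75))
```

In the toy data the three largest families hold 80 of the 100 members, so three families are enough to pass a 75% threshold.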

References

    1. Bernal A, Ear U, Kyrpides N. Genomes OnLine Database (GOLD): a monitor of genomes projects world-wide. Nucleic Acids Res. 2001;29:126–127. doi: 10.1093/nar/29.1.126.
    2. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
    3. Protein Data Bank http://www.rcsb.org/pdb/
    4. Gene3D http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/
    5. Madera M, Vogel C, Kummerfeld SK, Chothia C, Gough J. The SUPERFAMILY database in 2004: additions and improvements. Nucleic Acids Res. 2004;32 Database issue:D235–D239. doi: 10.1093/nar/gkh117.
