Protein Sci. 2010 Jan;19(1):124-30.
doi: 10.1002/pro.297.

A galaxy of folds

Vikram Alva et al. Protein Sci. 2010 Jan.

Abstract

Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
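The clustering idea described in the abstract, with domains as point masses that attract or repel each other according to pairwise sequence similarity, can be illustrated with a minimal force-directed layout. This is only a sketch under stated assumptions, not the CLANS implementation used by the authors: the function name `layout`, the P-value cutoff, the force constants, and the use of `-log10(P)` as the attraction weight are all hypothetical choices made for illustration.

```python
import math
import random


def layout(n_points, pvalues, cutoff=1e-3, steps=500,
           attract=0.01, repel=0.01, max_move=0.1, seed=0):
    """Toy 2D force-directed layout (illustrative, not CLANS itself).

    pvalues maps (i, j) index pairs to pairwise P-values.
    Pairs at or below `cutoff` attract each other; every pair repels.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_points)]

    # Assumed attraction weight: -log10(P), so lower P-values pull harder.
    weights = {}
    for (i, j), p in pvalues.items():
        if i != j and 0 < p <= cutoff:
            weights[(min(i, j), max(i, j))] = -math.log10(p)

    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n_points)]
        for i in range(n_points):
            for j in range(i + 1, n_points):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                ux, uy = dx / dist, dy / dist
                # Attraction grows with distance; repulsion decays with it.
                f = attract * weights.get((i, j), 0.0) * dist - repel / dist
                force[i][0] += f * ux
                force[i][1] += f * uy
                force[j][0] -= f * ux
                force[j][1] -= f * uy
        for i in range(n_points):
            mag = math.hypot(force[i][0], force[i][1])
            # Cap the displacement per step to keep the layout stable.
            scale = min(1.0, max_move / mag) if mag > 0 else 0.0
            pos[i][0] += force[i][0] * scale
            pos[i][1] += force[i][1] * scale
    return pos


# Two made-up groups of three domains; only within-group pairs are similar.
toy_pvalues = {(0, 1): 1e-20, (0, 2): 1e-20, (1, 2): 1e-20,
               (3, 4): 1e-20, (3, 5): 1e-20, (4, 5): 1e-20}
positions = layout(6, toy_pvalues)
```

In this toy input the P-values are invented. Points with significant mutual similarity should settle into tight clusters, while groups with no significant links drift apart, which mirrors the qualitative behavior the abstract describes for families, superfamilies, and folds.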

Figures

Figure 1
Galaxy of folds colored by classes. Domains from the same class come to lie in similar regions of the galaxy. Domains in SCOP20 were clustered in CLANS based on their all-against-all pairwise similarities as measured by HHsearch P-values. Dots represent domains. Line coloring reflects HHsearch P-values; the brighter a line, the lower the P-value. Domains are colored according to their SCOP class: all-α (blue), all-β (cyan), α/β (red), α+β (yellow), small proteins (green), multi-domain proteins (orange), and membrane proteins (magenta).
Figure 2
Galaxy of folds colored by folds. Some clusters connect domains of different fold, pointing to common, homologous fragments of similar sequence and structure. These might represent descendants of a set of ancient peptide modules, from which the first protein domains have been assembled.
Figure 3
Galaxy of folds colored by superfamilies. Many tight clusters contain various superfamilies of the same fold, indicating that folds with multiple independent origins are the exception rather than the rule.
