A galaxy of folds
- PMID: 19937658
- PMCID: PMC2817847
- DOI: 10.1002/pro.297
Abstract
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernible sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space that attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
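The clustering idea in the abstract (domains as point masses in 2D that attract or repel according to pairwise similarity) can be illustrated with a minimal force-directed layout sketch. This is not the authors' implementation: the force laws, the function name `layout`, the parameters `attract`, `repel`, `step` and `max_move`, and the toy similarity matrix are all assumptions chosen for illustration.

```python
import math
import random


def layout(sim, steps=500, attract=1.0, repel=0.05, step=0.05,
           max_move=0.1, seed=0):
    """Place n points in 2D so that similar items tend to end up close.

    sim is a symmetric n x n matrix of similarity scores in [0, 1]
    (a hypothetical input; real scores would come from a sequence
    comparison tool). All force laws here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(sim)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Spring-like attraction scaled by similarity, minus a
                # uniform repulsion that keeps points from collapsing.
                mag = attract * sim[i][j] * dist - repel / dist
                fx, fy = mag * dx / dist, mag * dy / dist
                force[i][0] += fx
                force[i][1] += fy
                force[j][0] -= fx
                force[j][1] -= fy
        for i in range(n):
            mx, my = step * force[i][0], step * force[i][1]
            norm = math.hypot(mx, my)
            if norm > max_move:  # cap the displacement for stability
                mx, my = mx * max_move / norm, my * max_move / norm
            pos[i][0] += mx
            pos[i][1] += my
    return pos


# Toy input: two groups of three items, similar within a group only.
toy_sim = [[1.0 if (a < 3) == (b < 3) else 0.0 for b in range(6)]
           for a in range(6)]
coords = layout(toy_sim)
```

In this toy run, items within the same group settle near one another while the two groups drift apart, which mirrors how tight clusters and links between them would appear in such a map.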
Similar articles
- Exploring dynamics of protein structure determination and homology-based prediction to estimate the number of superfamilies and folds. BMC Struct Biol. 2006 Mar 20;6:6. doi: 10.1186/1472-6807-6-6. PMID: 16549009. Free PMC article.
- Identification of homology in protein structure classification. Nat Struct Biol. 2001 Nov;8(11):953-7. doi: 10.1038/nsb1101-953. PMID: 11685241.
- Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures. PLoS Comput Biol. 2009 Mar;5(3):e1000331. doi: 10.1371/journal.pcbi.1000331. Epub 2009 Mar 27. PMID: 19325884. Free PMC article.
- On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? J Struct Biol. 2001 May-Jun;134(2-3):191-203. doi: 10.1006/jsbi.2001.4393. PMID: 11551179. Review.
- Basic units of protein structure, folding, and function. Prog Biophys Mol Biol. 2017 Sep;128:85-99. doi: 10.1016/j.pbiomolbio.2016.09.009. Epub 2016 Sep 30. PMID: 27697476. Review.
Cited by
- Similar protein segments shared between domains of different evolutionary lineages. Protein Sci. 2022 Sep;31(9):e4407. doi: 10.1002/pro.4407. PMID: 36040261. Free PMC article.
- Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins. 2016 Dec;84(12):1859-1874. doi: 10.1002/prot.25169. Epub 2016 Oct 11. PMID: 27671894. Free PMC article.
- A horizontal alignment tool for numerical trend discovery in sequence data: application to protein hydropathy. PLoS Comput Biol. 2013;9(10):e1003247. doi: 10.1371/journal.pcbi.1003247. Epub 2013 Oct 10. PMID: 24130469. Free PMC article.
- Clustering of disulfide-rich peptides provides scaffolds for hit discovery by phage display: application to interleukin-23. BMC Bioinformatics. 2016 Nov 23;17(1):481. doi: 10.1186/s12859-016-1350-9. PMID: 27881076. Free PMC article.
- ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun. 2022 Jul 27;13(1):4348. doi: 10.1038/s41467-022-32007-7. PMID: 35896542. Free PMC article.