PLoS Comput Biol. 2013;9(11):e1003325.
doi: 10.1371/journal.pcbi.1003325. Epub 2013 Nov 14.

Exploring fold space preferences of new-born and ancient protein superfamilies

Hannah Edwards et al. PLoS Comput Biol. 2013.

Abstract

The evolution of proteins is one of the fundamental processes that has delivered the diversity and complexity of life we see around us today. While we tend to define protein evolution in terms of sequence-level mutations, insertions and deletions, it is hard to translate these processes into a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We apply an annotation of superfamily age to this space and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies when compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate that our results are robust to significant variation in the algorithm used to estimate the ages. We present these age estimates as a useful tool to analyse protein populations. In particular, we apply them in a comparison of domains containing Greek key or jelly roll motifs.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. What do we mean by the age of a superfamily?
Ages are generated using a phylogenetic species tree and an occurrence profile of a superfamily across the genomes of these species. Parsimony algorithms predict the simplest scenario of loss and gain events on the internal nodes of the tree that explains the occurrence profile at its leaves. Ages are normalised between 0, at the leaves of the tree, and 1, at its root. Ancient superfamilies are assigned a predicted age of 1, and new-born superfamilies are those estimated to have an evolutionary age of [formula].
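To make the idea in this caption concrete, the sketch below estimates an age from a toy species tree and an occurrence profile. It is only an illustration under stated assumptions, not the authors' pipeline: it uses Dollo parsimony (a single gain placed at the most recent common ancestor of all genomes that contain the superfamily) as one simple parsimony variant, and it normalises age by node height counted in tree levels. The tree, the species names and the function names are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy species tree: each internal node maps to its children.
# Nodes that do not appear as keys are leaves (genomes).
TREE: Dict[str, List[str]] = {
    "root": ["bacteria", "eukaryotes"],
    "eukaryotes": ["plants", "animals"],
    "animals": ["human", "mouse"],
}


def leaves_under(node: str, tree: Dict[str, List[str]]) -> Set[str]:
    """Return the set of leaf genomes in the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= leaves_under(child, tree)
    return found


def height(node: str, tree: Dict[str, List[str]]) -> int:
    """Height in levels: 0 for a leaf, 1 + tallest child otherwise (assumed scale)."""
    children = tree.get(node, [])
    if not children:
        return 0
    return 1 + max(height(child, tree) for child in children)


def gain_node(node: str, tree: Dict[str, List[str]], present: Set[str]) -> str:
    """Dollo-style gain: the deepest node whose subtree holds every occurrence."""
    for child in tree.get(node, []):
        if present <= leaves_under(child, tree):
            return gain_node(child, tree, present)
    return node


def superfamily_age(present: Set[str], tree: Dict[str, List[str]],
                    root: str = "root") -> float:
    """Normalised age: 0 at the leaves of the tree, 1 at its root."""
    if not present:
        raise ValueError("the superfamily must occur in at least one genome")
    return height(gain_node(root, tree, present), tree) / height(root, tree)


if __name__ == "__main__":
    # Occurrence profiles are given as the set of genomes containing the superfamily.
    print(superfamily_age({"bacteria", "human"}, TREE))  # gained at the root
    print(superfamily_age({"human", "mouse"}, TREE))     # gained within animals
```

A superfamily found in both bacteria and human is placed at the root and so receives age 1, while one confined to a single genome receives age 0. Other parsimony variants that allow multiple gains, or a normalisation based on branch lengths, would give different values for patchy occurrence profiles.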
Figure 2. The relationships between superfamily ages, secondary structure and length.
Figure A gives a percentile plot of the age distributions of 5 SCOP classes. For ease of interpretation, plots of multi-domain and membrane proteins have been omitted. Each line represents the distribution of ages generated using a different phylogenetic tree. Noticeably, the age distributions of [formula] superfamilies rise more quickly than those of the other classes. Moreover, superfamilies classified as small under SCOP are significantly younger than the other classes. Figure B gives a boxplot of the length distributions for these SCOP classes. Roughly speaking, the ordering of the classes by length corresponds to their ordering by age: [formula] superfamilies are longer and small proteins are shorter than the other classes. Figure C gives a percentile plot of the age distributions of superfamilies with different average domain lengths. Multi-domain superfamilies were omitted from this analysis. Ancient superfamilies are significantly longer than their new-born counterparts. Figure D gives a percentile plot of the age distributions of two populations of superfamilies: those in which the majority of strands are parallel and those in which the majority are antiparallel. The parallel population is significantly older than the antiparallel population.
Figure 3. Superfamily ages of Greek key and jelly roll motifs.
Percentile plots of the age distributions of superfamilies containing a Greek key or a jelly roll motif within their beta-sheet topologies. Domains annotated as containing at least one Greek key motif are significantly older than those containing the jelly roll motif.

