Proc Natl Acad Sci U S A. 2017 Oct 31;114(44):11703-11708.
doi: 10.1073/pnas.1707642114. Epub 2017 Oct 19.

Complex evolutionary footprints revealed in an analysis of reused protein segments of diverse lengths

Sergey Nepomnyachiy et al. Proc Natl Acad Sci U S A.

Erratum in

Abstract

Proteins share similar segments with one another. Such "reused parts," which have been successfully incorporated into other proteins, are likely to offer an evolutionary advantage over de novo evolved segments, as most of the latter will not even have the capacity to fold. To systematically explore the evolutionary traces of segment "reuse" across proteins, we developed an automated methodology that identifies reused segments from protein alignments. We search for "themes" (segments of at least 35 residues of similar sequence and structure) reused within representative sets of 15,016 domains [Evolutionary Classification of Protein Domains (ECOD) database] or 20,398 chains [Protein Data Bank (PDB)]. We observe that theme reuse is highly prevalent and that reuse is more extensive when the length threshold for identifying a theme is lower. Structural domains, the best characterized form of reuse in proteins, are just one of many complex and intertwined evolutionary traces. Others include long themes shared among a few proteins, which encompass and overlap with shorter themes that recur in numerous proteins. The observed complexity is consistent with evolution by duplication and divergence, and some of the themes might include descendants of ancestral segments. The observed recursive footprints, where the same amino acid can simultaneously participate in several intertwined themes, could be a useful concept for protein design. Data are available at http://trachel-srv.cs.haifa.ac.il/rachel/ppi/themes/.

Keywords: ancestral segments; protein evolutionary patterns; protein function annotation; protein space.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
(A) The most reused themes in a protein P are derived from the set of meaningful alignments of P and other proteins: in this example, proteins 1–4. For any possible theme (for example, theme T that spans residues s-e), we can consider the parts in the alignments that are restricted to these residues, which are marked here by black rectangles. (B) We assign a score for every theme within the protein P based on the scores of these restricted parts, which is the sum over the BLOSUM-62 scores for the aligned parts. (C) Our goal is to identify the largest set of nonoverlapping themes (for example, theme_A and theme_B), such that the sum of these scores is optimal. Rather than exhaustively scoring all possible theme end points to find the optimal set, we find it more efficiently using dynamic programming (see SI Appendix, Methods for details).
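The selection step in panel C can be illustrated with a minimal sketch. It assumes that each candidate theme already carries a precomputed score (in the paper, a sum of BLOSUM-62 scores over the restricted alignment parts) and applies the textbook weighted interval scheduling recurrence. This is a stand-in for illustration, not the authors' procedure, which is described in the SI Appendix. The function name `best_nonoverlapping`, the residue coordinates, and the scores are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# A candidate theme: (start, end, score), with 1-based inclusive residue
# positions. Scores are assumed to be precomputed (hypothetical values here).
Theme = Tuple[int, int, float]


def best_nonoverlapping(themes: List[Theme]) -> Tuple[float, List[Theme]]:
    """Pick nonoverlapping themes with maximal total score.

    Classic weighted interval scheduling: sort by end position, then for
    each theme either skip it or take it together with the best solution
    over the themes that end before it starts.
    """
    ordered = sorted(themes, key=lambda t: t[1])
    ends = [t[1] for t in ordered]
    n = len(ordered)
    best = [0.0] * (n + 1)    # best[i]: optimum over the first i themes
    take = [False] * (n + 1)  # whether theme i is used in best[i]
    prev = [0] * (n + 1)      # how many earlier themes end before theme i starts

    for i, (start, _end, score) in enumerate(ordered, start=1):
        # Count themes among the first i-1 whose end is strictly before start.
        prev[i] = bisect_right(ends, start - 1, 0, i - 1)
        with_theme = best[prev[i]] + score
        if with_theme > best[i - 1]:
            best[i], take[i] = with_theme, True
        else:
            best[i] = best[i - 1]

    # Trace back through the table to recover the chosen themes.
    chosen: List[Theme] = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(ordered[i - 1])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


# Hypothetical candidate themes on one protein.
candidates = [(1, 40, 50.0), (30, 80, 70.0), (41, 90, 60.0), (85, 130, 35.0)]
total, selected = best_nonoverlapping(candidates)
print(total, selected)
```

Sorting by end position lets a binary search find the compatible prefix for each theme, so the whole selection runs in O(n log n) rather than enumerating every subset of candidates.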
Fig. 2.
Usage of the protein themes. Log–log plot of the number of themes vs. theme size (i.e., the number of variations per theme) for the PDB datasets (for the ECOD dataset, see SI Appendix, Fig. S1A). Results for themes of different minimal lengths are presented using different colors. In all cases, we see that there are many themes with a small size and few large-sized themes; we also see that reuse increases as the minimal theme length decreases.
Fig. 3.
Recursive reuse of parts of e2xyiA1 in the ECOD dataset. Reuse manifests a Russian nested dolls effect (in sequence; not to be confused with the structural one described in refs. and 9). The themes are marked on the e2xyiA1 domain. The shortest theme, shown in purple, appears in the largest set of domains (listed within the purple box). A longer (encompassing) theme, shown in blue, appears in fewer domains. Similarly, increasingly longer themes of e2xyiA1, shown in light blue, green, yellow, orange, and red, are found in increasingly smaller sets of domains. This example manifests the complexity of the reuse pattern in evolution, where the same amino acid can appear in more than one theme, and shows that, to accurately describe reuse of a domain, we must consider a per residue resolution.
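The per-residue view described in this caption can be sketched as a simple depth count: for each position in a domain, how many themes cover it. The sketch below is only an illustration under stated assumptions. The function name `theme_depth` and the coordinates are hypothetical, and the toy themes are far shorter than the 35-residue minimum used in the paper.

```python
from typing import List, Tuple

# A theme as (start, end), with 1-based inclusive residue positions.
Span = Tuple[int, int]


def theme_depth(domain_length: int, themes: List[Span]) -> List[int]:
    """Return, for each residue 1..domain_length, the number of themes covering it."""
    # Difference array: +1 where a theme starts, -1 just past where it ends.
    diff = [0] * (domain_length + 2)
    for start, end in themes:
        diff[start] += 1
        diff[end + 1] -= 1

    depth: List[int] = []
    running = 0
    for pos in range(1, domain_length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


# Hypothetical nested themes on a 10-residue toy domain: a short theme
# inside a longer one inside one that spans the whole domain.
nested = [(4, 6), (3, 8), (1, 10)]
depth = theme_depth(10, nested)
print(depth)
```

Residues in the innermost theme get the highest depth, which is the nested-dolls pattern in numeric form: a single amino acid can belong to several themes at once.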
Fig. 4.
Reuse in protein space is greater when considering sets of themes of increasingly shorter minimal lengths. Reuse in the ECOD (A and B) and PDB (C and D) datasets. (A and C) The number of recurring residues (i.e., amino acids that appear in any theme) (Eq. 1) using different sets of themes. (B and D) The number of unique residues (Eq. 2) obtained using sets of themes with different minimal lengths.

References

    1. Lupas AN, Ponting CP, Russell RB. On the evolution of protein folds: Are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world? J Struct Biol. 2001;134:191–203.
    2. Söding J, Lupas AN. More than the sum of their parts: On the evolution of proteins from peptides. Bioessays. 2003;25:837–846.
    3. Alva V, Söding J, Lupas AN. A vocabulary of ancient peptides at the origin of folded proteins. Elife. 2015;4:e09410.
    4. Vogel C, Bashton M, Kerrison ND, Chothia C, Teichmann SA. Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol. 2004;14:208–216.
    5. Petrey D, Fischer M, Honig B. Structural relationships among proteins with different global topologies and their implications for function annotation strategies. Proc Natl Acad Sci USA. 2009;106:17377–17382.