Review
Biol Rev Camb Philos Soc. 2023 Feb;98(1):243-262.
doi: 10.1111/brv.12905. Epub 2022 Oct 9.

A review of visualisations of protein fold networks and their relationship with sequence and function

Janan Sykes et al. Biol Rev Camb Philos Soc. 2023 Feb.

Abstract

Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus, facilitate improved protein structure design and prediction.

Keywords: protein evolution; protein fold switches; protein folds; protein similarity networks; protein structure networks.

Figures

Fig. 1
Lymphotactin exists in its monomeric (left) and dimeric (right) forms in approximately equal amounts in vivo (Bryan & Orban, 2010). In the transition from monomer to dimer, the C‐terminus helix becomes unstructured and the N‐terminus strands form new hydrogen bonds. An unstructured region at the N‐terminus also forms a new strand. The equilibrium between these two forms shifts with changes in salt and temperature levels.
Fig. 2
Progression from Cro protein Xfaso 1 to Cro protein Pfl 6 via two hybrids. These two naturally occurring and two designed proteins demonstrate a possible path for a fold switch. Xfaso 1 shares 85% sequence identity (ID) with XPH1, and is structurally very similar. XPH1 is about 70% sequentially similar to XPH2, although the transition between them involves loss of alpha helices and gain of beta sheets. XPH2 is, once again, about 85% sequentially similar to Pfl 6, and structurally very similar. The path via which this fold switch could occur is demonstrated by the bold arrows. The original Cro proteins share about 40% of their sequence, and each hybrid shares 55% of its sequence with the original protein it is not based on.
Fig. 3
Different network visualisations of fold space applied to the same four‐subdomain ‘proteins’. ‘Proteins’ that take the same fold are encircled, with different background colours for different folds. Connections labelled (A), (B), and (C) are different types of networks. Subdomains of the same sequence are represented by the same shapes. All subdomains can be considered structurally equivalent for the purpose of this diagram. (A) Similarity connections represent instances wherein some of the proteins that take one fold are similar to those that take another (in this case, structurally). (B) Informational connections are directional and represent instances wherein a protein sequence originally taking one fold could be ‘lost’ to another fold via residue changes. These networks generally do not retain sequence information, as each edge represents the migration of a high volume of sequences from one fold to another. (C) Physics‐based connections also represent instances where one protein sequence could mutate to a different fold, but the likelihood of this happening is calculated based on first‐principles physics. Some are found through simulation to be more likely than others, as represented in this case by arrow thickness. Physics‐based models are complex; their purpose is generally to look at how accurately we can model protein evolution rather than to answer questions about overall protein fold space (for which simpler but more broadly applicable models are generally used).
Fig. 4
Structural similarity networks for four different protein structure alignment (PSA) methods. Note the differing network shapes between different protein structure alignment methods and the ‘network collapse’ as the similarity score threshold for edges (shown in the top row) is increased. Figure reproduced from Edwards & Deane (2015) in accordance with the Creative Commons Attribution (CC BY) license.
Fig. 5
The landscapes of folding scores for near‐native serum amyloid‐P (SAP) sequences. The native sequence of this protein is not near the optimum stability when calculated with (A) an informational and (B) a physics‐based model. The physics‐based model does, however, exhibit more selection pressure, with steeper slopes and a more dramatic minimum (most stable region). Reproduced from Grahnen et al. (2011) in accordance with the Creative Commons Attribution (CC BY) license.
Fig. 6
Number of folds recorded in the Protein Data Bank (PDB) over time. Note the slowing rate of discovery in the last decade.
Fig. 7
Convergent and divergent protein evolution. The simplified molecules could be an indicator of divergence from (left) or convergence towards (right) a U‐shaped fold. The overall number of discrete folds increases with divergence but decreases with convergence.
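
The 'network collapse' noted in the Fig. 4 caption can be illustrated with a minimal sketch. It assumes a toy set of four structures with made-up pairwise similarity scores (the names `nodes`, `scores`, and `components_at_threshold` are hypothetical and not taken from the review or from Edwards & Deane, 2015). An edge is kept only when its score meets the threshold, and the number of connected components is counted as the threshold rises.

```python
def count_components(nodes, edges):
    """Count connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


def components_at_threshold(scores, nodes, threshold):
    """Keep only edges whose similarity score is >= threshold, then count components."""
    edges = [pair for pair, score in scores.items() if score >= threshold]
    return count_components(nodes, edges)


# Hypothetical structures and similarity scores, for illustration only.
nodes = ["A", "B", "C", "D"]
scores = {("A", "B"): 0.9, ("B", "C"): 0.6, ("C", "D"): 0.4, ("A", "D"): 0.2}

# Number of components at each (assumed) threshold value.
collapse = {t: components_at_threshold(scores, nodes, t) for t in (0.3, 0.5, 0.7, 0.95)}
print(collapse)
```

In this toy case a single connected network fragments into isolated nodes as the threshold increases, which is the qualitative behaviour the caption describes. Real structural similarity networks would use scores from an actual protein structure alignment method.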

References

    1. Alexander, P. A., He, Y., Chen, Y., Orban, J. & Bryan, P. N. (2007). The design and characterization of two proteins with 88% sequence identity but different structure and function. Proceedings of the National Academy of Sciences 104, 11963–11968.
    2. Alexander, P. A., He, Y., Chen, Y., Orban, J. & Bryan, P. N. (2009). A minimal sequence code for switching protein structure and function. Proceedings of the National Academy of Sciences 106, 21149–21154.
    3. Alexander, P. A., Rozak, D. A., Orban, J. & Bryan, P. N. (2005). Directed evolution of highly homologous proteins with different folds by phage display: implications for the protein folding code. Biochemistry 44, 14045–14054.
    4. Alva, V., Remmert, M., Biegert, A., Lupas, A. N. & Söding, J. (2010). A galaxy of folds. Protein Science 19, 124–130.
    5. Alva, V., Söding, J. & Lupas, A. N. (2015). A vocabulary of ancient peptides at the origin of folded proteins. eLife 4, e09410.