Review

Biol Rev Camb Philos Soc. 2023 Feb;98(1):243-262.

doi: 10.1111/brv.12905. Epub 2022 Oct 9.

A review of visualisations of protein fold networks and their relationship with sequence and function

Janan Sykes¹, Barbara R Holland¹, Michael A Charleston¹

PMID: 36210328
PMCID: PMC10092621
DOI: 10.1111/brv.12905

Janan Sykes et al. Biol Rev Camb Philos Soc. 2023 Feb.

Affiliation

¹ School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia.

Abstract

Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus, facilitate improved protein structure design and prediction.

Keywords: protein evolution; protein fold switches; protein folds; protein similarity networks; protein structure networks.

Figures

**Fig. 1**
Lymphotactin exists in its monomeric (left) and dimeric (right) forms in approximately equal amounts *in vivo* (Bryan & Orban, 2010). In the transition from monomer to dimer, the C‐terminus helix becomes unstructured and the N‐terminus strands form new hydrogen bonds. An unstructured region at the N‐terminus also forms a new strand. The equilibrium between these two forms shifts with changes in salt and temperature levels.

**Fig. 2**
Progression from Cro protein Xfaso 1 to Cro protein Pfl 6 *via* two hybrids. These two naturally occurring and two designed proteins demonstrate a possible path for a fold switch. Xfaso 1 shares 85% sequence identity (ID) with XPH1, and is structurally very similar. XPH1 is about 70% sequentially similar to XPH2, although the transition between them involves loss of alpha helices and gain of beta sheets. XPH2 is, once again, about 85% sequentially similar to Pfl 6, and structurally very similar. The path *via* which this fold switch could occur is demonstrated by the bold arrows. The original Cro proteins share about 40% of their sequence, and each hybrid shares 55% of its sequence with the original protein it is not based on.
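The sequence identity percentages quoted in this caption can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to equal length, with `-` marking gaps, and it divides by the number of gap-free columns; other denominators are in common use, so this is one convention rather than the one necessarily used for the Cro proteins. The function name `percent_identity` and the toy sequences are hypothetical and are not taken from the paper.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of a pairwise alignment.

    Assumption: both inputs are pre-aligned strings of equal length,
    with '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns where neither sequence has a gap
    matches = 0   # columns where the residues are identical
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    # With no comparable columns, identity is reported as 0.0.
    return 100.0 * matches / compared if compared else 0.0


# Toy sequences (hypothetical, not real Cro protein sequences).
toy_a = "ACDEFGHIKL"
toy_b = "ACDEYGHIKL"
print(percent_identity(toy_a, toy_b))
```

In this toy pair, one of ten aligned residues differs, which is the kind of comparison behind figures such as "85% sequence identity".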

**Fig. 3**
Different network visualisations of fold space applied to the same four‐subdomain ‘proteins’. ‘Proteins’ that take the same fold are encircled, with different background colours for different folds. Connections labelled (A), (B), and (C) are different types of networks. Subdomains of the same sequence are represented by the same shapes. All subdomains can be considered structurally equivalent for the purpose of this diagram. (A) Similarity connections represent instances wherein some of the proteins that take one fold are similar to those that take another (in this case, structurally). (B) Informational connections are directional and represent instances wherein a protein sequence originally taking one fold could be ‘lost’ to another fold *via* residue changes. These networks generally do not retain sequence information, as each edge represents the migration of a high volume of sequences from one fold to another. (C) Physics‐based connections also represent instances where one protein sequence could mutate to a different fold, but the likelihood of this happening is calculated based on first‐principles physics. Some are found through simulation to be more likely than others, as represented in this case by arrow thickness. Physics‐based models are complex; their purpose is generally to look at how accurately we can model protein evolution rather than to answer questions about overall protein fold space (for which simpler but more broadly applicable models are generally used).

**Fig. 4**
Structural similarity networks for four different protein structure alignment (PSA) methods. Note the differing network shapes between different protein structure alignment methods and the ‘network collapse’ as the similarity score threshold for edges (shown in the top row) is increased. Figure reproduced from Edwards & Deane (2015) in accordance with the Creative Commons Attribution (CC BY) license.
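The effect of the edge threshold can be sketched with a toy similarity network. The example below assumes that an edge is drawn between two structures whenever their pairwise similarity score meets the threshold, and it reads the thinning and fragmenting of the network at higher thresholds as one plausible interpretation of 'network collapse'. The scores, node names, and function names are hypothetical; they are not taken from Edwards & Deane (2015) or from any particular PSA method.

```python
def build_edges(scores, threshold):
    """Keep only pairs whose similarity score meets the threshold."""
    return [pair for pair, score in scores.items() if score >= threshold]


def count_components(nodes, edges):
    """Count connected components using a small union-find."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, halving the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(node) for node in nodes})


def sweep(nodes, scores, thresholds):
    """Return (threshold, edge count, component count) for each threshold."""
    results = []
    for threshold in thresholds:
        edges = build_edges(scores, threshold)
        results.append((threshold, len(edges), count_components(nodes, edges)))
    return results


# Hypothetical similarity scores in [0, 1] for five toy structures.
# Pairs that are not listed are treated as having no detectable similarity.
nodes = ["p1", "p2", "p3", "p4", "p5"]
scores = {
    ("p1", "p2"): 0.9,
    ("p1", "p3"): 0.6,
    ("p2", "p3"): 0.7,
    ("p3", "p4"): 0.4,
    ("p4", "p5"): 0.8,
}

for threshold, n_edges, n_components in sweep(nodes, scores, [0.3, 0.5, 0.75, 0.95]):
    print(f"threshold={threshold}: {n_edges} edges, {n_components} components")
```

In this toy example the network goes from a single connected component at the lowest threshold to five isolated nodes at the highest. Because different alignment methods produce different score distributions, the same threshold can yield differently shaped networks.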

**Fig. 5**
The landscapes of folding scores for near‐native serum amyloid‐P (SAP) sequences. The native sequence of this protein is not near the optimum stability when calculated with (A) an informational and (B) a physics‐based model. The physics‐based model does, however, exhibit more selection pressure, with steeper slopes and a more dramatic minimum (most stable region). Reproduced from Grahnen *et al*. (2011) in accordance with the Creative Commons Attribution (CC BY) license.

**Fig. 6**
Number of folds recorded in the Protein Data Bank (PDB) over time. Note the slowing rate of discovery in the last decade.

**Fig. 7**
Convergent and divergent protein evolution. The simplified molecules could be an indicator of divergence from (left) or convergence towards (right) a U‐shaped fold. The overall number of discrete folds increases with divergence but decreases with convergence.
