Comparative Study
Proc Natl Acad Sci U S A. 2007 Jul 10;104(28):11627-32.
doi: 10.1073/pnas.0701393104. Epub 2007 Jun 27.

The network of sequence flow between protein structures

Leonid Meyerguz et al. Proc Natl Acad Sci U S A.

Abstract

Sequence-structure relationships in proteins are highly asymmetric because many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions, we compute a directed graph of sequences and structures of proteins, which is based on 2,060 experimentally determined protein shapes from the Protein Data Bank. The directed graph is highly connected at native energies with "sinks" that attract many sequences from other folds. The sinks are rich in beta-sheets. The number of sequences that transition between folds is significantly smaller than the number of sequences retained by their fold. The sequence flow into a particular protein shape from other proteins correlates with the number of sequences that match this shape in empirically determined genomes. Properties of strongly connected components of the graph are correlated with protein length and secondary structure.
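
As a rough illustration of the directed graph described in the abstract, the sketch below builds edges from a table of flow fractions and counts each fold's in-degree, so that a "sink" shows up as the node with the most incoming edges. The fold names, the `flow` table, the `cutoff` value, and the helper names `build_graph` and `in_degrees` are all hypothetical; none of them are data or code from the paper.

```python
def build_graph(flow, cutoff):
    """Map each fold to the set of folds it sends sequences to.

    flow maps (source_fold, target_fold) -> fraction of the source's
    sequences that move to the target (hypothetical numbers).
    An edge is kept only when that fraction exceeds the cutoff.
    """
    graph = {}
    for (src, dst), fraction in flow.items():
        # Register both folds as nodes even if no edge survives the cutoff.
        graph.setdefault(src, set())
        graph.setdefault(dst, set())
        if src != dst and fraction > cutoff:
            graph[src].add(dst)
    return graph


def in_degrees(graph):
    """Count the edges directed into each fold."""
    degrees = {node: 0 for node in graph}
    for targets in graph.values():
        for dst in targets:
            degrees[dst] += 1
    return degrees


# Made-up flow fractions between three made-up folds.
flow = {
    ("foldA", "foldC"): 0.30,
    ("foldB", "foldC"): 0.12,
    ("foldC", "foldA"): 0.01,
    ("foldA", "foldB"): 0.05,
}
graph = build_graph(flow, cutoff=0.04)
degrees = in_degrees(graph)
sink = max(degrees, key=degrees.get)  # fold with the most incoming edges
print(degrees, sink)
```

In this toy input, `foldC` receives edges from both other folds and sends none above the cutoff, which is the pattern the abstract calls a sink.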

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1. The two largest strongly connected components of the network of sequence flow between protein folds. Protein space is presented as a directed graph in which a node is a protein shape and the directed edge denotes a flow of sequences from one fold to another. Sequence flow is created when a sequence that is energetically compatible with one structure becomes more compatible with another structure as the result of a single point mutation.
Fig. 2. The log of the number of proteins (the number of nodes in the directed graph) as a function of in-degree (the total number of edges directed into a fold). The in-degree is an indicator of the stability of a particular shape and its ability to “steal” sequences from other structures.
Fig. 3. Sequence retention at the energy E* as a function of the contact density. For every fold, E* is the energy at which the fraction of sequences retained by that fold is maximal. In our model, some proteins retain all sequences at E* and all energy levels below. For other proteins, the fraction of retained sequences reaches a maximum at their E* and then falls again as energy is lowered. Some protein folds even have zero sequence retention rate throughout the energy landscape, meaning that they are almost entirely energetically dominated by other folds.
Fig. 4. A contour plot of the number of strongly connected components in the graph as a function of the log of the cutoff value for establishing an edge (y axis) and as a function of the energy in the range between E* and Enat (x axis). An edge is established when the fraction of sequences that flow from one protein to the next exceeds a cutoff value. (A strongly connected component of a directed graph is a maximal set inside which every node has a path to every other node.)
Fig. 5. The density of sequence capacity without and with competition (log[N(En)/20^L] and log[C(En)/20^L], respectively) as a function of the contact density (the total number of contacts divided by the number of amino acids).
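
The caption of Fig. 4 describes counting strongly connected components as the edge cutoff changes. The sketch below shows one way that count could be computed for a single energy level. It uses Kosaraju's algorithm, which is our choice for illustration; the record does not say which algorithm the authors used. The fold names, the `flow` fractions, the cutoff values, and the function names are hypothetical.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps node -> set of successor nodes."""
    # Pass 1: iterative depth-first search, recording nodes as they finish.
    order, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # No unvisited successors remain: the node is finished.
                order.append(node)
                stack.pop()

    # Reverse every edge.
    reverse = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].add(src)

    # Pass 2: search the reversed graph in decreasing finish order;
    # each search collects exactly one component.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def count_components(flow, nodes, cutoff):
    """Keep edges whose flow fraction exceeds the cutoff, then count components."""
    graph = {node: set() for node in nodes}
    for (src, dst), fraction in flow.items():
        if fraction > cutoff:
            graph[src].add(dst)
    return len(strongly_connected_components(graph))


# Made-up flow fractions between four made-up folds.
nodes = ["foldA", "foldB", "foldC", "foldD"]
flow = {
    ("foldA", "foldB"): 0.20,
    ("foldB", "foldC"): 0.10,
    ("foldC", "foldA"): 0.02,
    ("foldC", "foldD"): 0.15,
}
counts = {cutoff: count_components(flow, nodes, cutoff) for cutoff in (0.01, 0.05)}
print(counts)
```

With the lower cutoff the cycle foldA, foldB, foldC survives and forms one component alongside foldD; raising the cutoff removes the weakest edge, breaks the cycle, and leaves four single-node components. Repeating the count over a grid of cutoffs and energies would give the kind of surface that a contour plot summarizes.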
