PLoS Comput Biol. 2006 Jul 7;2(7):e85.
doi: 10.1371/journal.pcbi.0020085. Epub 2006 May 26.

Emergence of protein fold families through rational design

Feng Ding et al. PLoS Comput Biol.

Abstract

Diverse proteins with similar structures are grouped into families of homologs or analogs when their sequence similarity is, respectively, higher or lower than 20%–30%. It was suggested that protein homologs and analogs originate from a common ancestor and diverge on distinct evolutionary time scales, emerging as a consequence of the physical properties of the protein sequence space. Although a number of studies have determined key signatures of protein family organization, the sequence-structure factors that differentiate the two evolution-related protein families remain unknown. Here, we postulate that subtle structural changes, which appear due to accumulating mutations in the homologous families, lead to distinct packing of the protein core and, thus, novel compositions of core residues. The latter process leads to the formation of distinct families of homologs. We propose that such differentiation results in the formation of analogous families. To test our postulate, we developed a molecular modeling and design toolkit, Medusa, to computationally design protein sequences that correspond to the same fold family. We find that analogous proteins emerge when a backbone structure deviates by only 1–2 angstroms root-mean-square deviation (RMSD) from the original structure. For close homologs, core residues are highly conserved. However, when the overall sequence similarity drops to approximately 25%–30%, the composition of core residues starts to diverge, thereby forming novel families of protein homologs. This direct observation of the formation of protein homologs within a specific fold family supports our hypothesis. The conservation of amino acids in designed sequences recapitulates that of the naturally occurring sequences, thereby validating our computational design methodology.
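
The homolog/analog distinction described in the abstract rests on pairwise sequence identity and on how the identity of core residues compares with the overall identity. The following is a minimal Python sketch of that bookkeeping, not the Medusa implementation. It assumes the two sequences are already aligned to equal length with "-" marking gaps; the function names, the 0.20 and 0.30 cutoffs taken from the stated 20%–30% range, and the list of core positions are all illustrative assumptions.

```python
from typing import Iterable


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    # Ignore columns where either sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def subset_identity(a: str, b: str, positions: Iterable[int]) -> float:
    """Identity restricted to chosen alignment columns (e.g. assumed core sites)."""
    cols = list(positions)
    return sequence_identity("".join(a[i] for i in cols),
                             "".join(b[i] for i in cols))


def classify(identity: float, low: float = 0.20, high: float = 0.30) -> str:
    """Label a pair using the 20%-30% range quoted in the abstract (assumed cutoffs)."""
    if identity > high:
        return "homolog"
    if identity < low:
        return "analog"
    return "twilight zone"


if __name__ == "__main__":
    ref, designed = "ACDEFGHIKL", "ACDEYGHVKM"   # toy sequences, not real data
    overall = sequence_identity(ref, designed)
    core = subset_identity(ref, designed, [0, 1, 2, 3])  # hypothetical core columns
    print(overall, core, classify(overall))
```

Comparing `subset_identity` on core columns against `sequence_identity` on the whole alignment mirrors the core-versus-overall comparison the abstract describes, under the assumption that core positions are known in advance.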

Conflict of interest statement

Competing interests. The authors have declared that no competing interests exist.

Figures

Figure 1. The Native (Green) and Redesigned (Cyan) CheY Protein (PDB: 3chy), Using Medusa Fixed-Backbone Redesign
The backbone structure is shown in cartoon, and the sidechains of recapitulated residues are shown in stick representation.
Figure 2. The Sequence Entropy Computed from Simulations versus the Naturally Occurring Sequence Entropy Computed from HSSP
Three families of protein homologs were studied: HPR domain (A,D,G), Rossmann fold (B,E,H), and SH3 domain (C,F,I). The open circles (○) in (A–F) correspond to the functionally important residues. In (G–I), these functionally important residues are shown in stick representation. In (G,H), the SO₄²⁻ ions are used to mimic the phosphate anion in crystal preparation. In (I), the poly-proline peptide is shown in yellow, and the peptide-binding residues form a continuous surface, shown in mesh representation.
Figure 3. The Sequence Identity for the Constructed Homologous Structures
Three different protein folds are studied: HPR domain (A,B), Rossmann fold (C,D), and SH3 domain (E,F). (A,C,E) The sequence identities of the redesigned proteins using the flexible-backbone design simulation are presented as a function of the backbone RMSD from the reference protein. (B,D,F) The sequence identity of the core is also plotted against the overall sequence identity. The “twilight zone” of sequence identity (20%–30%) corresponds to regions between horizontal (A,C,E) or vertical (B,D,F) lines.
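
The per-position sequence entropy compared in Figure 2 can be illustrated with a short Python sketch. The abstract and captions do not give the exact entropy definition used, so this assumes the standard Shannon entropy with natural logarithm over the residues observed in each alignment column, with gap characters ignored. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: str) -> float:
    """Shannon entropy (nats) of the residue distribution in one alignment column."""
    residues = [c for c in column if c != "-"]  # skip gaps
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log(k / n) for k in counts.values())


def alignment_entropy(seqs: Sequence[str]) -> List[float]:
    """Entropy for every column of a set of equal-length aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("need a non-empty alignment of equal-length sequences")
    # zip(*seqs) walks the alignment column by column.
    return [column_entropy("".join(col)) for col in zip(*seqs)]


if __name__ == "__main__":
    designed = ["ACDE", "ACDF", "ACEF"]   # toy stand-in for designed sequences
    for position, s in enumerate(alignment_entropy(designed)):
        print(position, round(s, 3))
```

Low-entropy columns correspond to conserved positions. Computing the same profile for designed sequences and for a natural alignment of the family would allow the kind of position-by-position comparison the figure describes.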
