Comparative Study
Proteins. 2010 Aug 1;78(10):2338-48.
doi: 10.1002/prot.22746.

Improving computational protein design by using structure-derived sequence profile

Liang Dai et al. Proteins. 2010.

Abstract

Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from improved understanding of how hydrophobic and polar interactions between amino acid side chains stabilize protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduce a term to reduce low-complexity regions in designed sequences. These two terms, together with optimized reference states for amino acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, increases the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences by 12 percentage points (from 34% to 46%). Meanwhile, it reduces the fraction of designed sequences that are not homologous to any known protein sequence according to psi-blast by 8 percentage points (from 22% to 14%). More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues at the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than those designed by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures because of more specific polar interactions.
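
As a rough illustration of the structure-derived sequence profile described in the abstract, the sketch below builds a per-position amino acid profile from five-residue fragments that have already been structurally matched to each five-residue window of a target. The abstract does not specify how fragments are matched, how counts are pooled, or the functional form of the profile term, so the names `build_profile` and `profile_energy`, the pseudocount, the pooling over overlapping windows, and the negative log-probability score are all assumptions made for illustration, not the RosettaDesign-SR implementation.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
FRAGMENT_LENGTH = 5  # five-residue fragments, as in the abstract


def build_profile(target_length, matched_fragments, pseudocount=1.0):
    """Build a per-position amino acid probability profile.

    matched_fragments maps the start index of a five-residue window in the
    target to the sequences of library fragments matched to that window.
    How the matching is done is outside this sketch (assumption).
    """
    counts = [Counter() for _ in range(target_length)]
    for start, sequences in matched_fragments.items():
        for seq in sequences:
            if len(seq) != FRAGMENT_LENGTH:
                raise ValueError("fragment sequences must have five residues")
            # Assumption: every residue of a matched fragment votes for the
            # target position it aligns with, pooled over overlapping windows.
            for offset, aa in enumerate(seq):
                counts[start + offset][aa] += 1

    profile = []
    for position_counts in counts:
        total = sum(position_counts[aa] for aa in AMINO_ACIDS)
        total += pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (position_counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def profile_energy(sequence, profile, weight=1.0):
    """Hypothetical profile term: weighted sum of negative log-probabilities."""
    return weight * sum(-math.log(profile[i][aa]) for i, aa in enumerate(sequence))
```

Under these assumptions, a candidate sequence that agrees with the residues seen in the matched fragments receives a lower `profile_energy` than one that does not, and `weight` plays the role of the profile weight that is tuned on the training proteins.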

Figures

Figure 1
Population density of protein pairs as a function of the sequence identity between the pair calculated by FASTA (X axis) and the TM-score describing the structural similarity between the pair (Y axis). The population density is normalized separately for each sequence identity and is expressed in units of the population density of a uniform distribution. The initial data form an 18 × 20 grid with a sequence identity step size of 0.05 and a TM-score step size of 0.05, based on pairwise comparison of 6665 proteins. The data are further smoothed by interpolation to a 900 × 1000 grid.
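
The normalization described in this caption can be sketched as follows. This is one reading of the caption, not the authors' code: the function name `normalized_density`, the assumption that both axes start at 0, and the clamping of out-of-range values into the last bin are all illustrative choices. Within each sequence identity bin, counts over TM-score bins are divided by the bin total and then by the uniform density (one over the number of TM-score bins), so a value of 1.0 means "as populated as a uniform distribution".

```python
def normalized_density(pairs, n_id_bins=18, n_tm_bins=20, step=0.05):
    """Return an n_id_bins x n_tm_bins grid of relative population density.

    pairs is an iterable of (sequence_identity, tm_score) values, both
    expressed as fractions. Bin ranges starting at 0 are an assumption.
    """
    counts = [[0] * n_tm_bins for _ in range(n_id_bins)]
    for seq_id, tm_score in pairs:
        # Small epsilon guards against floating-point error at bin edges;
        # values past the last edge are clamped into the last bin (assumption).
        i = min(int(seq_id / step + 1e-9), n_id_bins - 1)
        j = min(int(tm_score / step + 1e-9), n_tm_bins - 1)
        counts[i][j] += 1

    density = []
    for row in counts:
        total = sum(row)
        if total == 0:
            density.append([0.0] * n_tm_bins)
        else:
            # Fraction in this TM-score bin, relative to the uniform 1 / n_tm_bins.
            density.append([count / total * n_tm_bins for count in row])
    return density
```

With this convention the values in every non-empty sequence identity row average to 1.0, which makes rows with very different numbers of protein pairs directly comparable. The interpolation to the finer grid mentioned in the caption is not covered here.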
Figure 2
Average sequence identity between designed sequences and wild-type sequences for 33 training proteins as a function of the weight w_profile of the sequence profile term.
Figure 3
The frequency of the 20 amino acid residues in the sequences of 944 test proteins. Black bars correspond to wild-type sequences, gray bars to sequences designed by the original RosettaDesign, and open bars to sequences designed by RosettaDesign-SR.
Figure 4
The sequence identity between the top-ranked designed sequences for 944 proteins and the corresponding wild-type sequences as a function of the fraction of surface residues (residue contact number <9). Open circles are designed sequences that do not have any hit in a psi-blast search.
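
The two quantities plotted in this figure can be computed along the following lines. The caption gives the surface criterion (contact number <9) but not how contact numbers are calculated, so this sketch takes precomputed contact numbers as input. The function names, and the assumption that designed and wild-type sequences have equal length and are compared position by position, are illustrative.

```python
def fraction_surface(contact_numbers, threshold=9):
    """Fraction of residues whose contact number is below the threshold."""
    if not contact_numbers:
        raise ValueError("contact_numbers must not be empty")
    surface = sum(1 for n in contact_numbers if n < threshold)
    return surface / len(contact_numbers)


def sequence_identity(designed, wild_type):
    """Fraction of positions with identical residues.

    Assumes both sequences have the same length, as expected for
    fixed-backbone design of the same protein.
    """
    if len(designed) != len(wild_type) or not designed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(designed, wild_type) if a == b)
    return matches / len(designed)
```

For example, `fraction_surface([3, 8, 9, 12])` counts the first two residues as surface, and `sequence_identity("ACDE", "ACDF")` compares four aligned positions, three of which match.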
Figure 5
The sequence identity between designed and wild-type sequences versus the highest sequence identity between each designed sequence and the sequences retrieved by a psi-blast search with that designed sequence. Circles denote sequences designed by RosettaDesign-SR, and plus signs denote sequences designed by the original RosettaDesign. Points with zero psi-blast sequence identity are designed sequences without any hits from a psi-blast search.
