Improving computational protein design by using structure-derived sequence profile
- PMID: 20544969
- PMCID: PMC3058783
- DOI: 10.1002/prot.22746
Abstract
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from an improved understanding of how hydrophobic and polar interactions between amino acid side chains stabilize protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for this coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments of a target structure. We further introduce a term to reduce low-complexity regions in designed sequences. These two terms, together with optimized reference states for amino acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, raises the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences by 12 percentage points (from 34% to 46%). It also lowers the share of designed sequences that are not homologous to any known protein sequence according to PSI-BLAST by 8 percentage points (from 22% to 14%). More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues in the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than those designed by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures because of more specific polar interactions.
(c) 2010 Wiley-Liss, Inc.
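The structure-derived sequence profile described in the abstract can be illustrated with a minimal sketch. This is not the RosettaDesign-SR implementation: the abstract does not say how fragments are structurally matched, so the sketch assumes a simple RMS deviation over backbone (phi, psi) torsion angles, and the function names, the 30-degree cutoff, the pseudocount, and the toy fragment library are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def torsion_distance(seg_a, seg_b):
    """RMS deviation over the (phi, psi) pairs of two equal-length segments.

    Assumed stand-in for the structural matching criterion.
    """
    total = 0.0
    for (phi_a, psi_a), (phi_b, psi_b) in zip(seg_a, seg_b):
        total += angle_diff(phi_a, phi_b) ** 2 + angle_diff(psi_a, psi_b) ** 2
    return math.sqrt(total / (2 * len(seg_a)))


def structure_derived_profile(target_torsions, fragment_library,
                              window=5, cutoff=30.0, pseudocount=1.0):
    """Build per-position amino acid frequencies for a target backbone.

    target_torsions: list of (phi, psi) tuples, one per target residue.
    fragment_library: list of (sequence, torsions) pairs, where sequence
        has `window` letters and torsions has `window` (phi, psi) tuples.
    cutoff and pseudocount are hypothetical parameters.
    """
    n = len(target_torsions)
    counts = [Counter() for _ in range(n)]
    # Slide a five-residue window along the target backbone.
    for start in range(n - window + 1):
        segment = target_torsions[start:start + window]
        for frag_seq, frag_torsions in fragment_library:
            if torsion_distance(segment, frag_torsions) <= cutoff:
                # A matched fragment votes for its residues at each position.
                for offset, aa in enumerate(frag_seq):
                    if aa in AMINO_ACIDS:
                        counts[start + offset][aa] += 1
    profile = []
    for c in counts:
        total = sum(c.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (c[aa] + pseudocount) / total
                        for aa in AMINO_ACIDS})
    return profile


# Toy usage: a helical target and a two-fragment library (made-up data).
helix = [(-60.0, -45.0)] * 5
library = [
    ("AEELL", [(-62.0, -43.0)] * 5),   # helix-like fragment, matches
    ("VTVTV", [(-120.0, 130.0)] * 5),  # strand-like fragment, rejected
]
profile = structure_derived_profile(helix, library)
```

In a design run, a profile like this would presumably enter the scoring function as an extra per-position term that favors residues seen in structurally similar fragments; how that term is weighted against the other energy terms is not stated in the abstract.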
Similar articles
- Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles. Proteins. 2014 Oct;82(10):2565-73. doi: 10.1002/prot.24620. Epub 2014 Jun 19. PMID: 24898915. Free PMC article.
- Identification of amino acids involved in protein structural uniqueness: implication for de novo protein design. Protein Eng. 2002 Jul;15(7):555-60. doi: 10.1093/protein/15.7.555. PMID: 12200537.
- Solution structure of a de novo protein from a designed combinatorial library. Proc Natl Acad Sci U S A. 2003 Nov 11;100(23):13270-3. doi: 10.1073/pnas.1835644100. Epub 2003 Oct 30. PMID: 14593201. Free PMC article.
- The construction of an amino acid network for understanding protein structure and function. Amino Acids. 2014 Jun;46(6):1419-39. doi: 10.1007/s00726-014-1710-6. Epub 2014 Mar 13. PMID: 24623120. Review.
- Energy functions in de novo protein design: current challenges and future prospects. Annu Rev Biophys. 2013;42:315-35. doi: 10.1146/annurev-biophys-083012-130315. Epub 2013 Feb 28. PMID: 23451890. Free PMC article. Review.
Cited by
- ProDCoNN: Protein design using a convolutional neural network. Proteins. 2020 Jul;88(7):819-829. doi: 10.1002/prot.25868. Epub 2020 Jan 6. PMID: 31867753. Free PMC article.
- Use of designed sequences in protein structure recognition. Biol Direct. 2018 May 9;13(1):8. doi: 10.1186/s13062-018-0209-6. PMID: 29776380. Free PMC article.
- The 3-ketoacyl-CoA thiolase: an engineered enzyme for carbon chain elongation of chemical compounds. Appl Microbiol Biotechnol. 2020 Oct;104(19):8117-8129. doi: 10.1007/s00253-020-10848-w. Epub 2020 Aug 24. PMID: 32830293. Review.
- DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels. Genome Biol. 2013 Mar 13;14(3):R23. doi: 10.1186/gb-2013-14-3-r23. PMID: 23497682. Free PMC article.
- Characterizing the existing and potential structural space of proteins by large-scale multiple loop permutations. J Mol Biol. 2011 May 6;408(3):585-95. doi: 10.1016/j.jmb.2011.02.056. Epub 2011 Mar 2. PMID: 21376059. Free PMC article.