De novo structure generation using chemical shifts for proteins with high-sequence identity but different folds

Yang Shen¹, Philip N Bryan, Yanan He, John Orban, David Baker, Ad Bax

Affiliation

¹ Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0520, USA.

PMID: 19998407
PMCID: PMC2865713
DOI: 10.1002/pro.303

Yang Shen et al. Protein Sci. 2010 Feb;19(2):349-56. doi: 10.1002/pro.303.

Abstract

Proteins with high-sequence identity but very different folds present a special challenge to sequence-based protein structure prediction methods. In particular, a 56-residue three-helical bundle protein (GA⁹⁵) and an alpha/beta-fold protein (GB⁹⁵), which share 95% sequence identity, were targets in the CASP-8 structure prediction contest. With only 12 out of 300 submitted server-CASP8 models for GA⁹⁵ exhibiting the correct fold, this protein proved particularly challenging despite its small size. Here, we demonstrate that the information contained in NMR chemical shifts can readily be exploited by the CS-Rosetta structure prediction program and yields adequate convergence, even when input chemical shifts are limited to just amide ¹H^N and ¹⁵N or ¹H^N and ¹H^α values.

Figures

**Figure 1**
Amino acid sequences of GA^wt, GB^wt, and their variants. The secondary structure of GA^wt and GB^wt, as identified by DSSP for GA⁸⁸ (PDB entry 2JWS) and GB⁸⁸ (2JWU), is indicated at the top and bottom of the figure, respectively. Residues that exhibit high-local disorder in the experimental NMR structures (>0.5 Å backbone atom rmsd for the tripeptides centered at this residue) are italicized. Residues that are changed from their wild-type sequences are highlighted in red and cyan for the variants of GA^wt and GB^wt, respectively. The unique amino acids in the variant pairs of GA⁹⁵ and GB⁹⁵, GA⁸⁸ and GB⁸⁸ are highlighted in yellow and green, respectively.

**Figure 2**
Quality of Rosetta/CS-Rosetta fragments used as input for deriving GA and GB models, shown as plots of the lowest (lines with dots) and average (bold lines) backbone coordinate rmsd's (N, C^α, and C′) between any given segment in the experimental structure and 200 nine-residue (upper panel)/three-residue (lower panel) fragments, as a function of starting position of the query segment. Results from the standard Rosetta fragment selection method are plotted in black, whereas those selected using the standard MFR method with chemical shifts are displayed in red. (A) GA^wt; (B) GB^wt; (C) GA⁸⁸; (D) GB⁸⁸; (E) GA⁹⁵; and (F) GB⁹⁵. Note that for nine-residue fragments, the last residue starting number in the 56-residue protein is 48, whereas for three-residue fragments, the last starting position is 54.
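The per-position statistics described in this caption can be illustrated with a minimal sketch. It is not the CS-Rosetta or MFR code; the function names `window_starts`, `coord_rmsd`, and `fragment_quality` are hypothetical, and the sketch assumes each candidate fragment has already been superimposed on the query segment, so only the coordinate differences remain to be evaluated.

```python
import numpy as np

def window_starts(n_residues, frag_len):
    """1-based starting positions of every full-length window in the chain."""
    return list(range(1, n_residues - frag_len + 2))

def coord_rmsd(a, b):
    """Rmsd between two (n_atoms, 3) arrays, assumed to be pre-superimposed."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def fragment_quality(query, fragments):
    """Lowest and average rmsd of the candidate fragments for one query segment."""
    values = [coord_rmsd(query, frag) for frag in fragments]
    return min(values), sum(values) / len(values)

# Window arithmetic for a 56-residue protein: the last start is n - len + 1.
last_nine = window_starts(56, 9)[-1]   # 48
last_three = window_starts(56, 3)[-1]  # 54
```

The last starting positions follow from 56 − 9 + 1 = 48 and 56 − 3 + 1 = 54, which matches the note at the end of the caption.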

**Figure 3**
CS-Rosetta structure generation for proteins GA^wt, GB^wt and variants GA^88/95 and GB^88/95. (A–F) Plot of Rosetta all-atom energy, rescored by using the input chemical shifts, versus C^α rmsd relative to the experimental structure, for all CS-Rosetta models of proteins GA^wt (A), GB^wt (B), GA⁸⁸ (C), GB⁸⁸ (D), GA⁹⁵ (E), and GB⁹⁵ (F). Following the protocol of Shen *et al*., for all models the residues identified as disordered based on their RCI-derived order parameter (e.g., 1–8 and 52–56 in GA⁸⁸) are excluded from the calculation of the C^α rmsd and from the Rosetta energy during model selection. Backbone ribbon representation of the lowest-energy CS-Rosetta structure (red) superimposed on the experimental structure (blue) of proteins is shown at the lower right corner of each panel. (A′–F′) Analogous plots of Rosetta all-atom energy, rescored by using the input chemical shifts (δ¹H^α and δ¹H^N only), for the CS-Rosetta models obtained when using only ¹H chemical shifts.
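As a rough illustration of the C^α rmsd used on the horizontal axis of these plots, the sketch below superimposes a model on a reference structure with the Kabsch algorithm and skips residues flagged as disordered. The helper names (`kabsch_rmsd`, `ordered_indices`, `ca_rmsd_ordered`) are hypothetical, the choice of Kabsch superposition is an assumption rather than something stated in the caption, and the disordered ranges would in practice come from an RCI-derived order parameter rather than being typed in by hand.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Rmsd between (n, 3) arrays P and Q after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)              # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)   # SVD of the covariance matrix
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation, no reflection
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def ordered_indices(n_residues, disordered_ranges):
    """0-based indices of residues outside the 1-based, inclusive ranges."""
    excluded = set()
    for start, end in disordered_ranges:
        excluded.update(range(start, end + 1))
    return [i - 1 for i in range(1, n_residues + 1) if i not in excluded]

def ca_rmsd_ordered(model_ca, ref_ca, disordered_ranges):
    """C-alpha rmsd computed over the ordered residues only."""
    idx = ordered_indices(len(ref_ca), disordered_ranges)
    return kabsch_rmsd(np.asarray(model_ca)[idx], np.asarray(ref_ca)[idx])

# Ranges quoted in the caption for GA88: residues 1-8 and 52-56 are excluded.
ga88_disordered = [(1, 8), (52, 56)]
```

With the GA⁸⁸ ranges from the caption, 43 of the 56 residues (9 through 51) contribute to the rmsd.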
