Nonlinearities in protein space limit the utility of informatics in protein biophysics

S Rackovsky¹,²

Affiliations

¹ Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York, 14853.
² Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, New York, 10029.

PMID: 26315852
PMCID: PMC4609284
DOI: 10.1002/prot.24916


S Rackovsky. Proteins. 2015 Nov;83(11):1923-8. doi: 10.1002/prot.24916. Epub 2015 Sep 10.


Abstract

We examine the utility of informatic-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.

Keywords: Fourier analysis; conformational switches; distant homology; sequence space; structure space.


Figures

**Figure 1**
The environment space for the 12011 proteins in our database. The average structure distance between a protein P and its 20 nearest sequence neighbors is plotted against the corresponding average sequence distance. Variables are shown as centered values, X−X̄, where the overbar denotes a global average over all the proteins of the dataset. Positive values are therefore greater than average, and negative values are less than average.
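The quantities plotted in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes two precomputed, symmetric N×N distance matrices (`seq_dist` and `struct_dist`, hypothetical names) produced by whatever sequence and structure metrics are in use, and the function name `environment_space` is likewise hypothetical. For each protein it finds the k nearest neighbors in sequence space, averages the sequence and structure distances to those neighbors, and centers both averages on their global means.

```python
import numpy as np


def environment_space(seq_dist, struct_dist, k=20):
    """Return centered (mean sequence distance, mean structure distance)
    to the k nearest sequence neighbors of every protein.

    seq_dist, struct_dist: N x N distance matrices (assumed precomputed).
    k: number of nearest sequence neighbors (20 in the Figure 1 caption).
    """
    seq_dist = np.asarray(seq_dist, dtype=float)
    struct_dist = np.asarray(struct_dist, dtype=float)
    n = seq_dist.shape[0]
    if seq_dist.shape != (n, n) or struct_dist.shape != (n, n):
        raise ValueError("both inputs must be square matrices of equal size")
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of proteins")

    # Exclude each protein from its own neighbor list.
    masked = seq_dist.copy()
    np.fill_diagonal(masked, np.inf)

    # Indices of the k smallest sequence distances in each row.
    neighbors = np.argpartition(masked, k - 1, axis=1)[:, :k]
    rows = np.arange(n)[:, None]

    # Average both distances over the same set of sequence neighbors.
    mean_seq = seq_dist[rows, neighbors].mean(axis=1)
    mean_struct = struct_dist[rows, neighbors].mean(axis=1)

    # Center on the global averages over all proteins (X - X-bar).
    return mean_seq - mean_seq.mean(), mean_struct - mean_struct.mean()


# Toy usage with random points standing in for real proteins.
rng = np.random.default_rng(0)
seq_points = rng.normal(size=(50, 4))
struct_points = rng.normal(size=(50, 3))
toy_seq = np.linalg.norm(seq_points[:, None] - seq_points[None], axis=-1)
toy_struct = np.linalg.norm(struct_points[:, None] - struct_points[None], axis=-1)
x, y = environment_space(toy_seq, toy_struct, k=20)
```

Plotting `y` against `x` gives a scatter of the kind the caption describes: a point with negative `x` and positive `y` corresponds to a protein whose sequence neighbors are closer than average in sequence but farther than average in structure.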


