Nonlinearities in protein space limit the utility of informatics in protein biophysics
- PMID: 26315852
- PMCID: PMC4609284
- DOI: 10.1002/prot.24916
Abstract
We examine the utility of informatic-based methods in computational protein biophysics. To do so, we use newly developed metric functions to define completely independent sequence and structure spaces for a large database of proteins. By investigating the relationship between these spaces, we demonstrate quantitatively the limits of knowledge-based correlation between the sequences and structures of proteins. It is shown that there are well-defined, nonlinear regions of protein space in which dissimilar structures map onto similar sequences (the conformational switch), and dissimilar sequences map onto similar structures (remote homology). These nonlinearities are shown to be quite common: almost half the proteins in our database fall into one or the other of these two regions. They are not anomalies, but rather intrinsic properties of structural encoding in amino acid sequences. It follows that extreme care must be exercised in using bioinformatic data as a basis for computational structure prediction. The implications of these results for protein evolution are examined.
Keywords: Fourier analysis; conformational switches; distant homology; sequence space; structure space.
© 2015 Wiley Periodicals, Inc.
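The two nonlinear regions named in the abstract can be illustrated with a minimal sketch. It assumes that each protein pair already has one distance in sequence space and one in structure space, and that a single cutoff per space separates "similar" from "dissimilar". The names `PairDistance`, `classify_pair`, `nonlinear_fraction`, the cutoffs, and the distance values are all hypothetical; the paper's actual metric functions and region boundaries are not given in this record, and the paper counts proteins rather than pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairDistance:
    """Distances between two proteins in the two independent spaces (illustrative units)."""
    seq_dist: float     # distance in sequence space
    struct_dist: float  # distance in structure space


def classify_pair(pair: PairDistance, seq_cut: float, struct_cut: float) -> str:
    """Label a pair by which side of each (assumed) cutoff it falls on."""
    near_seq = pair.seq_dist <= seq_cut
    near_struct = pair.struct_dist <= struct_cut
    if near_seq and not near_struct:
        # Similar sequences, dissimilar structures.
        return "conformational switch"
    if near_struct and not near_seq:
        # Dissimilar sequences, similar structures.
        return "remote homology"
    if near_seq and near_struct:
        return "similar in both spaces"
    return "dissimilar in both spaces"


def nonlinear_fraction(pairs, seq_cut: float, struct_cut: float) -> float:
    """Fraction of pairs that land in either of the two nonlinear regions."""
    if not pairs:
        return 0.0
    nonlinear = {"conformational switch", "remote homology"}
    hits = sum(classify_pair(p, seq_cut, struct_cut) in nonlinear for p in pairs)
    return hits / len(pairs)


# Made-up distances, one pair per region, purely for illustration.
example_pairs = [
    PairDistance(seq_dist=0.1, struct_dist=0.9),
    PairDistance(seq_dist=0.9, struct_dist=0.1),
    PairDistance(seq_dist=0.1, struct_dist=0.1),
    PairDistance(seq_dist=0.9, struct_dist=0.9),
]
example_fraction = nonlinear_fraction(example_pairs, seq_cut=0.5, struct_cut=0.5)
```

Hard cutoffs are used here only to keep the sketch short; any real analysis would depend on how the two metrics are defined and scaled.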
Similar articles
- Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability. J Mol Biol. 2014 Feb 20;426(4):962-79. doi: 10.1016/j.jmb.2013.11.026. Epub 2013 Dec 4. PMID: 24316367
- Computing motif correlations in proteins. J Comput Chem. 2003 Dec;24(16):2032-43. doi: 10.1002/jcc.10332. PMID: 14531057
- Sequence-based protein structure prediction using a reduced state-space hidden Markov model. Comput Biol Med. 2007 Sep;37(9):1211-24. doi: 10.1016/j.compbiomed.2006.10.014. Epub 2006 Dec 11. PMID: 17161834
- [An informatic perspective on structural proteomics]. Tanpakushitsu Kakusan Koso. 2002 Jun;47(8 Suppl):1058-63. PMID: 12099023 Review. Japanese. No abstract available.
- A tour of structural genomics. Nat Rev Genet. 2001 Oct;2(10):801-9. doi: 10.1038/35093574. PMID: 11584296 Review.
Cited by
- Reduced alphabet of prebiotic amino acids optimally encodes the conformational space of diverse extant protein folds. BMC Evol Biol. 2019 Jul 30;19(1):158. doi: 10.1186/s12862-019-1464-6. PMID: 31362700 Free PMC article.
- Global informatics and physical property selection in protein sequences. Proc Natl Acad Sci U S A. 2016 Feb 16;113(7):1808-10. doi: 10.1073/pnas.1525745113. Epub 2016 Feb 1. PMID: 26831093 Free PMC article.
- Design and characterization of a protein fold switching network. Nat Commun. 2023 Jan 26;14(1):431. doi: 10.1038/s41467-023-36065-3. PMID: 36702827 Free PMC article.
- Homology modeling in a dynamical world. Protein Sci. 2017 Nov;26(11):2195-2206. doi: 10.1002/pro.3274. Epub 2017 Sep 28. PMID: 28815769 Free PMC article.
- Application of artificial intelligence and machine learning techniques to the analysis of dynamic protein sequences. Proteins. 2024 Oct;92(10):1234-1241. doi: 10.1002/prot.26704. Epub 2024 May 29. PMID: 38808365