Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials
- PMID: 17335003
- DOI: 10.1002/prot.21279
Abstract
Statistical distance-dependent pair potentials are frequently used in a variety of folding, threading, and modeling studies of proteins. The applicability of these potentials is tightly connected to the reliability of the underlying statistical observations. We explored the possible origin and extent of false-positive signals in statistical potentials by analyzing their distance dependence in a variety of randomized protein-like models. Although potentials derived from such models are expected to equal zero on average at any distance, we demonstrate that systematic and significant distortions exist. These distortions originate from the limited statistical counts in local environments of proteins and from the limited size of protein structures at large distances. We suggest that these systematic errors in statistical potentials are connected to the dependence of amino acid composition on protein size and to the variation in protein sizes. Additionally, atom-based potentials are dominated by a false-positive signal that arises from correlations among the distances measured from the atoms of one residue to the atoms of another residue. We assessed the significance of residue-based pairwise potentials at various spatial pair separations and found that as few as approximately 50% of potential values were statistically significant at distances below 4 Å, and only at most approximately 80% were significant at larger pair separations. A new definition of the reference state, free of the observed systematic errors, is suggested. It has been demonstrated to generate statistical potentials that compare favorably with other publicly available ones.
Copyright 2007 Wiley-Liss, Inc.
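The false-positive effect described above can be illustrated with a minimal sketch. This is our illustration, not the authors' code, and it does not implement their proposed reference state, which the abstract does not specify. It assumes the common inverse-Boltzmann form E_ab(r) = -kT ln[(N_ab(r)/N_ab) / (N(r)/N)], where the reference distance distribution N(r)/N is pooled over all residue-pair types. The randomized model draws residue labels independently of geometry, so the true potential is zero everywhere, and any nonzero value reflects finite sampling. The random-walk chains, the four-letter alphabet `TYPES`, the 1 Å bins, the 15 Å cutoff, the sequence-separation cutoff, the helper names, and the crude 1/sqrt(N) error estimate are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter

# All parameters below are illustrative assumptions, not values from the paper.
TYPES = "ACDE"      # hypothetical reduced residue alphabet
BIN_WIDTH = 1.0     # distance bin width in Å
MAX_DIST = 15.0     # distance cutoff in Å
MIN_SEQ_SEP = 3     # skip residue pairs closer than this along the chain


def random_chain(n, rng, step=3.8):
    """Random-walk 'protein-like' chain of n C-alpha positions, `step` Å apart."""
    coords = [(0.0, 0.0, 0.0)]
    for _ in range(n - 1):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))  # normalize to a random unit direction
        x, y, z = coords[-1]
        coords.append((x + step * v[0] / norm,
                       y + step * v[1] / norm,
                       z + step * v[2] / norm))
    return coords


def count_pairs(proteins):
    """Observed counts N_ab(r), keyed by (type_a, type_b, bin) with a <= b."""
    counts = Counter()
    for seq, coords in proteins:
        for i in range(len(seq)):
            for j in range(i + MIN_SEQ_SEP, len(seq)):
                d = math.dist(coords[i], coords[j])
                if d < MAX_DIST:
                    a, b = sorted((seq[i], seq[j]))
                    counts[(a, b, int(d // BIN_WIDTH))] += 1
    return counts


def pooled_reference_potential(counts):
    """E_ab(r) = -ln[(N_ab(r) / N_ab) / (N(r) / N)] in units of kT.

    The reference distribution N(r)/N pools all pair types. Only bins with at
    least one observed pair appear in `counts`, so the logarithm is defined.
    Returns {(a, b, bin): (energy, sigma)}, where sigma = 1/sqrt(N_ab(r)) is a
    crude Poisson error estimate on ln N_ab(r).
    """
    n_ab, n_r = Counter(), Counter()
    for (a, b, r), c in counts.items():
        n_ab[(a, b)] += c
        n_r[r] += c
    total = sum(counts.values())
    pot = {}
    for (a, b, r), c in counts.items():
        expected = n_ab[(a, b)] * n_r[r] / total
        pot[(a, b, r)] = (-math.log(c / expected), 1.0 / math.sqrt(c))
    return pot


def demo(seed=0, n_proteins=20):
    """Randomized model: labels are independent of geometry, so any nonzero
    energy is sampling noise. Returns the potential and the cells whose
    energy deviates from zero by more than 2 sigma."""
    rng = random.Random(seed)
    proteins = []
    for _ in range(n_proteins):
        n = rng.randint(50, 150)
        seq = [rng.choice(TYPES) for _ in range(n)]
        proteins.append((seq, random_chain(n, rng)))
    pot = pooled_reference_potential(count_pairs(proteins))
    flagged = [k for k, (e, s) in pot.items() if abs(e) > 2.0 * s]
    return pot, flagged


pot, flagged = demo()
print(f"{len(flagged)} of {len(pot)} (pair type, bin) cells deviate from zero by > 2 sigma")
```

Cells in sparsely populated short-distance bins, where only a handful of pairs are observed, can produce large energies even though no interaction exists. This mirrors the sparse-statistics problem described in the abstract. The sketch deliberately leaves out the size- and composition-dependent distortions. Reproducing those would require drawing amino acid composition as a function of protein length.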
Similar articles
- Statistical potentials extracted from protein structures: how accurate are they? J Mol Biol. 1996 Mar 29;257(2):457-69. doi: 10.1006/jmbi.1996.0175. PMID: 8609636
- Protein refolding in silico with atom-based statistical potentials and conformational search using a simple genetic algorithm. J Mol Biol. 2006 Jun 23;359(5):1456-67. doi: 10.1016/j.jmb.2006.04.033. Epub 2006 Apr 27. PMID: 16678202
- Discriminative ability with respect to amino acid types: assessing the performance of knowledge-based potentials without threading. Proteins. 2002 Nov 1;49(2):266-84. doi: 10.1002/prot.10211. PMID: 12211006
- Computational modeling of protein mutant stability: analysis and optimization of statistical potentials and structural features reveal insights into prediction model development. BMC Struct Biol. 2007 Aug 16;7:54. doi: 10.1186/1472-6807-7-54. PMID: 17705837
- First passage time analysis of protein folding via nucleation and of barrierless protein denaturation. Adv Colloid Interface Sci. 2009 Feb 28;146(1-2):18-30. doi: 10.1016/j.cis.2008.09.006. Epub 2008 Oct 2. Review. PMID: 19006782
Cited by
- 3dRNAscore: a distance and torsion angle dependent evaluation function of 3D RNA structures. Nucleic Acids Res. 2015 May 26;43(10):e63. doi: 10.1093/nar/gkv141. Epub 2015 Feb 24. PMID: 25712091
- What is the best reference state for building statistical potentials in RNA 3D structure evaluation? RNA. 2019 Jul;25(7):793-812. doi: 10.1261/rna.069872.118. Epub 2019 Apr 17. PMID: 30996105
- Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins. 2016 Dec;84(12):1859-1874. doi: 10.1002/prot.25169. Epub 2016 Oct 11. PMID: 27671894
- Modeling proteins using a super-secondary structure library and NMR chemical shift information. Structure. 2013 Jun 4;21(6):891-9. doi: 10.1016/j.str.2013.04.012. Epub 2013 May 16. PMID: 23685209
- A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment in CASP11. BMC Bioinformatics. 2015 Oct 23;16:337. doi: 10.1186/s12859-015-0775-x. PMID: 26493701