New efficient statistical sequence-dependent structure prediction of short to medium-sized protein loops based on an exhaustive loop classification
- PMID: 10373380
- DOI: 10.1006/jmbi.1999.2826
Abstract
A bank of 13,563 loops from three to eight amino acid residues long, representing motifs between two consecutive regular secondary structures, has been derived from protein structures presenting less than 95% sequence identity. Statistical analyses of occurrences of conformations and residues revealed length-dependent over-representations of particular amino acids (glycine, proline, asparagine, serine, and aspartate) and conformations (αL, ε, and βP regions of the Ramachandran plot). A position-dependent distribution of these occurrences was observed for N- and C-terminal residues, which are correlated with the nature of the flanking regions. Loops of the same length were clustered into statistically meaningful families on the basis of their backbone structures when placed in a common reference frame, independent of the flanks. These clusters present significantly different distributions of sequence, conformations, and endpoint residue Cα distances. On the basis of the sequence-structure correlation of this clustering, an automatic loop modeling algorithm was developed. Based on the knowledge of its sequence and of its flank backbone structures, each query loop is assigned to a family, and target loop supports are selected in this family. The support backbones of these target loops are then adjusted on flanking structures by partial exploration of the conformational space. Loop closure is performed by energy minimization for each support, and the final model is chosen among connected supports based upon energy criteria. The quality of the prediction is evaluated by the root-mean-square deviation (rmsd) between the final model and the native loops when the whole bank is re-attributed on itself with a jackknife test. This average rmsd ranges from 1.1 Å for three-residue loops to 3.8 Å for eight-residue loops. A few poorly predicted loops are inescapable, considering the high level of diversity in loops and the lack of environment data.
To overcome such modeling problems, a statistical reliability score was assigned to each prediction. This score is correlated with the quality of the prediction, in terms of rmsd, and thus improves the selection accuracy of the model. The efficiency of the algorithm was compared to CASP3 target loop predictions. Moreover, when tested on a test loop bank, the algorithm was shown to be robust when the loops are not precisely delimited, making it a useful tool in practice for protein modeling.
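The jackknife evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that loops are stored as lists of (x, y, z) backbone points already expressed in a common reference frame, so no superposition step is performed. The names `rmsd`, `toy_predict`, and `jackknife_mean_rmsd` are hypothetical, and `toy_predict` is only a stand-in that copies the coordinates of the bank loop with the highest sequence identity; it is not the family assignment, flank adjustment, or energy-based selection used in the paper.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumption: both loops are already in a common reference frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def toy_predict(sequence, bank):
    """Hypothetical stand-in predictor: return the coordinates of the bank loop
    sharing the most identical residues with the query sequence."""
    def identity(entry):
        return sum(a == b for a, b in zip(sequence, entry["seq"]))
    return max(bank, key=identity)["coords"]


def jackknife_mean_rmsd(bank, predict):
    """Leave-one-out test: predict each loop from the bank with that loop removed,
    then average the rmsd between each model and its native loop."""
    scores = []
    for i, native in enumerate(bank):
        rest = bank[:i] + bank[i + 1:]
        model = predict(native["seq"], rest)
        scores.append(rmsd(model, native["coords"]))
    return sum(scores) / len(scores)


# Made-up three-residue loops, for illustration only.
example_bank = [
    {"seq": "GDG", "coords": [(0, 0, 0), (1, 0, 0), (2, 0, 0)]},
    {"seq": "GDS", "coords": [(0, 0, 0), (1, 0, 0), (2, 1, 0)]},
    {"seq": "PNA", "coords": [(0, 0, 0), (0, 1, 0), (0, 2, 0)]},
]

if __name__ == "__main__":
    print(jackknife_mean_rmsd(example_bank, toy_predict))
```

Removing each loop from the bank before predicting it keeps the native structure from being selected as its own model, which is the point of re-attributing the bank on itself with a jackknife test.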
Copyright 1999 Academic Press.