Prediction of protein secondary structure content for the twilight zone sequences
- PMID: 17623861
- DOI: 10.1002/prot.21527
Abstract
Protein secondary structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. The great majority of successful secondary structure prediction methods are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, when it is characterized by low (<30%) homology. To address this problem, we propose PSSC-core, a novel method for predicting secondary structure content through a comprehensive sequence representation. The method uses a multiple linear regression model and a feature-based sequence representation to predict the helix and strand content of twilight zone sequences. PSSC-core was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. It provides statistically significantly better predictions of both helix and strand content than the competing methods, reducing the prediction error by 5-7% for helix content and by 7-9% for strand content. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed separately for the helix and the strand content predictions. It includes composition and composition moment vectors, the frequency of tetra-peptides associated with helical and strand conformations, property-based groups such as exchange groups, chemical groups of the side chains, and the hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. PSSC-core offers an alternative approach to predicting secondary structure content that can be used to validate and constrain the results of other structure prediction methods. It also provides useful insight into the design of protein sequence representations that can be used in developing new methods for predicting other aspects of protein secondary structure.
(c) 2007 Wiley-Liss, Inc.
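To make the approach described in the abstract more concrete, the following is a minimal sketch of a feature-based sequence representation feeding a multiple linear regression model. It is not the PSSC-core implementation: the abstract does not give the exact feature definitions, the selected feature subsets, or the fitted coefficients. All function names are hypothetical, the composition moment normalization is a simplified assumption, and the Kyte-Doolittle hydropathy scale is assumed as the property behind the auto-correlation feature.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed here; the abstract does not name a scale).
HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def composition_moment(seq: str, order: int = 1) -> List[float]:
    """Position-weighted composition (simplified, assumed normalization).

    Each occurrence of a residue contributes (position / length) ** order,
    so order 0 reduces to the plain composition.
    """
    n = len(seq)
    moments = []
    for aa in AMINO_ACIDS:
        total = sum(((i + 1) / n) ** order for i, res in enumerate(seq) if res == aa)
        moments.append(total / n)
    return moments


def autocorrelation(seq: str, lag: int, scale: Dict[str, float] = HYDROPATHY) -> float:
    """Mean product of property values for residues separated by `lag` positions."""
    values = [scale[res] for res in seq]
    pairs = len(values) - lag
    return sum(values[i] * values[i + lag] for i in range(pairs)) / pairs


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < 1e-12:
            raise ValueError("singular design matrix: too few or collinear samples")
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict_content(coefs: Sequence[float], features: Sequence[float]) -> float:
    """Predicted content = intercept + weighted sum of the feature values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], features))
```

In this sketch, a separate model would be fitted for helix content and for strand content, each on its own feature subset, mirroring the abstract's statement that the representation is custom-designed for each prediction. The hand-written solver only keeps the example dependency-free; in practice a library least-squares routine would be the more robust choice.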