Improved prediction for N-termini of alpha-helices using empirical information
- PMID: 15340919
- DOI: 10.1002/prot.20218
Abstract
The prediction of protein secondary structure from amino acid sequence remains a key component of many approaches to the protein folding problem. The alpha-helix is the most abundant form of regular secondary structure in proteins, and specific residue preferences exist at its N-terminal positions. Propensities derived from the observed amino acid frequencies at these positions in the Protein Data Bank (PDB) correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method that exploits these data to improve protein secondary structure prediction by identifying the correct N-terminal sequences of alpha-helices, building on existing popular secondary structure prediction methods. In cross-validated testing, the algorithm increased the proportion of correctly predicted alpha-helix start positions from 30% to 38%, while the overall three-state prediction accuracy (Q3) remained unchanged. Although the algorithm was developed and tested on secondary structure predictions based on multiple sequence alignments, it also improved the start-position predictions of methods that use single sequences. Furthermore, the residue frequencies at the N-terminal positions of the improved predictions more closely reflect those observed at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where more accurate prediction of alpha-helix N-terminal regions should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool at http://rocky.bms.umist.ac.uk/elephant.
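The abstract relies on three quantities: position-specific N-terminal propensities, Q3, and the fraction of correctly predicted helix start positions. The sketch below shows one conventional way to compute them and is not the authors' algorithm. It assumes a three-state (H/E/C) secondary-structure string aligned residue-for-residue with the sequence, defines a propensity as a residue's frequency at a given offset from the helix start divided by its overall frequency, and counts a start as correct only on an exact match. The function names, the offset convention (-1 = N-cap, 0 = N1, 1 = N2, 2 = N3), and the toy strings are hypothetical illustrations.

```python
from __future__ import annotations

from collections import Counter


def helix_starts(ss: str) -> list[int]:
    """Index of the first 'H' in each helix segment of a 3-state (H/E/C) string."""
    return [i for i, s in enumerate(ss) if s == "H" and (i == 0 or ss[i - 1] != "H")]


def n_terminal_propensities(pairs, offsets=(-1, 0, 1, 2)):
    """Position-specific propensities relative to each helix start.

    pairs   -- iterable of (sequence, ss) strings of equal length (assumed format)
    offsets -- -1 is the N-cap (residue before the first helical residue),
               0 is N1, 1 is N2, 2 is N3 (hypothetical convention)
    Returns {offset: {residue: P(residue at offset) / P(residue overall)}}.
    """
    background = Counter()
    positional = {k: Counter() for k in offsets}
    for seq, ss in pairs:
        background.update(seq)
        for start in helix_starts(ss):
            for k in offsets:
                j = start + k
                if 0 <= j < len(seq):  # skip positions that fall off the chain
                    positional[k][seq[j]] += 1
    total_bg = sum(background.values())
    table = {}
    for k, counts in positional.items():
        total_k = sum(counts.values())
        # Only residues actually seen at this offset get an entry.
        table[k] = {
            aa: (n / total_k) / (background[aa] / total_bg) for aa, n in counts.items()
        }
    return table


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose 3-state label is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def start_accuracy(observed: str, predicted: str) -> float:
    """Fraction of observed helix starts that a predicted helix start hits exactly."""
    obs = helix_starts(observed)
    if not obs:
        return 0.0
    pred = set(helix_starts(predicted))
    return sum(s in pred for s in obs) / len(obs)


observed = "CCHHHHHHCCCHHHHHCC"
pred_a = "CCCHHHHHCCCHHHHHCC"  # first helix starts one residue late
pred_b = "CCHHHHHHCCCHHHHCCC"  # second helix ends one residue early
print(q3(observed, pred_a), start_accuracy(observed, pred_a))
print(q3(observed, pred_b), start_accuracy(observed, pred_b))
```

In this toy case both predictions score Q3 = 17/18, yet `pred_a` recovers only one of the two observed helix starts (0.5) while `pred_b` recovers both (1.0). This illustrates how start-position accuracy can change while Q3 stays the same, which is the pattern the abstract reports.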
Copyright 2004 Wiley-Liss, Inc.
Similar articles
- Protein structure prediction begins well but ends badly. Proteins. 2010 Apr;78(5):1282-90. doi: 10.1002/prot.22646. PMID: 20014025
- The influence of gapped positions in multiple sequence alignments on secondary structure prediction methods. Comput Biol Chem. 2004 Dec;28(5-6):351-66. doi: 10.1016/j.compbiolchem.2004.09.005. PMID: 15556476
- Combining evolutionary and structural information for local protein structure prediction. Proteins. 2004 Sep 1;56(4):782-94. doi: 10.1002/prot.20158. PMID: 15281130
- Sequence comparison and protein structure prediction. Curr Opin Struct Biol. 2006 Jun;16(3):374-84. doi: 10.1016/j.sbi.2006.05.006. Epub 2006 May 19. PMID: 16713709 Review.
- Moment-based prediction of DNA-binding proteins. J Mol Biol. 2004 Jul 30;341(1):65-71. doi: 10.1016/j.jmb.2004.05.058. PMID: 15312763 Review.
Cited by
- Position-specific propensities of amino acids in the β-strand. BMC Struct Biol. 2010 Sep 28;10:29. doi: 10.1186/1472-6807-10-29. PMID: 20920153 Free PMC article.
- Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform. 2018 May 1;19(3):482-494. doi: 10.1093/bib/bbw129. PMID: 28040746 Free PMC article.
- Folding by numbers: primary sequence statistics and their use in studying protein folding. Int J Mol Sci. 2009 Apr 8;10(4):1567-1589. doi: 10.3390/ijms10041567. PMID: 19468326 Free PMC article. Review.
- Synonymous codon usage influences the local protein structure observed. Nucleic Acids Res. 2010 Oct;38(19):6719-28. doi: 10.1093/nar/gkq495. Epub 2010 Jun 8. PMID: 20530529 Free PMC article.