The influence of gapped positions in multiple sequence alignments on secondary structure prediction methods
- PMID: 15556476
- DOI: 10.1016/j.compbiolchem.2004.09.005
Abstract
All currently leading protein secondary structure prediction methods use a multiple protein sequence alignment to predict the secondary structure of the top sequence. In most of these methods, alignment positions showing a gap in the top sequence are deleted prior to prediction, which shrinks the alignment and discards position-specific information. In this paper we investigate the effect of this removal of information on secondary structure prediction accuracy. To this end, we have designed SymSSP, an algorithm that post-processes the predicted secondary structure of all sequences in a multiple sequence alignment by (i) making use of the alignment's evolutionary information and (ii) re-introducing most of the information that would otherwise be lost. The post-processed information is then passed to a new dynamic programming routine that produces an optimally segmented consensus secondary structure for each sequence in the multiple alignment. We have tested our method on the state-of-the-art secondary structure prediction methods PHD, PROFsec, SSPro2 and JNET using the HOMSTRAD database of reference alignments. Our consensus-deriving dynamic programming strategy consistently improves the segmentation quality of the predictions more than the commonly used majority voting technique. In addition, we have applied several weighting schemes from the literature to our novel consensus-deriving dynamic programming routine. Finally, we have investigated the level of noise introduced by prediction errors into the consensus and show that predictions at the edges of helices and strands are wrong about half the time for all four tested prediction methods.
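The two baseline operations the abstract refers to can be illustrated with a minimal sketch: deleting alignment columns that are gapped in the top sequence, and deriving a consensus by column-wise majority voting. This is not SymSSP and not the dynamic programming routine, whose details the abstract does not give. The function names, the `-` gap character, the H/E/C state alphabet, and the rule that gaps do not vote are all assumptions made for illustration.

```python
from collections import Counter

GAP = "-"  # assumed gap character


def drop_top_gap_columns(alignment):
    """Delete every column in which the top (first) sequence has a gap.

    This mimics the pre-processing step described in the abstract; the
    residues of the other sequences in those columns are lost.
    """
    top = alignment[0]
    keep = [i for i, ch in enumerate(top) if ch != GAP]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def majority_vote(predictions):
    """Column-wise majority vote over aligned secondary structure strings.

    Assumption: gap characters do not vote, and a column containing only
    gaps yields a gap in the consensus.
    """
    consensus = []
    for column in zip(*predictions):
        votes = Counter(state for state in column if state != GAP)
        consensus.append(votes.most_common(1)[0][0] if votes else GAP)
    return "".join(consensus)


# Hypothetical toy alignment: column 3 is gapped in the top sequence.
alignment = ["MK-TA", "MKLTA", "MR-SA"]
shrunk = drop_top_gap_columns(alignment)

# Hypothetical per-sequence predictions (H = helix, E = strand, C = coil),
# written in alignment coordinates.
predictions = ["HH-HC", "HHHHC", "CH-EC"]
consensus = majority_vote(predictions)
```

In this toy case the residue `L` of the second sequence disappears from `shrunk`, which is the loss of position-specific information the paper investigates. Majority voting treats each column independently, so it does nothing to enforce sensible segment boundaries; that is the weakness the paper's dynamic programming consensus is designed to address.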