A comparison of position-specific score matrices based on sequence and structure alignments
- PMID: 11790846
- PMCID: PMC2373449
- DOI: 10.1110/ps.19902
Abstract
Sequence comparison methods based on position-specific score matrices (PSSMs) have proven useful for recognizing divergent members of a protein family and for annotating functional sites. Here we investigate one factor that affects the overall performance of PSSMs in a PSI-BLAST search: the algorithm used to construct the seed alignment on which the PSSM is based. We compare PSSMs based on alignments constructed by global sequence similarity (ClustalW and ClustalW-pairwise), local sequence similarity (BLAST), and local structure similarity (VAST). To assess performance in identifying conserved functional or structural sites, we examine the accuracy of the three-dimensional molecular models predicted by PSSM-sequence alignments. Using the known structures of those sequences as the standard of truth, we find that model accuracy varies with the algorithm used for seed alignment construction, in the order local-structure (VAST) > local-sequence (BLAST) > global-sequence (ClustalW). Using structural similarity of query and database proteins as the standard of truth, we find that PSSM recognition sensitivity depends primarily on the diversity of the sequences included in the alignment, with an optimum around 30-50% average pairwise identity. We discuss these observations and suggest a strategy for constructing seed alignments that optimize PSSM-sequence alignment accuracy and recognition sensitivity.
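The diversity measure mentioned above, average pairwise identity of a seed alignment, can be illustrated with a minimal sketch. The abstract does not state how identity was computed, so the definition used here (identical residues divided by columns in which neither sequence has a gap) is an assumption, and the function names `pairwise_identity`, `average_pairwise_identity`, and `in_suggested_range` are hypothetical helpers, not part of any tool named in the abstract.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Assumption: columns where either row has a gap are skipped, and
    identity is matches / compared columns. Other definitions exist.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # ignore columns with a gap in either row
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_pairwise_identity(alignment: list[str]) -> float:
    """Mean percent identity over all unordered pairs of rows."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def in_suggested_range(alignment: list[str],
                       low: float = 30.0,
                       high: float = 50.0) -> bool:
    """Check the alignment against the 30-50% range quoted in the abstract."""
    return low <= average_pairwise_identity(alignment) <= high


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    toy = ["AAAA", "AABB", "CCBB"]
    print(round(average_pairwise_identity(toy), 2))
    print(in_suggested_range(toy))
```

A check like this could be run on a candidate seed alignment before building a PSSM from it; the thresholds are exposed as parameters because the abstract gives the range only as approximate.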