Bayesian models and Markov chain Monte Carlo methods for protein motifs with the secondary characteristics
- PMID: 16201915
- DOI: 10.1089/cmb.2005.12.952
Abstract
Statistical methods have been developed for finding local patterns, also called motifs, in multiple protein sequences. The aligned segments may indicate functional or structural core regions. However, existing methods often have difficulty aligning multiple proteins when sequence identity is low (e.g., below 25%). In this article, we develop a Bayesian model and Markov chain Monte Carlo (MCMC) methods for identifying subtle motifs in protein sequences. Specifically, a motif is defined not only by specific sites characterized by amino acid frequency vectors, but also by a combination of secondary characteristics such as hydrophobicity and polarity. MCMC methods are proposed to search for a motif pattern with high posterior probability under the new model. A special MCMC algorithm is developed that involves transitions between state spaces of different dimensions. The proposed methods were supported by a simulation study. They were then tested on two real datasets: a group of helix-turn-helix proteins and a set from the CATH Protein Structure Classification Database. Statistical comparisons showed that the new approach performed better than a typical Gibbs sampling approach based only on an amino acid model.
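For reference, the baseline the abstract compares against is a Gibbs sampler that uses only an amino acid model. A classic site sampler of that kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model. It assumes exactly one motif site of fixed width per sequence and sequences made only of the 20 standard residue letters. It also estimates the background from all residues, which is a simplification. The pseudocount values, the function names (`background_freqs`, `column_profile`, `window_weights`, `gibbs_site_sampler`) and the planted demo motif are all hypothetical. The secondary characteristics (hydrophobicity, polarity) and the dimension-changing MCMC move described above are not included.

```python
import math
import random

# The 20 standard amino acids; sequences are assumed to use only these letters.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def background_freqs(seqs):
    """Background residue frequencies over all sequences (+1 pseudocount each).
    Simplification: site residues are not excluded from the background."""
    counts = {a: 1.0 for a in AMINO_ACIDS}
    for s in seqs:
        for ch in s:
            counts[ch] += 1.0
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def column_profile(seqs, starts, width, skip, pseudo=0.5):
    """Amino acid frequency vector for each motif column, built from the current
    sites of every sequence except `skip` (pseudocount value is an assumption)."""
    profile = []
    for j in range(width):
        counts = {a: pseudo for a in AMINO_ACIDS}
        for i, (s, st) in enumerate(zip(seqs, starts)):
            if i != skip:
                counts[s[st + j]] += 1.0
        total = sum(counts.values())
        profile.append({a: c / total for a, c in counts.items()})
    return profile


def window_weights(seq, profile, bg, width):
    """Motif-vs-background likelihood ratio for every start position,
    rescaled so the largest weight is 1 (avoids overflow in exp)."""
    log_ratios = [
        sum(math.log(profile[j][seq[st + j]] / bg[seq[st + j]]) for j in range(width))
        for st in range(len(seq) - width + 1)
    ]
    top = max(log_ratios)
    return [math.exp(lr - top) for lr in log_ratios]


def gibbs_site_sampler(seqs, width, iters=500, seed=0):
    """Amino-acid-only site sampler: one motif site of fixed `width` per sequence."""
    rng = random.Random(seed)
    bg = background_freqs(seqs)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Re-estimate the motif from the other sequences, then resample
            # this sequence's site in proportion to its likelihood ratio.
            profile = column_profile(seqs, starts, width, skip=i)
            weights = window_weights(s, profile, bg, width)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    demo_rng = random.Random(1)
    motif = "WHCWYM"  # hypothetical planted motif, for illustration only
    seqs = []
    for _ in range(8):
        left = "".join(demo_rng.choice(AMINO_ACIDS) for _ in range(demo_rng.randint(5, 25)))
        right = "".join(demo_rng.choice(AMINO_ACIDS) for _ in range(10))
        seqs.append(left + motif + right)
    starts = gibbs_site_sampler(seqs, width=len(motif), iters=200)
    for s, st in zip(seqs, starts):
        print(st, s[st:st + len(motif)])
```

In a model like the one the abstract describes, each motif column would also carry terms for secondary characteristics, not only an amino acid frequency vector. Moves that change the model's dimension would then require a transdimensional MCMC step rather than the plain Gibbs update shown here.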
Similar articles
- Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model. BMC Bioinformatics. 2004 Oct 25;5:157. doi: 10.1186/1471-2105-5-157. PMID: 15504234. Free PMC article.
- Bayesian restoration of a hidden Markov chain with applications to DNA sequencing. J Comput Biol. 1999 Summer;6(2):261-77. doi: 10.1089/cmb.1999.6.261. PMID: 10421527.
- Prediction of protein interdomain linker regions by a hidden Markov model. Bioinformatics. 2005 May 15;21(10):2264-70. doi: 10.1093/bioinformatics/bti363. Epub 2005 Mar 3. PMID: 15746283.
- Bayesian and Markov chain Monte Carlo methods for identifying nonlinear systems in the presence of uncertainty. Philos Trans A Math Phys Eng Sci. 2015 Sep 28;373(2051):20140405. doi: 10.1098/rsta.2014.0405. PMID: 26303916. Free PMC article. Review.
- Developments of inverse analysis by Kalman filters and Bayesian methods applied to geotechnical engineering. Proc Jpn Acad Ser B Phys Biol Sci. 2023;99(9):352-388. doi: 10.2183/pjab.99.023. PMID: 37952976. Free PMC article. Review.
Cited by
- An analysis of single amino acid repeats as use case for application specific background models. BMC Bioinformatics. 2011 May 19;12:173. doi: 10.1186/1471-2105-12-173. PMID: 21595908. Free PMC article.