Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures
- PMID: 14715091
- PMCID: PMC344530
- DOI: 10.1186/1471-2105-5-2
Abstract
Background: Hidden Markov Models (HMMs) have proven very useful in computational biology in applications such as sequence pattern matching, gene finding, and structure prediction. Thus far, however, they have been confined to representing 1D sequence (or those aspects of structure that can be encoded as character strings).
Results: We develop an HMM formalism that explicitly uses 3D coordinates in its match states. The match states are modeled by 3D Gaussian distributions centered on the mean coordinate position of each alpha carbon in a large structural alignment. The transition probabilities depend on the spread of the neighboring match states and on the number of gaps found in the structural alignment. We also develop methods for aligning query structures against 3D HMMs and scoring the result probabilistically. For 1D HMMs these tasks are accomplished by the Viterbi and forward algorithms. However, these algorithms do not work unmodified for the 3D problem because of the non-local nature of structural alignment, so we develop extensions of them for the 3D case. Several applications of 3D HMMs to protein structure classification are reported. A good separation of scores between different fold families suggests that the described construct is quite useful for protein structure analysis.
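To make the match-state idea concrete, the following is a minimal Python sketch built on several assumptions that are not taken from the paper: each match state is an isotropic 3D Gaussian (a single variance shared by x, y and z), all coordinates are already superimposed in one common frame, and insertions and deletions carry fixed, hypothetical log-penalties rather than the spread- and gap-dependent transition probabilities described above. The helper names (`build_match_state`, `log_emission`, `viterbi_align`) are illustrative only. Because the query is assumed to be pre-superimposed, the alignment step reduces to an ordinary 1D Viterbi-style dynamic program. It therefore does not address the non-local superposition problem that motivates the authors' extended algorithms.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
State = Tuple[Point, float]  # (mean C-alpha position, per-axis variance)


def build_match_state(aligned_cas: Sequence[Point]) -> State:
    """Mean and pooled per-axis variance of the C-alpha atoms in one alignment column.

    Assumption (not from the paper): coordinates are already superimposed and
    the Gaussian is isotropic, i.e. one variance for x, y and z.
    """
    n = len(aligned_cas)
    mean = tuple(sum(p[k] for p in aligned_cas) / n for k in range(3))
    sq = sum((p[k] - mean[k]) ** 2 for p in aligned_cas for k in range(3))
    var = max(sq / (3 * n), 1e-6)  # floor keeps single-structure columns usable
    return mean, var


def log_emission(x: Point, mean: Point, var: float) -> float:
    """Log-density of x under the isotropic 3D Gaussian N(mean, var * I)."""
    d2 = sum((x[k] - mean[k]) ** 2 for k in range(3))
    return -1.5 * math.log(2 * math.pi * var) - d2 / (2 * var)


def viterbi_align(query: Sequence[Point], states: Sequence[State],
                  log_ins: float = -4.0, log_del: float = -4.0
                  ) -> Tuple[float, List[Tuple[int, int]]]:
    """Best-scoring in-order alignment of query residues to match states.

    Hypothetical fixed penalties: log_ins for an unaligned query residue,
    log_del for a skipped match state. Returns (score, [(query_i, state_j)]).
    """
    n, m = len(query), len(states)
    neg = float("-inf")
    V = [[neg] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    V[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best, move = neg, None
            if i > 0 and j > 0:  # match: residue i-1 emitted by state j-1
                mean, var = states[j - 1]
                s = V[i - 1][j - 1] + log_emission(query[i - 1], mean, var)
                if s > best:
                    best, move = s, "M"
            if i > 0 and V[i - 1][j] + log_ins > best:  # insertion
                best, move = V[i - 1][j] + log_ins, "I"
            if j > 0 and V[i][j - 1] + log_del > best:  # deletion
                best, move = V[i][j - 1] + log_del, "D"
            V[i][j], back[i][j] = best, move
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:  # trace back from the full alignment
        move = back[i][j]
        if move == "M":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "I":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return V[n][m], pairs
```

In use, `build_match_state` would be called once per column of a superimposed structural alignment, and `viterbi_align` would then score a query's C-alpha coordinates against the resulting list of states. Because `log_emission` is a density, the scores depend on the coordinate units (for example, Ångströms). A tighter column (smaller variance) rewards close placement more strongly and penalizes deviation more sharply.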
Conclusion: We have created a rigorous 3D HMM representation for protein structures and implemented a complete set of routines for building 3D HMMs in C and Perl. The code is freely available from http://www.molmovdb.org/geometry/3dHMM, and this site also hosts a simple prototype server that demonstrates the features of the described approach.
Similar articles
- Algorithms for incorporating prior topological information in HMMs: application to transmembrane proteins. BMC Bioinformatics. 2006 Apr 5;7:189. doi: 10.1186/1471-2105-7-189. PMID: 16597327. Free PMC article.
- Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins. 2003 Jun 1;51(4):504-14. doi: 10.1002/prot.10369. PMID: 12784210.
- Applications of generalized pair hidden Markov models to alignment and gene finding problems. J Comput Biol. 2002;9(2):389-99. doi: 10.1089/10665270252935520. PMID: 12015888.
- Hidden Markov Models for prediction of protein features. Methods Mol Biol. 2008;413:173-98. doi: 10.1007/978-1-59745-574-9_7. PMID: 18075166. Review.
- Five hierarchical levels of sequence-structure correlation in proteins. Appl Bioinformatics. 2004;3(2-3):97-104. doi: 10.2165/00822942-200403020-00004. PMID: 15693735. Review.
Cited by
- Actin-interacting and flagellar proteins in Leishmania spp.: Bioinformatics predictions to functional assignments in phagosome formation. Genet Mol Biol. 2009 Jul;32(3):652-65. doi: 10.1590/S1415-47572009000300033. Epub 2009 Sep 1. PMID: 21637533. Free PMC article.
- A discriminative method for family-based protein remote homology detection that combines inductive logic programming and propositional models. BMC Bioinformatics. 2011 Mar 23;12:83. doi: 10.1186/1471-2105-12-83. PMID: 21429187. Free PMC article.
- Gaussian-weighted RMSD superposition of proteins: a structural comparison for flexible proteins and predicted protein structures. Biophys J. 2006 Jun 15;90(12):4558-73. doi: 10.1529/biophysj.105.066654. Epub 2006 Mar 24. PMID: 16565070. Free PMC article.
- Superimposition of protein structures with dynamically weighted RMSD. J Mol Model. 2010 Feb;16(2):211-22. doi: 10.1007/s00894-009-0538-6. Epub 2009 Jul 1. PMID: 19568776.
- Recent applications of Hidden Markov Models in computational biology. Genomics Proteomics Bioinformatics. 2004 May;2(2):84-96. doi: 10.1016/s1672-0229(04)02014-5. PMID: 15629048. Free PMC article.