Comparative Study
BMC Bioinformatics. 2004 Jan 9;5:2.
doi: 10.1186/1471-2105-5-2.

Using 3D Hidden Markov Models that explicitly represent spatial coordinates to model and compare protein structures

Vadim Alexandrov et al. BMC Bioinformatics. 2004;5:2.

Abstract

Background: Hidden Markov Models (HMMs) have proven very useful in computational biology for such applications as sequence pattern matching, gene-finding, and structure prediction. Thus far, however, they have been confined to representing 1D sequence (or the aspects of structure that could be represented by character strings).

Results: We develop an HMM formalism that explicitly uses 3D coordinates in its match states. The match states are modeled by 3D Gaussian distributions centered on the mean coordinate position of each alpha carbon in a large structural alignment. The transition probabilities depend on the spread of the neighboring match states and on the number of gaps found in the structural alignment. We also develop methods for aligning query structures against 3D HMMs and scoring the result probabilistically. For 1D HMMs these tasks are accomplished by the Viterbi and forward algorithms. However, these algorithms do not work in unmodified form for the 3D problem because structural alignment is non-local (the score of each aligned position depends on the superposition, which is determined by the alignment as a whole), so we develop extensions of them for the 3D case. We report several applications of 3D HMMs to protein structure classification. The good separation of scores between different fold families suggests that the described construct is useful for protein structure analysis.
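A minimal Python sketch of the match-state idea follows. It assumes each match state is an isotropic Gaussian (a single variance per alignment position, shared by the x, y and z axes) estimated from alpha-carbon coordinates that have already been superposed into a common frame; the abstract does not specify the covariance form, and the names `build_match_states` and `log_emission` are hypothetical, not taken from the authors' C/Perl code.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
MatchState = Tuple[Vec3, float]  # (mean position, isotropic variance in A^2)


def build_match_states(aligned: Sequence[Sequence[Vec3]]) -> List[MatchState]:
    """aligned[k] holds the superposed C-alpha coordinates observed at
    alignment position k (one point per structure without a gap there)."""
    states: List[MatchState] = []
    for column in aligned:
        n = len(column)
        # Mean coordinate of this alignment column.
        mean = tuple(sum(p[d] for p in column) / n for d in range(3))
        # Isotropic variance: total squared deviation averaged over points
        # and axes (assumption; a full 3x3 covariance is also possible).
        sq = sum((p[d] - mean[d]) ** 2 for p in column for d in range(3))
        var = max(sq / (3 * n), 1e-2)  # hypothetical floor avoids zero variance
        states.append((mean, var))
    return states


def log_emission(x: Vec3, state: MatchState) -> float:
    """Log-density of point x under the isotropic 3D Gaussian N(mean, var*I)."""
    mean, var = state
    d2 = sum((x[d] - mean[d]) ** 2 for d in range(3))
    return -1.5 * math.log(2 * math.pi * var) - d2 / (2 * var)
```

Summing `log_emission` over the aligned positions of a superposed query gives only the match-state part of a log-likelihood score; transition terms, and the superposition step that makes the 3D problem non-local, are left out of this sketch.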

Conclusion: We have created a rigorous 3D HMM representation for protein structures and implemented a complete set of routines for building 3D HMMs in C and Perl. The code is freely available from http://www.molmovdb.org/geometry/3dHMM, and at this site we also have a simple prototype server to demonstrate the features of the described approach.
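The Results paragraph states that transition probabilities depend partly on the number of gaps in the structural alignment. The sketch below covers only that gap-counting part, using the standard profile-HMM recipe of counting match-to-match, match-to-insert and match-to-delete transitions with a pseudocount; the dependence on the spread of neighboring match states is omitted, and the function name and input layout (`present`, `inserts`) are hypothetical rather than the authors' format.

```python
from typing import Dict, List, Sequence


def match_transitions(present: Sequence[Sequence[bool]],
                      inserts: Sequence[Sequence[int]],
                      pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Estimate P(M_k -> M_{k+1}), P(M_k -> I_k) and P(M_k -> D_{k+1}).

    present[s][k]: True if structure s has a C-alpha aligned to core position k.
    inserts[s][k]: number of unaligned residues in structure s between core
                   positions k and k+1 (hypothetical input layout).
    """
    n_pos = len(present[0])
    probs: List[Dict[str, float]] = []
    for k in range(n_pos - 1):
        counts = {"MM": pseudocount, "MI": pseudocount, "MD": pseudocount}
        for s in range(len(present)):
            if not present[s][k]:
                continue  # only structures occupying M_k leave from M_k
            if inserts[s][k] > 0:
                counts["MI"] += 1  # extra residues before the next core position
            elif present[s][k + 1]:
                counts["MM"] += 1  # next core position is also aligned
            else:
                counts["MD"] += 1  # next core position is a gap in this structure
        total = sum(counts.values())
        probs.append({name: c / total for name, c in counts.items()})
    return probs
```

With the default pseudocount of 1, a transition never observed in the alignment still receives a small nonzero probability, so gap patterns that appear only in a query structure are penalized rather than treated as impossible.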


Figures

Figure 1
Typical 1D HMM topology (adapted from [7]). Squares, diamonds and circles represent match (Mk), insert (Ik) and delete (Dk) states, respectively. Arrows indicate state-to-state transitions, which may occur according to the corresponding transition probabilities.
Figure 2
Structural alignment of two protein backbones (PDB IDs 1ECD and 1HLB). Aligned parts are shown in yellow.
Figure 3
Discretization of the coordinate probability distribution in one dimension.
Figure 4
Separation of scores for IgV (red) and globin (blue) domains scored against the globin 3D HMM.
Figure 5
Histogram of RMSD values for globin domains (yellow) and IgV domains (blue) calculated for the alignment of these domains against the IgV core. RMSD values were calculated for the alignment of these domains against the globin core. The overlap between the two histograms is shown in green.
Figure 6
Separation of scores for NAD(P)-binding domains (red) and FAD-binding domains (blue) scored against the FAD 3D HMM.
Figure 7
Separation of scores for thioredoxin domains (red) and flavodoxin domains (blue) scored against the thioredoxin 3D HMM.
Figure 8
Separation of scores for lysozyme domains (red) and ferredoxin domains (blue) scored against the lysozyme 3D HMM.
Figure 9
Separation of scores for thioredoxin domains (red) and all other SCOP domains sharing less than 95% sequence identity (blue), scored against the thioredoxin 3D HMM.
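Figure 3 illustrates discretizing the coordinate probability distribution along one axis. The caption does not say how the bins are chosen, so the sketch below assumes fixed-width bins laid out symmetrically around the mean of a 1D Gaussian and assigns each bin its probability mass as a difference of normal CDF values; `discretize` and its parameters are hypothetical.

```python
import math
from typing import List


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def discretize(mu: float, sigma: float, width: float, n_bins: int) -> List[float]:
    """Probability mass of N(mu, sigma^2) in n_bins consecutive bins of the
    given width, laid out symmetrically around mu (use an odd n_bins so one
    bin is centered on the mean)."""
    half = n_bins * width / 2.0
    edges = [mu - half + i * width for i in range(n_bins + 1)]
    # Each bin's mass is the CDF difference between its right and left edges.
    return [normal_cdf(edges[i + 1], mu, sigma) - normal_cdf(edges[i], mu, sigma)
            for i in range(n_bins)]
```

For a Gaussian whose covariance is diagonal in the chosen axes, the probability of a 3D cell is the product of the three per-axis bin masses. The masses sum to slightly less than 1 because the tails beyond the outermost bins are dropped.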

References

1. Krogh A, Brown M, Mian IS, Sjolander K, Haussler D. Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 1994;235:1501–1531. doi: 10.1006/jmbi.1994.1104.
2. Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77:257–285. doi: 10.1109/5.18626.
3. Gribskov M, Lüthy R, Eisenberg D. Profile analysis. Methods Enzymol. 1990;183:146–159.
4. Teichmann S, Park J, Chothia C. Structural assignments to the proteins of Mycoplasma genitalium show that they have been formed by extensive gene duplications and domain rearrangements. Proc Natl Acad Sci USA. 1998;95:14658–14663. doi: 10.1073/pnas.95.25.14658.
5. Reese MG, Kulp D, Tammana H, Haussler D. Genie--gene finding in Drosophila melanogaster. Genome Res. 2000;10:529–538. doi: 10.1101/gr.10.4.529.
