BMC Struct Biol. 2006 Dec 13;6:25.
doi: 10.1186/1472-6807-6-25.

Analysis of an optimal hidden Markov model for secondary structure prediction

Juliette Martin et al. BMC Struct Biol. 2006.

Abstract

Background: Secondary structure prediction is a useful first step toward 3D structure prediction. A number of successful secondary structure prediction methods use neural networks, but neural networks are not intuitively interpretable. In contrast, hidden Markov models are graphical models that can be interpreted directly, and they have been used successfully in many bioinformatics applications. Because they offer a strong statistical background and allow model interpretation, we propose a method based on hidden Markov models.

Results: Our HMM is designed without prior knowledge. It is chosen from a collection of models of increasing size, using statistical and accuracy criteria. The resulting model has 36 hidden states: 15 that model alpha-helices, 12 that model coil and 9 that model beta-strands. Connections between hidden states and state emission probabilities reflect the organization of protein structures into secondary structure segments. We start by analyzing the model features and show how the model offers a new view of local structures. We then use it for secondary structure prediction. Our model appears to be very efficient on single sequences, with a Q3 score of 68.8%, more than one point above PSIPRED prediction on single sequences. A straightforward extension of the method allows the use of multiple sequence alignments, raising the Q3 score to 75.5%.

Conclusion: The hidden Markov model presented here achieves valuable prediction results using only a limited number of parameters. It provides an interpretable framework for protein secondary structure architecture. Furthermore, it can be used as a tool for generating protein sequences with a given secondary structure content.

Figures

Figure 1
Final 36-hidden-state HMM learned using DSSP assignment. Upper part: hidden state graph. Only transitions associated with probabilities greater than 0.1 are shown. The larger the transition probability, the thicker the arrow. States are colored according to their amino acid preference (hydrophobic versus hydrophilic). Purple state indicates no strong amino acid preference and red states strongly favor glycine. The two groups of coil states (c1, c6, c12, c5, c4) in green and (c3, c2, c8, c10, c9) in red are discussed in the text. For periodic secondary structures, helix and strand, the entry and exit states are indicated by different symbols. Lower part: amino acid propensities of each hidden state. Propensities are measured by log-odds scores. The propensity score of amino acid a for state s is given by $S = \log_2 \frac{P(a \mid s)}{f(a)}$, with P(a | s) the emission probability of amino acid a in state s and f(a) the background frequency of a in the dataset. A score equal to 1 means that the amino acid is twice as frequent in state s as in the whole dataset.
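As an illustration of the propensity score defined in this caption, the following minimal sketch evaluates S for a single amino acid and state. The function name and the numeric values are hypothetical and are not taken from the article; only the formula itself comes from the caption.

```python
import math


def log_odds_propensity(emission_prob, background_freq):
    """Return S = log2(P(a|s) / f(a)) for one amino acid a and one state s.

    emission_prob   -- P(a|s), emission probability of a in state s
    background_freq -- f(a), frequency of a in the whole dataset
    """
    if emission_prob <= 0 or background_freq <= 0:
        raise ValueError("both probabilities must be strictly positive")
    return math.log2(emission_prob / background_freq)


# Hypothetical values: an amino acid emitted with probability 0.16 in a state,
# against a background frequency of 0.08, is twice as frequent in that state.
enriched = log_odds_propensity(0.16, 0.08)

# Hypothetical values: an amino acid half as frequent in the state as overall.
depleted = log_odds_propensity(0.04, 0.08)

print(enriched, depleted)
```

A positive score marks an amino acid that is over-represented in the state, a negative score one that is under-represented, and a score of 0 means no preference relative to the background.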
Figure 2
Principal component analysis of the association between hidden states and loop type. Data are obtained from the Viterbi decoding using secondary structure labeling on the cross-validation data.
Figure 3
Q3 obtained for each protein of the EVA 212 dataset, by PSIPRED and OSS-HMM. Globular proteins are shown as triangles and membrane proteins as crosses. Proteins shorter than 50 residues are indicated with gray symbols. The diagonal where both PSIPRED and OSS-HMM Q3s are equal is shown as a dashed line. OSS-HMM refers to the HMM presented in this article.
Figure 4
Q3 score as a function of the posterior probability value. The Q3 score is computed on the subset of residues that are predicted with probabilities in given ranges. The distribution of residues in the probability ranges is shown as a gray bar-plot. The right axis is related to this distribution.
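The analysis described in this caption can be sketched as follows, assuming that Q3 is the percentage of residues whose predicted three-state label (H, E or C) matches the observed one, and that the "posterior probability value" of a residue is the posterior of its predicted state. The function names, bin edges and toy data are hypothetical and do not come from the article.

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(observed) == 0:
        raise ValueError("cannot compute Q3 on an empty set of residues")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def q3_by_confidence(predicted, observed, confidence, edges):
    """Compute Q3 separately for residues grouped by prediction confidence.

    Bins are [edges[i], edges[i+1]); the last bin also includes its upper edge.
    Returns a list of ((low, high), residue_count, q3_or_None).
    """
    results = []
    n_bins = len(edges) - 1
    for i in range(n_bins):
        low, high = edges[i], edges[i + 1]
        is_last = i == n_bins - 1
        idx = [
            k for k, c in enumerate(confidence)
            if low <= c < high or (is_last and c == high)
        ]
        if idx:
            score = q3([predicted[k] for k in idx], [observed[k] for k in idx])
        else:
            score = None  # no residue falls in this probability range
        results.append(((low, high), len(idx), score))
    return results


# Toy data (hypothetical): 8 residues with predicted and observed states.
predicted = "HHHECCEE"
observed = "HHHCCCEH"
confidence = [0.9, 0.95, 0.8, 0.4, 0.6, 0.7, 0.85, 0.45]

overall = q3(predicted, observed)
bins = q3_by_confidence(predicted, observed, confidence, [0.0, 0.5, 0.75, 1.0])
for (low, high), count, score in bins:
    print(f"[{low:.2f}, {high:.2f}]  n={count}  Q3={score}")
```

The residue count per bin corresponds to the gray bar-plot in the figure, and the per-bin Q3 to the curve.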
Figure 5
Example of a multiple sequence prediction for d1jyoa_. Each plot represents the posterior probabilities of α-helix, β-strand and coil as a function of the position in the sequence, with the color scheme: magenta = helix, green = strand, grey = coil. "Query" indicates the initial sequence d1jyoa_, and "sequence 1" to "sequence 7" are the homologous sequences retrieved by PSI-BLAST. The Henikoff weight of each sequence is indicated on each plot. "Consensus" indicates the consensus prediction for the sequence family. The predicted secondary structure in each case is shown as a colored bar in the upper part of each plot. The observed secondary structure of d1jyoa_ is plotted in the lower part of the figure. Ellipses highlight regions where the prediction using only the query sequence differs from the prediction using the multiple sequence alignment.
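One plausible way to obtain a consensus such as the one shown in this figure is a weighted average of the per-sequence posterior probabilities at each aligned position. The sketch below assumes exactly that; the text reproduced here does not state the combination rule, so the averaging scheme, the gap handling, the function name and all numbers are assumptions for illustration only.

```python
STATES = ("H", "E", "C")  # helix, strand, coil


def consensus_prediction(posteriors, weights):
    """Combine per-sequence posteriors into one predicted state per position.

    posteriors -- one list per aligned sequence; each entry is a tuple
                  (pH, pE, pC), or None where the sequence has a gap
    weights    -- one weight per sequence (for example Henikoff weights)
    Returns a string over H/E/C, with '-' where every sequence has a gap.
    """
    if len(posteriors) != len(weights):
        raise ValueError("need exactly one weight per sequence")
    length = len(posteriors[0])
    if any(len(seq) != length for seq in posteriors):
        raise ValueError("all aligned sequences must have the same length")

    predicted = []
    for pos in range(length):
        totals = [0.0, 0.0, 0.0]
        weight_sum = 0.0
        for seq_post, w in zip(posteriors, weights):
            p = seq_post[pos]
            if p is None:  # gap: this sequence does not vote at this position
                continue
            for k in range(3):
                totals[k] += w * p[k]
            weight_sum += w
        if weight_sum == 0.0:
            predicted.append("-")
        else:
            best = max(range(3), key=lambda k: totals[k])
            predicted.append(STATES[best])
    return "".join(predicted)


# Hypothetical posteriors for a query and two homologs over three positions.
query = [(0.6, 0.1, 0.3), (0.4, 0.2, 0.4), (0.2, 0.5, 0.3)]
homolog_1 = [(0.7, 0.1, 0.2), (0.2, 0.1, 0.7), None]
homolog_2 = [(0.5, 0.2, 0.3), (0.3, 0.1, 0.6), (0.1, 0.3, 0.6)]
weights = [0.5, 0.3, 0.2]  # hypothetical sequence weights

single = consensus_prediction([query], [1.0])
consensus = consensus_prediction([query, homolog_1, homolog_2], weights)
print(single, consensus)
```

In this toy case the second position is ambiguous for the query alone and is resolved to coil once the homologs contribute, which mirrors the kind of change the ellipses mark in the figure.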
