BMC Struct Biol. 2006 Dec 13:6:25.

doi: 10.1186/1472-6807-6-25.

Analysis of an optimal hidden Markov model for secondary structure prediction

Juliette Martin¹, Jean-François Gibrat, François Rodolphe

Affiliation

¹ INSERM U726, Equipe de Bioinformatique Génomique et Moléculaire, Université Denis Diderot Paris 7, 2 place Jussieu, 75251 Paris Cedex 05, France. juliette.martin@jouy.inra.fr

PMID: 17166267
PMCID: PMC1769381
DOI: 10.1186/1472-6807-6-25

Juliette Martin et al. BMC Struct Biol. 2006.


Abstract

Background: Secondary structure prediction is a useful first step toward 3D structure prediction. A number of successful secondary structure prediction methods use neural networks, but neural networks are not intuitively interpretable. In contrast, hidden Markov models (HMMs) are graphical, interpretable models that have been used successfully in many bioinformatics applications. Because they offer a strong statistical background and allow model interpretation, we propose a method based on hidden Markov models.

Results: Our HMM is designed without prior knowledge. It is chosen from a collection of models of increasing size, using statistical and accuracy criteria. The resulting model has 36 hidden states: 15 that model alpha-helices, 12 that model coil and 9 that model beta-strands. Connections between hidden states and state emission probabilities reflect the organization of protein structures into secondary structure segments. We start by analyzing the model features and show how the model offers a new view of local structures. We then use it for secondary structure prediction. Our model performs well on single sequences, with a Q3 score of 68.8%, more than one percentage point above PSIPRED prediction on single sequences. A straightforward extension of the method allows the use of multiple sequence alignments, raising the Q3 score to 75.5%.
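For readers unfamiliar with the Q3 score quoted above, the following is a minimal sketch of how it is commonly computed: the percentage of residues whose predicted three-state class (helix, strand, coil) matches the observed one. The function name `q3_score`, the one-letter codes H/E/C and the toy strings are illustrative assumptions, not taken from the article.

```python
def q3_score(observed: str, predicted: str) -> float:
    """Return the percentage of positions where the predicted class equals the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted strings must have equal length")
    if not observed:
        raise ValueError("an empty sequence has no Q3 score")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


# Toy example (hypothetical data): H = helix, E = strand, C = coil
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEECCC"
score = q3_score(observed, predicted)
print(f"Q3 = {score:.1f}%")
```

In this toy case 14 of the 16 residues agree, so the score is 87.5%.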

Conclusion: The hidden Markov model presented here achieves valuable prediction results using only a limited number of parameters. It provides an interpretable framework for protein secondary structure architecture. Furthermore, it can be used as a tool for generating protein sequences with a given secondary structure content.

Figures

**Figure 1**
**Final 36 hidden states HMM learned using DSSP assignment**. Upper part: hidden state graph. Only transitions associated with probabilities greater than 0.1 are shown. The larger the transition probabilities the thicker the arrows. States are colored according to their amino acid preference (hydrophobic versus hydrophilic). Purple state indicates no strong amino acid preference and red states strongly favor glycine. The two groups of coil states (c1, c6, c12, c5, c4) in green and (c3, c2, c8, c10, c9) in red are discussed in the text. For periodic secondary structures, helix and strand, the entry and exit states are indicated by different symbols. Lower part: amino-acid propensities of each hidden state. Propensities are measured by log-odds scores. The propensity score of amino-acid a for state s is given by $S = \log_2 \frac{P(a \mid s)}{f(a)}$, with P(a | s) the emission probability of amino-acid a in state s and f(a) the background frequency of a in the dataset. A score equal to 1 means that the amino-acid is twice as frequent in state s as in the whole dataset.
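As a small worked illustration of the propensity score defined in this caption, the sketch below evaluates the base-2 log-odds for made-up numbers. The function name `propensity` and the probability values are assumptions chosen for the example; they are not emission probabilities from the published model.

```python
import math


def propensity(p_emit: float, background: float) -> float:
    """Log-odds score S = log2(P(a|s) / f(a)) for amino acid a in hidden state s."""
    if p_emit <= 0.0 or background <= 0.0:
        raise ValueError("probabilities must be strictly positive")
    return math.log2(p_emit / background)


# Hypothetical values: the residue is emitted with probability 0.16 in the
# state, and its background frequency in the dataset is 0.08.
enriched = propensity(0.16, 0.08)  # twice as frequent as background
depleted = propensity(0.04, 0.08)  # half as frequent as background
print(f"enriched: {enriched:+.2f}, depleted: {depleted:+.2f}")
```

A residue twice as frequent as background scores about +1, and one half as frequent scores about -1, matching the interpretation given in the caption.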

**Figure 2**
**Principal component analysis of the association between hidden states and loop type**. Data are obtained from the Viterbi decoding using secondary structure labeling on the cross-validation data.

**Figure 3**
**Q₃ obtained for each protein of the EVA 212 dataset, by PSIPRED and OSS-HMM**. Globular proteins are shown as triangles and membrane proteins as crosses. Proteins shorter than 50 residues are indicated with gray symbols. The diagonal where the PSIPRED and OSS-HMM Q₃ scores are equal is shown as a dashed line. OSS-HMM refers to the HMM presented in this article.

**Figure 4**
**Q₃ score as a function of the posterior probability value**. The Q₃ score is computed on the subset of residues that are predicted with probabilities in given ranges. The distribution of residues in the probability ranges is shown as a gray bar-plot. The right axis is related to this distribution.
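The kind of analysis summarized in this caption can be sketched as follows: residues are grouped by the posterior probability of their predicted class, and accuracy is computed within each group. The bin edges, the function name `q3_by_confidence` and the toy data are assumptions for illustration; the ranges actually used in the figure are not stated here.

```python
from bisect import bisect_right


def q3_by_confidence(observed, predicted, confidence, edges=(0.0, 0.5, 0.75, 1.0)):
    """Return (low, high, n_residues, q3_or_None) for each probability range.

    confidence[i] is the posterior probability of the class predicted at residue i.
    Ranges are half-open [low, high), except the last one, which includes its upper edge.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    correct = [0] * n_bins
    for obs, pred, p in zip(observed, predicted, confidence):
        if not edges[0] <= p <= edges[-1]:
            continue  # ignore values outside the binned range
        b = min(bisect_right(edges, p) - 1, n_bins - 1)
        counts[b] += 1
        correct[b] += obs == pred
    return [
        (edges[i], edges[i + 1], counts[i],
         100.0 * correct[i] / counts[i] if counts[i] else None)
        for i in range(n_bins)
    ]


# Toy data (hypothetical): eight residues with their prediction confidence
observed = "HHHCCEEC"
predicted = "HHCCCEEE"
confidence = [0.95, 0.90, 0.45, 0.60, 0.80, 0.85, 0.70, 0.40]
bins = q3_by_confidence(observed, predicted, confidence)
for low, high, n, q3 in bins:
    print(f"[{low:.2f}, {high:.2f}]: n={n}, Q3={q3}")
```

The per-range counts play the role of the gray bar-plot, and the per-range accuracy plays the role of the Q₃ curve.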

**Figure 5**
**Example of a multiple sequence prediction for d1jyoa_**. Each plot represents the posterior probabilities of α-helix, β-strand and coil as a function of the position in the sequence, with the color scheme: magenta = helix, green = strand, grey = coil. "Query" indicates the initial sequence d1jyoa_, and "sequence 1" to "sequence 7" are the homologous sequences retrieved by PSI-BLAST. The Henikoff weight of each sequence is indicated on each plot. "Consensus" indicates the consensus prediction for the sequence family. The predicted secondary structure in each case is shown as a colored bar in the upper part of each plot. The observed secondary structure of d1jyoa_ is plotted in the lower part of the figure. Ellipses highlight regions where the prediction using only the query sequence differs from the prediction using the multiple sequence alignment.
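One plausible way to form such a consensus, sketched below, is a weighted average of the per-sequence posterior probabilities at each query position, followed by picking the most probable class. This combination rule is an assumption suggested by the caption (per-sequence posteriors, sequence weights, consensus); the abstract does not spell out the exact procedure. The function name `consensus_prediction`, the gap handling and all numbers are hypothetical.

```python
STATES = "HEC"  # helix, strand, coil


def consensus_prediction(posteriors, weights):
    """Weighted average of per-sequence posteriors, position by position.

    posteriors[j][i] is a (pH, pE, pC) tuple for sequence j at query position i,
    or None where sequence j has a gap. weights[j] is the weight of sequence j.
    """
    n_pos = len(posteriors[0])
    consensus = []
    for i in range(n_pos):
        total_w = 0.0
        avg = [0.0, 0.0, 0.0]
        for seq_post, w in zip(posteriors, weights):
            p = seq_post[i]
            if p is None:
                continue  # gapped sequences do not contribute at this position
            total_w += w
            for k in range(3):
                avg[k] += w * p[k]
        if total_w == 0.0:
            raise ValueError(f"no sequence covers position {i}")
        consensus.append([a / total_w for a in avg])
    labels = "".join(STATES[max(range(3), key=row.__getitem__)] for row in consensus)
    return consensus, labels


# Toy example (hypothetical): a query and one homolog over two positions
query = [(0.5, 0.1, 0.4), (0.3, 0.3, 0.4)]
homolog = [(0.9, 0.05, 0.05), None]  # gap at the second position
consensus, labels = consensus_prediction([query, homolog], [0.4, 0.6])
print(labels)
```

At the first position the homolog's strong helix signal dominates the average; at the second, only the query contributes, so its own prediction is kept.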

Cited by

Complementarity of the residue-level protein function and structure predictions in human proteins.
Biró B, Zhao B, Kurgan L. Comput Struct Biotechnol J. 2022 May 6;20:2223-2234. doi: 10.1016/j.csbj.2022.05.003. eCollection 2022. PMID: 35615015
MHTAPred-SS: A Highly Targeted Autoencoder-Driven Deep Multi-Task Learning Framework for Accurate Protein Secondary Structure Prediction.
Feng R, Wang X, Xia Z, Han T, Wang H, Yu W. Int J Mol Sci. 2024 Dec 15;25(24):13444. doi: 10.3390/ijms252413444. PMID: 39769208
An evolutionary method for learning HMM structure: prediction of protein secondary structure.
Won KJ, Hamelryck T, Prügel-Bennett A, Krogh A. BMC Bioinformatics. 2007 Sep 21;8:357. doi: 10.1186/1471-2105-8-357. PMID: 17888163
Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.
Nuel G, Regad L, Martin J, Camproux AC. Algorithms Mol Biol. 2010 Jan 26;5:15. doi: 10.1186/1748-7188-5-15. PMID: 20205909
Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach.
Rashid S, Saraswathi S, Kloczkowski A, Sundaram S, Kolinski A. BMC Bioinformatics. 2016 Sep 13;17(1):362. doi: 10.1186/s12859-016-1209-0. PMID: 27618812

