Algorithms for incorporating prior topological information in HMMs: application to transmembrane proteins

Pantelis G Bagos¹, Theodore D Liakopoulos, Stavros J Hamodrakas


PMID: 16597327
PMCID: PMC1523218
DOI: 10.1186/1471-2105-7-189


Pantelis G Bagos et al. BMC Bioinformatics. 2006 Apr 5;7:189. doi: 10.1186/1471-2105-7-189.

Affiliation

¹ Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 157 01, Greece. pbagos@biol.uoa.gr

Abstract

Background: Hidden Markov Models (HMMs) have been used extensively in computational molecular biology for modelling protein and nucleic acid sequences. In many applications, such as transmembrane protein topology prediction, incorporating a limited amount of topological information arising from biochemical experiments has proved to be a very useful strategy, remarkably increasing the performance of even the top-scoring methods. However, no clear and formal explanation of such algorithms that retains the probabilistic interpretation of the models has so far been presented in the literature.

Results: Here we present a simple method that allows the incorporation of prior topological information concerning the sequences at hand, while the HMMs retain their full probabilistic interpretation in terms of conditional probabilities. We present modifications to the standard Forward and Backward algorithms of HMMs, and we show explicitly how reliable predictions may arise from these modifications using all the decoding algorithms currently available for HMMs. A similar procedure may be used during training to optimize the labels of the HMM's classes, especially in cases such as transmembrane proteins, where the labels of the membrane-spanning segments are inherently misplaced. As an application of this approach, we developed a method to predict the transmembrane regions of alpha-helical membrane proteins, trained on crystallographically solved data. We show that this method compares well against established algorithms presented in the literature and that it is extremely useful in practical applications.

Conclusion: The algorithms presented here are easily implemented in any kind of Hidden Markov Model, and the prediction method (HMM-TM) is freely available for academic users at http://bioinformatics.biol.uoa.gr/HMM-TM, offering the most advanced decoding options currently available.
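The modified Forward and Backward algorithms mentioned in the Results can be illustrated with a short sketch. This is a minimal pure-Python illustration, not the HMM-TM implementation, and it rests on several assumptions: prior information is passed as a dictionary mapping 0-based residue positions to labels, a state may account for a residue only if the residue has no known label or the state's label equals it, there is no explicit end state, and the three-state model, its parameters and its two-letter alphabet (h for hydrophobic, p for polar) are hypothetical. No scaling or log-space arithmetic is used, so it is only suitable for short sequences.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Known = Dict[int, str]  # assumed format: 0-based residue index -> known label


def allowed(state_labels: Sequence[str], n_res: int, known: Known) -> Matrix:
    """delta[i][k] = 1.0 if state k may account for residue i, else 0.0."""
    return [[1.0 if i not in known or state_labels[k] == known[i] else 0.0
             for k in range(len(state_labels))]
            for i in range(n_res)]


def forward(seq: str, start: Sequence[float], trans: Matrix,
            emit: Sequence[Dict[str, float]], state_labels: Sequence[str],
            known: Optional[Known] = None) -> Tuple[Matrix, float]:
    """Forward pass that sums only over state paths agreeing with `known`."""
    n = len(state_labels)
    delta = allowed(state_labels, len(seq), known or {})
    f = [[0.0] * n for _ in seq]
    for k in range(n):
        f[0][k] = delta[0][k] * start[k] * emit[k][seq[0]]
    for i in range(1, len(seq)):
        for k in range(n):
            incoming = sum(f[i - 1][j] * trans[j][k] for j in range(n))
            f[i][k] = delta[i][k] * emit[k][seq[i]] * incoming
    return f, sum(f[-1])  # joint P(x, path consistent with known); no end state


def backward(seq: str, trans: Matrix, emit: Sequence[Dict[str, float]],
             state_labels: Sequence[str], known: Optional[Known] = None) -> Matrix:
    """Backward pass applying the same mask to residues i+1..L."""
    n = len(state_labels)
    delta = allowed(state_labels, len(seq), known or {})
    b = [[0.0] * n for _ in seq]
    b[-1] = [1.0] * n
    for i in range(len(seq) - 2, -1, -1):
        for k in range(n):
            b[i][k] = sum(trans[k][j] * delta[i + 1][j] * emit[j][seq[i + 1]] * b[i + 1][j]
                          for j in range(n))
    return b


def label_posteriors(seq: str, start: Sequence[float], trans: Matrix,
                     emit: Sequence[Dict[str, float]], state_labels: Sequence[str],
                     known: Optional[Known] = None) -> List[Dict[str, float]]:
    """P(label of residue i | x, known labels), summed over states sharing a label."""
    f, p = forward(seq, start, trans, emit, state_labels, known)
    if p == 0.0:
        raise ValueError("the prior labels are incompatible with the model")
    b = backward(seq, trans, emit, state_labels, known)
    return [{lab: sum(f[i][k] * b[i][k]
                      for k, s in enumerate(state_labels) if s == lab) / p
             for lab in sorted(set(state_labels))}
            for i in range(len(seq))]


# Hypothetical toy model: one state per label I, M, O; alphabet h / p.
LABELS = ["I", "M", "O"]
START = [0.4, 0.2, 0.4]
TRANS = [[0.8, 0.2, 0.0],
         [0.1, 0.8, 0.1],
         [0.0, 0.2, 0.8]]
EMIT = [{"h": 0.3, "p": 0.7}, {"h": 0.9, "p": 0.1}, {"h": 0.3, "p": 0.7}]

if __name__ == "__main__":
    x = "pphhhhpp"
    free = label_posteriors(x, START, TRANS, EMIT, LABELS)
    # Prior information: residue 1 extracellular, residue 8 intracellular (1-based).
    cond = label_posteriors(x, START, TRANS, EMIT, LABELS, {0: "O", 7: "I"})
    for i, (u, c) in enumerate(zip(free, cond), start=1):
        print(i, {k: round(v, 3) for k, v in u.items()},
              {k: round(v, 3) for k, v in c.items()})
```

Because every surviving path keeps its ordinary HMM weight, the returned values are genuine conditional probabilities given the sequence and the prior labels. In this toy model the I and O states emit identically, so only the topological constraints can separate them; the same masking factor could in principle be inserted into other decoders as well.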


Figures

**Figure 1**
**Posterior probability plots and predicted transmembrane segments for a protein whose C-terminal localisation was mispredicted by HMM-TM (YDGG_ECOLI)**. The upper graph shows the unconstrained prediction. The lower graph shows the conditional prediction after incorporating the experimentally verified localisation of the C-terminus. The red bars indicate the predicted transmembrane segments; these also change, coming into agreement with the other predictors.

**Figure 2**
**Posterior probability plots and predicted transmembrane segments for the multidrug efflux transporter AcrB, a protein with known 3-dimensional structure** (PDB code: 1IWG). The upper graph shows the unconstrained prediction; the red bars indicate the predicted transmembrane segments and the black bars the observed segments. Two transmembrane helices are missed and one is falsely predicted. The lower graph shows the constrained prediction after incorporating the experimental information derived from cysteine-scanning mutagenesis experiments [46]. Green arrows indicate residues experimentally localised to the cytoplasm, and blue arrows residues experimentally localised to the extracellular (periplasmic) space. The constrained prediction agrees remarkably well with the known structure.

**Figure 3**
**A representation of the matrix produced by the forward algorithm modified to incorporate prior information**. The (hypothetical) model consists of 12 states with 3 labels, I, M and O, corresponding respectively to states modelling the intracellular, transmembrane and extracellular parts of the sequence. The likelihood of sequence x (8 residues) is calculated incorporating the prior information that residues 3 and 4 are transmembrane, residue 1 is extracellular and residue 8 is intracellular.
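The matrix in Figure 3 can be summarised as the usual forward recursion with one extra indicator factor. The following is a sketch in standard HMM notation under our own naming assumptions: $a_{0k}$ is the initial probability of state $k$, $a_{lk}$ a transition probability, $e_k(x_i)$ an emission probability, $\lambda(k)$ the label of state $k$, $y_i$ the known label of residue $i$ (if any), $\Pi_y$ the set of state paths consistent with the prior information, and no explicit end state is used.

```latex
\begin{aligned}
\delta_k(i) &= \begin{cases}
  1 & \text{if no label is known for residue } i \text{, or } \lambda(k) = y_i,\\
  0 & \text{otherwise,}
\end{cases}\\
f_k(1) &= \delta_k(1)\, a_{0k}\, e_k(x_1),\\
f_k(i) &= \delta_k(i)\, e_k(x_i) \sum_{l} f_l(i-1)\, a_{lk}, \qquad i = 2, \dots, L,\\
P(x, \pi \in \Pi_y) &= \sum_{k} f_k(L).
\end{aligned}
```

For the example of Figure 3, $L = 8$ with $y_1 = O$, $y_3 = y_4 = M$ and $y_8 = I$, so every cell of a state whose label disagrees with these at those four residues is zero. Dividing $P(x, \pi \in \Pi_y)$ by the unconstrained $P(x)$ from the ordinary forward algorithm gives $P(\pi \in \Pi_y \mid x)$, the probability that a path through the model agrees with the prior information.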

**Figure 4**
**A schematic representation of the model's architecture**. The model consists of three sub-models, denoted by the labels Cytoplasmic loop, Transmembrane Helix and Extracellular loop. Within each sub-model, states with the same shape, size and colour share the same emission probabilities (parameter tying). Allowed transitions are indicated with arrows.
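Parameter tying of the kind shown in Figure 4 is commonly handled in Baum-Welch training by pooling the expected emission counts of all tied states before normalising. The caption does not describe the training details, so the sketch below assumes that approach; the function name, state names, group names and counts are all hypothetical.

```python
from collections import defaultdict
from typing import Dict

Counts = Dict[str, Dict[str, float]]  # state -> symbol -> expected count


def reestimate_tied_emissions(expected: Counts, tie_group: Dict[str, str]) -> Counts:
    """M-step for tied emissions: pool counts per group, normalise, share."""
    pooled: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for state, counts in expected.items():
        for symbol, c in counts.items():
            pooled[tie_group[state]][symbol] += c
    group_probs = {}
    for group, counts in pooled.items():
        total = sum(counts.values())
        group_probs[group] = {s: c / total for s, c in counts.items()}
    # Every state receives its own copy of its group's distribution.
    return {state: dict(group_probs[tie_group[state]]) for state in expected}


if __name__ == "__main__":
    # Hypothetical expected counts from one E-step.
    expected = {
        "helix1": {"L": 3.0, "K": 1.0},
        "helix2": {"L": 1.0, "K": 3.0},
        "loop1": {"D": 2.0, "E": 2.0},
    }
    groups = {"helix1": "helix_core", "helix2": "helix_core", "loop1": "loop1"}
    print(reestimate_tied_emissions(expected, groups))
```

Here the two tied helix states both end up with L and K at 0.5 each, even though their individual counts differ; pooling in this way reduces the number of free emission parameters the model has to estimate.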


References

1. Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77:257–286.
2. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press; 1998.
3. Krogh A, Mian IS, Haussler D. A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Res. 1994;22:4768–4778.
4. Eddy SR. Multiple alignment using hidden Markov models. Proc Int Conf Intell Syst Mol Biol. 1995;3:114–120.
5. Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14:755–763.
