Biophysics (Nagoya-shi). 2009 May 30;5:37-44.
doi: 10.2142/biophysics.5.37. eCollection 2009.

Profile conditional random fields for modeling protein families with structural information

Akira R Kinjo. Biophysics (Nagoya-shi).

Abstract

A statistical model of protein families, called profile conditional random fields (CRFs), is proposed. This model may be regarded as an integration of the profile hidden Markov model (HMM) and the Finkelstein-Reva (FR) theory of protein folding. While the model structure of the profile CRF is almost identical to that of the profile HMM, it can incorporate arbitrary correlations in the sequences to be aligned to the model. In addition, as in the FR theory, the profile CRF can incorporate long-range pairwise interactions between model states via mean-field-like approximations. We give a detailed formulation of the model, self-consistent approximations for treating long-range interactions, and algorithms for computing partition functions and marginal probabilities. We also outline methods for the global optimization of model parameters as well as a Bayesian framework for parameter learning and selection of optimal alignments.
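The abstract mentions algorithms for computing partition functions and marginal probabilities over match, insertion, and deletion states. As a rough illustration only, the Python sketch below computes both with a forward-backward recursion over a match/insert/delete alignment lattice. It is not the paper's formulation: it keeps only local potentials and omits the long-range, mean-field terms entirely, and the names `forward_backward`, `eM`, `eI`, `eD`, and `trans` are hypothetical. Because a CRF conditions on the whole sequence, each `eM[k][i]` could in principle be any function of the entire sequence rather than of residue `i` alone.

```python
import math
import random

NEG_INF = float("-inf")
TYPES = ("M", "I", "D")  # match, insertion, deletion state types
ALL = ("S",) + TYPES     # plus the start state "S"


def logsumexp(values):
    """Numerically stable log(sum(exp(v))); tolerates -inf entries."""
    m = max(values, default=NEG_INF)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_backward(eM, eI, eD, trans):
    """Return (log Z forward, log Z backward, posterior marginals).

    Hypothetical toy parameterization (not the paper's):
      eM[k][i]    log-potential: model position k (1..M) matches residue i (1..L)
      eI[k][i]    log-potential: residue i inserted after model position k (0..M)
      eD[k]       log-potential: model position k (1..M) deleted
      trans[a][b] transition log-potential, a in S/M/I/D, b in M/I/D/E
    Index 0 along the residue axis (and of eM rows / eD) is unused padding.
    """
    M, L = len(eD) - 1, len(eI[0]) - 1

    def grid():
        return [[NEG_INF] * (L + 1) for _ in range(M + 1)]

    # Forward: F[a][k][i] sums all partial paths ending in state type a
    # after consuming k model positions and i residues.
    F = {a: grid() for a in ALL}
    F["S"][0][0] = 0.0
    for k in range(M + 1):
        for i in range(L + 1):
            if k >= 1 and i >= 1:
                F["M"][k][i] = eM[k][i] + logsumexp(
                    [F[a][k - 1][i - 1] + trans[a]["M"] for a in ALL])
            if i >= 1:
                F["I"][k][i] = eI[k][i] + logsumexp(
                    [F[a][k][i - 1] + trans[a]["I"] for a in ALL])
            if k >= 1:
                F["D"][k][i] = eD[k] + logsumexp(
                    [F[a][k - 1][i] + trans[a]["D"] for a in ALL])
    log_z = logsumexp([F[a][M][L] + trans[a]["E"] for a in TYPES])

    # Backward: B[a][k][i] sums all completions after being in type a at (k, i).
    B = {a: grid() for a in ALL}
    for k in range(M, -1, -1):
        for i in range(L, -1, -1):
            for a in ALL:
                terms = []
                if k == M and i == L:
                    terms.append(trans[a]["E"])
                if k < M and i < L:
                    terms.append(trans[a]["M"] + eM[k + 1][i + 1] + B["M"][k + 1][i + 1])
                if i < L:
                    terms.append(trans[a]["I"] + eI[k][i + 1] + B["I"][k][i + 1])
                if k < M:
                    terms.append(trans[a]["D"] + eD[k + 1] + B["D"][k + 1][i])
                B[a][k][i] = logsumexp(terms)

    # Posterior probability of each state type occupying each lattice cell.
    post = {a: [[math.exp(F[a][k][i] + B[a][k][i] - log_z) for i in range(L + 1)]
                for k in range(M + 1)] for a in TYPES}
    return log_z, B["S"][0][0], post


if __name__ == "__main__":
    rng = random.Random(0)
    M, L = 3, 4  # toy model length and sequence length
    eM = [[rng.gauss(0, 1) for _ in range(L + 1)] for _ in range(M + 1)]
    eI = [[rng.gauss(-1, 1) for _ in range(L + 1)] for _ in range(M + 1)]
    eD = [rng.gauss(-1, 1) for _ in range(M + 1)]
    trans = {a: {b: rng.gauss(0, 0.5) for b in ("M", "I", "D", "E")} for a in ALL}
    log_z, log_z_back, post = forward_backward(eM, eI, eD, trans)
    print(f"log Z = {log_z:.4f}, backward check = {log_z_back:.4f}")
    for k in range(1, M + 1):
        # Each model position is either matched to one residue or deleted.
        print(k, round(sum(post["M"][k]) + sum(post["D"][k]), 6))
```

In this toy lattice every model position is consumed exactly once, by either a match or a deletion, so the printed per-position totals should equal 1 up to floating-point rounding; agreement between the forward and backward values of log Z is a cheap sanity check on the recursions.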

Keywords: dynamic programming; fold recognition; mean field approximation; sequence analysis; structure prediction.

Figures

Figure 1
The model structure of a profile conditional random field (CRF). Squares, diamonds, and circles are matching, insertion, and deletion states, respectively. The start and end states are labeled with “S” and “E” in the squares.
