Mocapy++ - a toolkit for inference and learning in dynamic Bayesian networks

Martin Paluszewski et al. BMC Bioinformatics. 2010 Mar 12;11:126. doi: 10.1186/1471-2105-11-126.

Abstract

Background: Mocapy++ is a toolkit for parameter learning and inference in dynamic Bayesian networks (DBNs). It supports a wide range of DBN architectures and probability distributions, including distributions from directional statistics (the statistics of angles, directions and orientations).

Results: The program package is freely available under the GNU General Public Licence (GPL) from SourceForge http://sourceforge.net/projects/mocapy. The package contains the source for building the Mocapy++ library, several usage examples and the user manual.

Conclusions: Mocapy++ is especially suitable for constructing probabilistic models of biomolecular structure, due to its support for directional statistics. In particular, it supports the Kent distribution on the sphere and the bivariate von Mises distribution on the torus. These distributions have proven useful for formulating probabilistic models of protein and RNA structure in atomic detail.
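
As a quick illustration of why directional statistics matter here (a minimal numpy sketch, not Mocapy++'s own API): angles live on a circle, so near the ±180° wraparound the ordinary arithmetic mean can be badly wrong, while the circular mean that von Mises-type models are built on handles it correctly.

```python
import numpy as np

# Dihedral angles clustered around the +/-180 degree wraparound (in radians).
angles = np.deg2rad([175.0, -178.0, 179.0, -174.0])

# Naive arithmetic mean lands near 0 degrees -- the wrong side of the circle.
naive_mean = np.rad2deg(np.mean(angles))

# Circular mean: average the unit vectors exp(i*theta), then take the angle.
# This is the mean direction that von Mises-style models estimate.
circular_mean = np.rad2deg(np.angle(np.mean(np.exp(1j * angles))))

print(f"arithmetic mean: {naive_mean:7.1f} deg")    # ~   0.5 deg (wrong)
print(f"circular mean:   {circular_mean:7.1f} deg") # ~ -179.5 deg == 180.5 deg (correct)
```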


Figures

Figure 1
BARNACLE: a probabilistic model of RNA structure. A DBN with nine slices is shown, of which one slice is boxed. Nodes D and H are discrete nodes, while node A is a univariate von Mises node. The dihedral angles within one nucleotide i are labelled αi to ζi. BARNACLE is a probabilistic model of the dihedral angles in a stretch of RNA [12].
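
As a toy analogue of the H → A edge in this DBN (all transition and emission parameters below are invented for illustration; this is not BARNACLE's parameterization), a discrete hidden chain can select the mean and concentration of a univariate von Mises angular emission:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented 2-state hidden chain standing in for Figure 1's H node; the state
# picks the mean/concentration of the univariate von Mises emission (node A).
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])       # P(h_t | h_{t-1})
mu    = np.array([-2.0, 1.0])        # state-dependent mean angle (radians)
kappa = np.array([8.0, 20.0])        # state-dependent concentration

def sample_angles(n_slices, h=0):
    angles = np.empty(n_slices)
    for t in range(n_slices):
        h = rng.choice(2, p=trans[h])              # hidden transition
        angles[t] = rng.vonmises(mu[h], kappa[h])  # angular emission in (-pi, pi]
    return angles

print(sample_angles(9))  # one 9-slice sequence, matching the figure's length
```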
Figure 2
Samples from three Kent distributions on the sphere. The red points were sampled from a distribution with high concentration and high correlation (κ = 1000, β = 499), the green points were sampled from a distribution with low concentration and no correlation (κ = 10, β = 0), and the blue points were sampled from a distribution with medium concentration and medium correlation (κ = 200, β = 50). The distributions underlying the red and green points have the same mean direction and axes and illustrate the effect of κ and β. For each distribution, 5000 points were sampled.
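
The Kent (FB5) density behind these samples is compact enough to state in code. The sketch below evaluates the unnormalized log-density given an orthonormal frame (γ1 = mean direction, γ2/γ3 = major/minor axes); the frame and test points are arbitrary choices, and the normalizing constant is omitted:

```python
import numpy as np

def kent_logpdf_unnorm(x, gamma1, gamma2, gamma3, kappa, beta):
    """Unnormalized log-density of the Kent (FB5) distribution on the sphere.

    x is a unit vector; kappa is the concentration and beta the ovalness,
    with 0 <= 2*beta < kappa for a unimodal density.
    """
    x = np.asarray(x)
    return (kappa * (x @ gamma1)
            + beta * ((x @ gamma2) ** 2 - (x @ gamma3) ** 2))

# Parameters of the red cloud (kappa=1000, beta=499) in an arbitrary frame.
g1, g2, g3 = np.eye(3)                         # rows of the identity matrix
at_mean  = np.array([1.0, 0.0, 0.0])           # the mean direction itself
off_mean = np.array([0.9, 0.3, np.sqrt(0.1)])  # a unit vector nearby
print(kent_logpdf_unnorm(at_mean,  g1, g2, g3, kappa=1000.0, beta=499.0))  # 1000.0
print(kent_logpdf_unnorm(off_mean, g1, g2, g3, kappa=1000.0, beta=499.0))  # ~895.0
```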
Figure 3
Samples from three bivariate von Mises distributions on the torus. The green points were sampled from a distribution with high concentration and no correlation (κ1 = κ2 = 100, κ3 = 0), the blue points were sampled from a distribution with high concentration and negative correlation (κ1 = κ2 = 100, κ3 = 49), and the red points were sampled from a distribution with low concentration and no correlation (κ1 = κ2 = 10, κ3 = 0). For each distribution, 10000 points were sampled.
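
Here is a sketch of the density behind these clouds, assuming the cosine variant of the bivariate von Mises distribution with parameters (μ, ν, κ1, κ2, κ3). The crude grid-based sampler is only meant to reproduce scatter plots like this figure's; a library would use a proper sampler (e.g. a Gibbs or rejection scheme):

```python
import numpy as np

rng = np.random.default_rng(1)

def bvm_cosine_logpdf_unnorm(phi, psi, mu, nu, k1, k2, k3):
    """Unnormalized log-density of the bivariate von Mises (cosine variant)."""
    return (k1 * np.cos(phi - mu) + k2 * np.cos(psi - nu)
            - k3 * np.cos((phi - mu) - (psi - nu)))

def sample_bvm_grid(n, mu=0.0, nu=0.0, k1=100.0, k2=100.0, k3=49.0, res=400):
    """Draw (phi, psi) pairs by discretizing the torus into res x res cells."""
    grid = np.linspace(-np.pi, np.pi, res, endpoint=False)
    phi, psi = np.meshgrid(grid, grid, indexing="ij")
    logp = bvm_cosine_logpdf_unnorm(phi, psi, mu, nu, k1, k2, k3)
    p = np.exp(logp - logp.max()).ravel()
    idx = rng.choice(res * res, size=n, p=p / p.sum())
    return grid[idx // res], grid[idx % res]

# Default parameters match the blue cloud (k1 = k2 = 100, k3 = 49).
phi, psi = sample_bvm_grid(10000)
```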
Figure 4
The model used in the third benchmark. Each slice contains two hidden nodes (H and I). They are parents to a multivariate four-dimensional Gaussian node (G) and a bivariate von Mises node (V), respectively. The sizes (numbers of hidden states) of H and I are five and three, respectively. The length of the DBN is 50 slices.
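
This factorial structure can be mimicked by ancestral sampling. In the sketch below every parameter is a random placeholder, and (as an explicit simplification) the bivariate von Mises emission is replaced by two independent univariate von Mises draws:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_trans(k):
    """A random k-state transition matrix (rows sum to one)."""
    t = rng.random((k, k))
    return t / t.sum(axis=1, keepdims=True)

TH, TI = random_trans(5), random_trans(3)   # hidden chains H (size 5), I (size 3)
means  = rng.normal(size=(5, 4))            # Gaussian mean of G per H state
mus    = rng.uniform(-np.pi, np.pi, (3, 2)) # angle-pair means of V per I state

h = i = 0
slices = []
for t in range(50):                         # 50 slices, as in the benchmark
    h, i = rng.choice(5, p=TH[h]), rng.choice(3, p=TI[i])
    g = rng.multivariate_normal(means[h], np.eye(4))   # 4-D Gaussian node G
    v = rng.vonmises(mus[i], 20.0)                     # two angles for node V
    slices.append((h, i, g, v))
```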
Figure 5
Log-likelihood evolution during S-EM training. Each column shows the evolution of the log-likelihood for one of the three benchmarks described in the results section. The training procedure was started from two different random seeds (indicated by a solid and a dashed line). The log-likelihood values, log P(D | Hn, θn), used in the upper figures are conditional on the states of the sampled hidden nodes (θn are the parameter values at iteration n, Hn are the hidden node values at iteration n, and D is the observed data). The log-likelihood values in the lower figures, log P(D | θn), are computed by summing over all hidden node sequences using the forward algorithm [5]. Note that the forward algorithm can only be used on HMMs and is therefore not applied to the complex benchmark.
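
The lower-panel quantity, log P(D | θn), comes from the standard HMM forward recursion. A minimal log-space sketch (per-slice emission log-probabilities are assumed precomputed; the usage lines use random stand-in parameters):

```python
import numpy as np
from scipy.special import logsumexp

def forward_loglik(log_init, log_trans, log_emit):
    """log P(D | theta) for an HMM via the forward algorithm in log space.

    log_init: (K,) log initial state distribution; log_trans: (K, K) log
    transition matrix (row = from-state); log_emit: (T, K) log-probability
    of slice t's observation under each hidden state.
    """
    alpha = log_init + log_emit[0]
    for t in range(1, len(log_emit)):
        # alpha_t[j] = logsum_i(alpha_{t-1}[i] + log_trans[i, j]) + log_emit[t, j]
        alpha = logsumexp(alpha[:, None] + log_trans, axis=0) + log_emit[t]
    return logsumexp(alpha)   # marginalize the final hidden state

# Stand-in parameters: 3 hidden states, 50 slices.
rng = np.random.default_rng(3)
K, T = 3, 50
lt = np.log(rng.dirichlet(np.ones(K), size=K))  # random transition rows
le = np.log(rng.dirichlet(np.ones(K), size=T))  # stand-in emission terms
print(forward_loglik(np.full(K, -np.log(K)), lt, le))
```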


References

    1. Bishop CM. Pattern recognition and machine learning. Springer; 2006.
    2. Pearl J. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann; 1997.
    3. Ghahramani Z. Learning dynamic Bayesian networks. Lect Notes Comp Sci. 1998;1387:168–197.
    4. Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77(2):257–286. doi: 10.1109/5.18626.
    5. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological sequence analysis. Cambridge University Press; 1999.
