Learning unbelievable probabilities

Xaq Pitkow et al. Adv Neural Inf Process Syst. 2011 Dec;24:738-746.

Abstract

Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals 'unbelievable.' This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals.
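To make the setting concrete, here is a minimal loopy belief propagation routine for the binary pairwise models the paper studies, using the standard tanh message parameterization. This is an illustrative sketch, not the authors' code; the function name and defaults are mine.

```python
import numpy as np

def loopy_bp(h, J, iters=500, damp=0.5):
    """Loopy belief propagation for a binary (+/-1) pairwise model
    p(x) proportional to exp(sum_i h[i]*x[i] + sum_{i<j} J[i,j]*x[i]*x[j]).

    J must be symmetric with zero diagonal. m[i, j] holds the cavity-field
    message from node i to node j. Returns the beliefs P(x_i = +1).
    """
    n = len(h)
    m = np.zeros((n, n))
    for _ in range(iters):
        incoming = m.sum(axis=0)                    # sum_k m[k, i] for each i
        cav = h[:, None] + incoming[:, None] - m.T  # field at i, excluding j
        new_m = np.arctanh(np.tanh(J) * np.tanh(cav))
        np.fill_diagonal(new_m, 0.0)
        m = damp * m + (1 - damp) * new_m           # damping aids convergence
    return 0.5 * (1.0 + np.tanh(h + m.sum(axis=0)))  # P(x_i = +1)
```

On a tree this recursion is exact; on graphs with loops it returns the BP pseudomarginals whose reachability the paper analyzes.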


Figures

Figure 1
Landscape of Bethe free energy for the binary graphical model with pairwise interactions. (A) A slice through the Bethe free energy (solid lines) along one axis v1 of pseudomarginal space, for three different values of parameters θ. The energy U is linear in the pseudomarginals (dotted lines), so varying the parameters only changes the tilt of the free energy. This can add or remove local minima. (B) The second derivatives of the free energies in (A) are all identical. Where the second derivative is positive, a local minimum can exist (cyan); where it is negative (yellow), no parameters can produce a local minimum. (C) A two-dimensional slice of the Bethe free energy, colored according to the minimum eigenvalue λmin of the Bethe Hessian. During a run of Bethe wake-sleep learning, the beliefs (blue dots) proceed along v2 toward the target marginals p. Stable fixed points of BP can exist only in the believable region (cyan), but the target p resides in an unbelievable region (yellow). As learning equilibrates, the fixed points jump between believable regions on either side of the unbelievable zone.
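The believable/unbelievable distinction in the caption can be checked numerically. Because the energy U is linear in the pseudomarginals, the Hessian of the Bethe free energy equals the Hessian of the negative Bethe entropy and is independent of θ, so a finite-difference Hessian of the entropy term alone classifies a pseudomarginal. A sketch under that observation; the function name and parameterization (singleton probabilities plus one pairwise joint per edge) are mine:

```python
import numpy as np

def bethe_min_eig(p, q, edges, eps=1e-4):
    """Minimum eigenvalue of the Bethe-free-energy Hessian at a pseudomarginal,
    via central finite differences of the negative Bethe entropy.

    A negative value indicates an 'unbelievable' pseudomarginal: the Hessian
    is not positive-definite there, so no parameters give BP a stable fixed
    point at that point.

    p     : singleton probabilities P(x_i = 1), length n
    q     : pairwise probabilities P(x_i = 1, x_j = 1), one per edge
    edges : list of (i, j) index pairs
    """
    n, e = len(p), len(edges)
    deg = np.zeros(n)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1

    def ent(t):  # Shannon entropy of a probability vector, clipped for safety
        t = np.clip(t, 1e-12, 1.0)
        return -np.sum(t * np.log(t))

    def free(x):  # negative Bethe entropy (free energy minus its linear part)
        pv, qv = x[:n], x[n:]
        s = -sum((deg[i] - 1.0) * ent(np.array([pv[i], 1 - pv[i]]))
                 for i in range(n))
        for k, (i, j) in enumerate(edges):
            s += ent(np.array([qv[k], pv[i] - qv[k], pv[j] - qv[k],
                               1 - pv[i] - pv[j] + qv[k]]))
        return -s

    x0 = np.concatenate([p, q])
    d = n + e
    H = np.zeros((d, d))
    I = np.eye(d)
    for a in range(d):
        for b in range(d):
            f = lambda da, db: free(x0 + eps * (da * I[a] + db * I[b]))
            H[a, b] = (f(1, 1) - f(1, -1) - f(-1, 1) + f(-1, -1)) / (4 * eps**2)
    return np.linalg.eigvalsh(0.5 * (H + H.T)).min()
```

On a tree the Bethe free energy is convex over the local polytope, so the minimum eigenvalue is positive there; negative eigenvalues can appear only on loopy graphs.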
Figure 2
Averaging over variable couplings can produce marginals otherwise unreachable by belief propagation. (A) When attempting to learn unbelievable marginals, the Bethe wake-sleep algorithm drives the parameters θ to a discrete limit cycle. (B) The same limit cycle, projected onto the first two principal components u1 and u2 of θ over the cycle. (C) The corresponding beliefs b during the limit cycle (blue circles), projected onto the first two principal components v1 and v2 of the trajectory through pseudomarginal space. Believable regions of pseudomarginal space are colored cyan, unbelievable regions yellow, and inconsistent pseudomarginals black. Over the limit cycle, the average beliefs (blue ×) equal the target marginals p (black □) exactly. The average (red +) over many BP fixed points (red dots) generated from randomly perturbed parameters θ̄ + δθ still approximates the target marginals better than any individual believable fixed point. (D) Even the best of several BP fixed points cannot match unbelievable marginals (black and grey). Ensemble BP performs much better (red and pink).
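The averaging scheme in panels (C, D) — running BP at many randomly perturbed parameter settings and averaging the resulting beliefs — can be sketched as below. The BP subroutine, perturbation scale, and sample count are illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def bp_marginals(h, J, iters=500, damp=0.5):
    """Standard loopy BP for a binary (+/-1) pairwise model; returns P(x_i = +1)."""
    n = len(h)
    m = np.zeros((n, n))
    for _ in range(iters):
        cav = h[:, None] + m.sum(axis=0)[:, None] - m.T
        new_m = np.arctanh(np.tanh(J) * np.tanh(cav))
        np.fill_diagonal(new_m, 0.0)
        m = damp * m + (1 - damp) * new_m
    return 0.5 * (1.0 + np.tanh(h + m.sum(axis=0)))

def ensemble_bp(h_bar, J_bar, n_samples=200, scale=0.1, rng=None):
    """Average BP beliefs over gaussian perturbations of theta = (h, J).

    Each sample runs BP at (h_bar + dh, J_bar + dJ); the returned belief is
    the mean over samples, which can reach marginals that no single BP fixed
    point attains (the paper's unbelievable marginals).
    """
    rng = np.random.default_rng(rng)
    n = len(h_bar)
    beliefs = []
    for _ in range(n_samples):
        dh = scale * rng.standard_normal(n)
        dJ = np.triu(scale * rng.standard_normal((n, n)), 1)
        dJ = dJ + dJ.T                     # keep the coupling matrix symmetric
        beliefs.append(bp_marginals(h_bar + dh, J_bar + dJ))
    return np.mean(beliefs, axis=0)
```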
Figure 3
Performance in learning unbelievable marginals. (A) Fraction of marginals that are unbelievable. Marginals were generated from fully connected, 8-node binary models with random biases and pairwise couplings, hi ~ 𝒩(0, 1/3) and Jij ~ 𝒩(0, σJ). (B,C) Performance of five models on 370 unbelievable random target marginals (Section 3), measured with the Bethe divergence Dβ[p||b] (B) and the Euclidean distance |p − b| (C). Targets were generated as in (A) with σJ = 1/3 and selected for unbelievability. Bars represent central quartiles, and the white line indicates the median. The five models are: (i) BP on the graphical model that generated the target distribution, (ii) BP after parameters are set by pseudomoment matching, (iii) the best-performing beliefs encountered during Bethe wake-sleep learning, (iv) eBP using the exact parameters from the last 100 iterations of learning, and (v) eBP with gaussian-distributed parameters matching the first- and second-order statistics of (iv).

