nips nips2011 nips2011-158 nips2011-158-reference knowledge-graph by maker-knowledge-mining

158 nips-2011-Learning unbelievable probabilities

Source: pdf

Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller

Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-deﬁnite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1

reference text

[1] Cooper G (1990) The computational complexity of probabilistic inference using bayesian belief networks. Artiﬁcial intelligence 42: 393–405.

[2] Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers, San Mateo CA.

[3] Kschischang F, Frey B, Loeliger H (2001) Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47: 498–519.

[4] Bishop C (2006) Pattern recognition and machine learning. Springer New York.

[5] Wainwright M, Jordan M (2008) Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1: 1–305.

[6] Yedidia JS, Freeman WT, Weiss Y (2000) Generalized belief propagation. In: Advances in Neural Information Processing Systems 13. MIT Press, pp. 689–695.

[7] Heskes T (2003) Stable ﬁxed points of loopy belief propagation are minima of the Bethe free energy. Advances in Neural Information Processing Systems 15: 343–350.

[8] Welling M, Teh Y (2001) Belief optimization for binary networks: A stable alternative to loopy belief propagation. In: Uncertainty in Artiﬁcial Intelligence. Morgan Kaufmann Publishers Inc., pp. 554–561.

[9] Wainwright MJ, Jaakkola TS, Willsky AS (2003) Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. In: Artiﬁcial Intelligence and Statistics.

[10] Welling M, Teh Y (2003) Approximate inference in Boltzmann machines. Artiﬁcial Intelligence 143: 19–50.

[11] Parise S, Welling M (2005) Learning in markov random ﬁelds: An empirical study. In: Joint Statistical Meeting. volume 4.

[12] Watanabe Y, Fukumizu K (2011) Loopy belief propagation, Bethe free energy and graph zeta function. arXiv cs.AI: 1103.0605v1.

[13] Hinton G, Sejnowski T (1983) Analyzing cooperative computation. Proceedings of the Fifth Annual Cognitive Science Society, Rochester NY .

[14] Welling M, Sutton C (2005) Learning in markov random ﬁelds with contrastive free energies. In: Cowell RG, Ghahramani Z, editors, Artiﬁcial Intelligence and Statistics. pp. 397-404.

[15] Yedidia J, Freeman W, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory 51: 2282–2312.

[16] Mooij J, Kappen H (2005) On the properties of the Bethe approximation and loopy belief propagation on binary networks. Journal of Statistical Mechanics: Theory and Experiment 11: P11012.

[17] Mooij J, Kappen H (2005) Validity estimates for loopy belief propagation on binary real-world networks. In: Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, pp. 945–952.

[18] Heskes T (2004) On the uniqueness of loopy belief propagation ﬁxed points. Neural Computation 16: 2379–2413.

[19] Lafferty J, McCallum A, Pereira F (2001) Conditional random ﬁelds: Probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning : 282–289.

[20] Litvak S, Ullman S (2009) Cortical circuitry implementing graphical models. Neural Computation 21: 3010–3056.

[21] Steimer A, Maass W, Douglas R (2009) Belief propagation in networks of spiking neurons. Neural Computation 21: 2502–2523.

[22] Ott T, Stoop R (2007) The neurodynamics of belief propagation on binary markov random ﬁelds. In: Advances in Neural Information Processing Systems 19, Cambridge, MA: MIT Press. pp. 1057–1064.

[23] Shon A, Rao R (2005) Implementing belief propagation in neural circuits. Neurocomputing 65–66: 393– 399.

[24] George D, Hawkins J (2009) Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology 5: 1–26.

[25] Heinemann U, Globerson A (2011) What cannot be learned with Bethe approximations. In: Uncertainty in Artiﬁcial Intelligence. Corvallis, Oregon: AUAI Press, pp. 319–326. 9