
78 nips-2007-Efficient Principled Learning of Thin Junction Trees


Source: pdf

Author: Anton Chechetka, Carlos Guestrin

Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of the KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speedups in practice, and demonstrate the viability of our method empirically on several real-world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree.
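The primitive the abstract leans on is computing (conditional) mutual information over fixed-size subsets of variables. As an illustration only, below is a minimal Python sketch of the standard plug-in estimator for conditional mutual information from discrete samples; it is not the authors' implementation, and the function name empirical_cond_mi and the toy data are hypothetical.

import math
from collections import Counter

import numpy as np


def empirical_cond_mi(data, xs, ys, zs=()):
    """Plug-in estimate of I(X; Y | Z) in nats from discrete samples.

    data: (n_samples, n_vars) integer array; xs, ys, zs are tuples of
    column indices selecting the variable sets X, Y, Z. With zs=() this
    reduces to ordinary mutual information I(X; Y).
    """
    n = data.shape[0]
    # Joint counts over the (x, y, z) configurations actually observed.
    counts = Counter(
        (tuple(row[list(xs)]), tuple(row[list(ys)]), tuple(row[list(zs)]))
        for row in data
    )
    # Marginal counts for p(z), p(x, z), p(y, z).
    cz, cxz, cyz = Counter(), Counter(), Counter()
    for (x, y, z), c in counts.items():
        cz[z] += c
        cxz[(x, z)] += c
        cyz[(y, z)] += c
    # I(X; Y | Z) = sum_{x,y,z} p(x,y,z) log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
    mi = 0.0
    for (x, y, z), c in counts.items():
        p_xyz = c / n
        mi += p_xyz * math.log(p_xyz * (cz[z] / n) / ((cxz[(x, z)] / n) * (cyz[(y, z)] / n)))
    return max(mi, 0.0)  # guard against tiny negative rounding error


# Hypothetical usage on toy binary data: X1 is a noisy copy of X0, X2 is noise.
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=1000)
x1 = (x0 ^ (rng.random(1000) < 0.1)).astype(int)
x2 = rng.integers(0, 2, size=1000)
data = np.column_stack([x0, x1, x2])
print(empirical_cond_mi(data, (0,), (1,)))           # large: X1 depends on X0
print(empirical_cond_mi(data, (0,), (2,)))           # near zero: independent
print(empirical_cond_mi(data, (0,), (2,), zs=(1,)))  # conditional variant

Because the subsets X, Y, Z have fixed size, each such estimate takes time linear in the number of samples, which is why an algorithm that needs only polynomially many of them stays polynomial overall.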


reference text

[1] P. Abbeel, D. Koller, and A. Y. Ng. Learning factor graphs in polynomial time and sample complexity. JMLR, 7, 2006.

[2] S. Arnborg, D. G. Corneil, and A. Proskurowski. Complexity of finding embeddings in a k-tree. SIAM Journal on Algebraic and Discrete Methods, 8(2):277–284, 1987.

[3] F. R. Bach and M. I. Jordan. Thin junction trees. In NIPS, 2002.

[4] I. Beinlich, J. Suermondt, M. Chavez, and G. Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Euro. Conf. on AI in Medicine, 1988.

[5] A. Choi, H. Chan, and A. Darwiche. On Bayesian network approximation by edge deletion. In UAI, 2005.

[6] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.

[7] R. G. Cowell, P. A. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems (Information Science and Statistics). Springer, 2003.

[8] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In VLDB, 2004.

[9] K. U. Höffgen. Learning and robust learning of product distributions. In COLT, 1993.

[10] D. Karger and N. Srebro. Learning Markov networks: Maximum bounded tree-width graphs. In SODA, 2001.

[11] A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. In UAI, 2005.

[12] M. Meilă and M. I. Jordan. Learning with mixtures of trees. JMLR, 1:1–48, 2001.

[13] M. Narasimhan and J. Bilmes. PAC-learning bounded tree-width graphical models. In UAI, 2004.

[14] M. Queyranne. Minimizing symmetric submodular functions. Math. Programming, 82(1):3–12, 1998.

[15] A. Singh and A. Moore. Finding optimal Bayesian networks by dynamic programming. Technical Report CMU-CALD-05-106, Carnegie Mellon University, Center for Automated Learning and Discovery, 2005.

[16] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, 2001.

[17] M. Teyssier and D. Koller. Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In UAI, 2005.