nips nips2010 nips2010-144 nips2010-144-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vibhav Gogate, William Webb, Pedro Domingos
Abstract: We present an algorithm for learning high-treewidth Markov networks where inference is still tractable. This is made possible by exploiting context-specific independence and determinism in the domain. The class of models our algorithm can learn has the same desirable properties as thin junction trees: polynomial inference, closed-form weight learning, etc., but is much broader. Our algorithm searches for a feature that divides the state space into subspaces where the remaining variables decompose into independent subsets (conditioned on the feature and its negation) and recurses on each subspace/subset of variables until no useful new features can be found. We provide probabilistic performance guarantees for our algorithm under the assumption that the maximum feature length is bounded by a constant k (the treewidth can be much larger) and dependences are of bounded strength. We also propose a greedy version of the algorithm that, while forgoing these guarantees, is much more efficient. Experiments on a variety of domains show that our approach outperforms many state-of-the-art Markov network structure learners.
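To make the recursion in the abstract concrete, here is a minimal, self-contained Python sketch: choose a splitting feature, partition the remaining variables into independent subsets in each of the two induced subspaces, and recurse. For brevity it restricts features to single variables and tests independence with a naive pairwise mutual-information threshold; these simplifications, and every name below (learn, independent_components, mutual_information, threshold), are illustrative assumptions of this sketch, not the authors' implementation, which searches over conjunctive features of length up to k and carries the guarantees stated above.

from collections import Counter
from itertools import combinations
from math import log

def mutual_information(data, a, b):
    # Empirical mutual information between two binary variables a and b;
    # data is a list of dicts mapping variable names to booleans.
    n = len(data)
    pa = Counter(x[a] for x in data)
    pb = Counter(x[b] for x in data)
    pab = Counter((x[a], x[b]) for x in data)
    mi = 0.0
    for (va, vb), c in pab.items():
        p_ab, p_a, p_b = c / n, pa[va] / n, pb[vb] / n
        mi += p_ab * log(p_ab / (p_a * p_b))
    return mi

def independent_components(data, variables, threshold=0.01):
    # Connected components of the graph linking variable pairs whose
    # empirical mutual information exceeds the threshold.
    adj = {v: set() for v in variables}
    for a, b in combinations(variables, 2):
        if mutual_information(data, a, b) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    components, seen = [], set()
    for v in variables:
        if v in seen:
            continue
        comp, stack = [], [v]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u] - seen)
        components.append(comp)
    return components

def learn(data, variables, threshold=0.01):
    # Recursively choose a splitting variable, decompose the remaining
    # variables in each of the two subspaces, and recurse on each
    # subspace/subset until no candidate induces a decomposition.
    if len(variables) <= 1 or len(data) < 10:
        return ("leaf", variables)
    best = None
    for f in variables:
        rest = [v for v in variables if v != f]
        split = {val: [x for x in data if x[f] == val] for val in (True, False)}
        if any(len(d) == 0 for d in split.values()):
            continue
        score = sum(len(independent_components(d, rest, threshold))
                    for d in split.values())
        if best is None or score > best[0]:
            best = (score, f, split, rest)
    if best is None or best[0] <= 2:  # neither subspace decomposes further
        return ("leaf", variables)
    _, f, split, rest = best
    children = {val: [learn(d, comp, threshold)
                      for comp in independent_components(d, rest, threshold)]
                for val, d in split.items()}
    return ("split", f, children)

Given binary data encoded as dictionaries, e.g. learn([{'A': True, 'B': False, 'C': True}, ...], ['A', 'B', 'C']), the sketch returns a nested ('split', feature, children) / ('leaf', variables) structure that mirrors the subspace/subset recursion described in the abstract.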
[1] G. Andrew and J. Gao. Scalable training of L1-regularized log-linear models. In Proceedings of the Twenty-Fourth International Conference on Machine Learning (ICML), pages 33–40, 2007.
[2] F. R. Bach and M. I. Jordan. Thin junction trees. In Advances in Neural Information Processing Systems, pages 569–576, 2001.
[3] I. Beinlich, J. Suermondt, M. Chavez, and G. Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In European Conference on AI in Medicine, 1988.
[4] J. Besag. Statistical analysis of non-lattice data. The Statistician, 24:179–195, 1975.
[5] C. Blake and C. J. Merz. UCI repository of machine learning databases. Machine-readable data repository, Department of Information and Computer Science, University of California at Irvine, Irvine, CA, 2000. http://www.ics.uci.edu/~mlearn/MLRepository.html.
[6] C. Boutilier, N. Friedman, M. Goldszmidt, and D. Poole. Context-specific independence in Bayesian networks. In Proceedings of the Twelfth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 115–123, 1996.
[7] M. Chavira and A. Darwiche. On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6–7):772–799, April 2008.
[8] A. Chechetka and C. Guestrin. Efficient principled learning of thin junction trees. In Advances in Neural Information Processing Systems (NIPS), December 2007.
[9] D.M. Chickering, D. Geiger, and D. Heckerman. Learning Bayesian networks: Search methods and experimental results. In Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics (AISTATS), pages 112–128, 1995.
[10] A. Darwiche. A differential approach to inference in Bayesian networks. Journal of the ACM, 50(3):280–305, 2003.
[11] R. Dechter and R. Mateescu. AND/OR search spaces for graphical models. Artificial Intelligence, 171(2–3):73–106, 2007.
[12] S. Della Pietra, V. Della Pietra, and J. Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:380–392, 1997.
[13] S. Goldman. Efficient methods for calculating maximum entropy distributions. Master’s thesis, Massachusetts Institute of Technology, 1987.
[14] K. Höffgen. Learning and robust learning of product distributions. In Proceedings of the Sixth Annual ACM Conference on Computational Learning Theory (COLT), pages 77–83, 1993.
[15] D. R. Karger and N. Srebro. Learning Markov networks: maximum bounded tree-width graphs. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 392–401, 2001.
[16] S. Lee, V. Ganapathi, and D. Koller. Efficient structure learning of Markov networks using L1-regularization. In Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems (NIPS), pages 817–824, 2006.
[17] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(3):503–528, 1989.
[18] D. Lowd and P. Domingos. Learning arithmetic circuits. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI), pages 383–392, 2008.
[19] A. W. Moore and M. S. Lee. Cached sufficient statistics for efficient machine learning with large datasets. Journal of Artificial Intelligence Research, 8:67–91, 1997.
[20] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI), pages 467–475, 1999.
[21] M. Narasimhan and J. Bilmes. PAC-learning bounded tree-width graphical models. In Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI), pages 410–417, 2004.
[22] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, 1988.
[23] M. Queyranne. Minimizing symmetric submodular functions. Mathematical Programming, 82(1):3–12, 1998.
[24] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using L1-regularized logistic regression. Annals of Statistics, 38(3):1287–1319, 2010.
[25] D. Roth. On the hardness of approximate reasoning. Artificial Intelligence, 82:273–302, 1996.
[26] T. Sang, P. Beame, and H. Kautz. Performing Bayesian inference by weighted model counting. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI), pages 475–482, 2005.