nips2009-129
Source: pdf
Author: M. P. Kumar, Daphne Koller
Abstract: The problem of approximating a given probability distribution using a simpler distribution plays an important role in several areas of machine learning, for example, variational inference and classification. Within this context, we consider the task of learning a mixture of tree distributions. Although mixtures of trees can be learned by minimizing the KL-divergence using an EM algorithm, the success of EM depends heavily on the initialization. We propose an efficient strategy for obtaining a good initial set of trees that attempts to cover the entire observed distribution by minimizing the α-divergence with α = ∞. We formulate the problem using the fractional covering framework and present a convergent sequential algorithm that relies only on solving a convex program at each iteration. Compared to previous methods, our approach results in a significantly smaller mixture of trees that achieves similar or better accuracy. We demonstrate the usefulness of our approach by learning pictorial structures for face recognition.
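To make the tree-distribution building block concrete, the sketch below fits a single Chow-Liu tree [3], the standard way to learn one tree-structured component by taking a maximum-weight spanning tree over empirical pairwise mutual information; the approach described in the abstract would initialize and refine a mixture of such trees. This is a minimal illustration, not the authors' code: it assumes binary variables, uses numpy/scipy, and names such as fit_chow_liu_tree and data are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree


def pairwise_mutual_information(data):
    """Empirical mutual information I(X_i; X_j) for binary data of shape (n_samples, n_vars)."""
    n, d = data.shape
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            total = 0.0
            for a in (0, 1):
                for b in (0, 1):
                    p_ab = np.mean((data[:, i] == a) & (data[:, j] == b))
                    p_a = np.mean(data[:, i] == a)
                    p_b = np.mean(data[:, j] == b)
                    if p_ab > 0:
                        total += p_ab * np.log(p_ab / (p_a * p_b))
            mi[i, j] = mi[j, i] = total
    return mi


def fit_chow_liu_tree(data):
    """Return the edges of the maximum-weight spanning tree over pairwise mutual information."""
    mi = pairwise_mutual_information(data)
    # scipy computes a *minimum* spanning tree, so negate the weights.
    mst = minimum_spanning_tree(-mi)
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))


if __name__ == "__main__":
    # Toy data: x1 is a noisy copy of x0, x2 is independent of both.
    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, size=1000)
    x1 = (x0 ^ (rng.random(1000) < 0.1)).astype(int)
    x2 = rng.integers(0, 2, size=1000)
    data = np.stack([x0, x1, x2], axis=1)
    print(fit_chow_liu_tree(data))  # expect the edge (0, 1) to appear
```

In an EM procedure for a mixture of trees, each M-step would rerun such a tree fit on responsibility-weighted counts, which is where the quality of the initial set of trees matters.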
[1] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[2] P. Cheeseman and J. Stutz. Bayesian classification (AutoClass): Theory and results. In KDD, pages 153–180, 1995.
[3] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.
[4] D. Crandall, P. Felzenszwalb, and D. Huttenlocher. Spatial priors for parts-based recognition using statistical models. In CVPR, 2005.
[5] M. Everingham, J. Sivic, and A. Zisserman. Hello! My name is... Buffy - Automatic naming of characters in TV video. In BMVC, 2006.
[6] M. Fischler and R. Elschlager. The representation and matching of pictorial structures. TC, 22:67–92, January 1973.
[7] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[8] B. Frey, R. Patrascu, T. Jaakkola, and J. Moran. Sequentially fitting inclusive trees for inference in noisy-OR networks. In NIPS, 2000.
[9] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.
[10] N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29:131–163, 1997.
[11] S. Ioffe and D. Forsyth. Human tracking with mixtures of trees. In ICCV, pages 690–695, 2001.
[12] S. Ioffe and D. Forsyth. Mixtures of trees for object recognition. In CVPR, pages 180–185, 2001.
[13] Y. Jing, V. Pavlovic, and J. Rehg. Boosted Bayesian network classifiers. Machine Learning, 73(2):155–184, 2008.
[14] S. Kirshner and P. Smyth. Infinite mixture of trees. In ICML, pages 417–423, 2007.
[15] J. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. In STOC, 1999.
[16] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. PAMI, 2006.
[17] M. P. Kumar and D. Koller. MAP estimation of semi-metric MRFs via hierarchical graph cuts. In UAI, 2009.
[18] M. P. Kumar, P. Torr, and A. Zisserman. An invariant large margin nearest neighbour classifier. In ICCV, 2007.
[19] Y. Lin, S. Zhu, D. Lee, and B. Taskar. Learning sparse Markov network structure via ensemble-of-trees models. In AISTATS, 2009.
[20] M. Meila and T. Jaakkola. Tractable Bayesian learning of tree belief networks. In UAI, 2000.
[21] M. Meila and M. Jordan. Learning with a mixture of trees. JMLR, 1:1–48, 2000.
[22] T. Minka. Divergence measures and message passing. Technical report, Microsoft Research, 2005.
[23] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[24] S. Plotkin, D. Shmoys, and E. Tardos. Fast approximation algorithms for fractional packing and covering problems. Mathematics of Operations Research, 20:257–301, 1995.
[25] A. Renyi. On measures of information and entropy. In Berkeley Symposium on Mathematics, Statistics and Probability, pages 547–561, 1961.
[26] S. Rosset and E. Segal. Boosting density estimation. In NIPS, 2002.
[27] M. Wainwright, T. Jaakkola, and A. Willsky. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51:2313–2335, 2005.