nips2009-129
Source: pdf
Author: M. P. Kumar, Daphne Koller
Abstract: The problem of approximating a given probability distribution using a simpler distribution plays an important role in several areas of machine learning, for example, variational inference and classification. Within this context, we consider the task of learning a mixture of tree distributions. Although mixtures of trees can be learned by minimizing the KL-divergence using an EM algorithm, the success of EM depends heavily on the initialization. We propose an efficient strategy for obtaining a good initial set of trees that attempts to cover the entire observed distribution by minimizing the α-divergence with α = ∞. We formulate the problem using the fractional covering framework and present a convergent sequential algorithm that relies only on solving a convex program at each iteration. Compared to previous methods, our approach results in a significantly smaller mixture of trees that achieves similar or better accuracy. We demonstrate the usefulness of our approach by learning pictorial structures for face recognition.
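To make the tree-distribution building block concrete, the sketch below fits a single Chow-Liu tree [3], the standard way to learn one tree-structured component by taking a maximum-weight spanning tree over empirical pairwise mutual information; the approach described in the abstract would initialize and refine a mixture of such trees. This is a minimal illustration, not the authors' code: it assumes binary variables, uses numpy/scipy, and names such as fit_chow_liu_tree and data are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree


def pairwise_mutual_information(data):
    """Empirical mutual information I(X_i; X_j) for binary data of shape (n_samples, n_vars)."""
    n, d = data.shape
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            total = 0.0
            for a in (0, 1):
                for b in (0, 1):
                    p_ab = np.mean((data[:, i] == a) & (data[:, j] == b))
                    p_a = np.mean(data[:, i] == a)
                    p_b = np.mean(data[:, j] == b)
                    if p_ab > 0:
                        total += p_ab * np.log(p_ab / (p_a * p_b))
            mi[i, j] = mi[j, i] = total
    return mi


def fit_chow_liu_tree(data):
    """Return the edges of the maximum-weight spanning tree over pairwise mutual information."""
    mi = pairwise_mutual_information(data)
    # scipy computes a *minimum* spanning tree, so negate the weights.
    mst = minimum_spanning_tree(-mi)
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))


if __name__ == "__main__":
    # Toy data: x1 is a noisy copy of x0, x2 is independent of both.
    rng = np.random.default_rng(0)
    x0 = rng.integers(0, 2, size=1000)
    x1 = (x0 ^ (rng.random(1000) < 0.1)).astype(int)
    x2 = rng.integers(0, 2, size=1000)
    data = np.stack([x0, x1, x2], axis=1)
    print(fit_chow_liu_tree(data))  # expect the edge (0, 1) to appear
```

In an EM procedure for a mixture of trees, each M-step would rerun such a tree fit on responsibility-weighted counts, which is where the quality of the initial set of trees matters.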
[1] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[2] P. Cheeseman and J. Stutz. Bayesian classification (AutoClass): Theory and results. In KDD, pages 153–180, 1995.
[3] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467, 1968.
[4] D. Crandall, P. Felzenszwalb, and D. Huttenlocher. Spatial priors for parts-based recognition using statistical models. In CVPR, 2005.
[5] M. Everingham, J. Sivic, and A. Zisserman. Hello! My name is... Buffy - Automatic naming of characters in TV video. In BMVC, 2006.
[6] M. Fischler and R. Elschlager. The representation and matching of pictorial structures. TC, 22:67–92, January 1973.
[7] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[8] B. Frey, R. Patrascu, T. Jaakkola, and J. Moran. Sequentially fitting inclusive trees for inference in noisy-OR networks. In NIPS, 2000.
[9] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.
[10] N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29:131–163, 1997.
[11] S. Ioffe and D. Forsyth. Human tracking with mixtures of trees. In ICCV, pages 690–695, 2001.
[12] S. Ioffe and D. Forsyth. Mixtures of trees for object recognition. In CVPR, pages 180–185, 2001.
[13] Y. Jing, V. Pavlovic, and J. Rehg. Boosted Bayesian network classifiers. Machine Learning, 73(2):155–184, 2008.
[14] S. Kirshner and P. Smyth. Infinite mixture of trees. In ICML, pages 417–423, 2007.
[15] J. Kleinberg and E. Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. In STOC, 1999.
[16] V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. PAMI, 2006.
[17] M. P. Kumar and D. Koller. MAP estimation of semi-metric MRFs via hierarchical graph cuts. In UAI, 2009.
[18] M. P. Kumar, P. Torr, and A. Zisserman. An invariant large margin nearest neighbour classifier. In ICCV, 2007.
[19] Y. Lin, S. Zhu, D. Lee, and B. Taskar. Learning sparse Markov network structure via ensemble-of-trees models. In AISTATS, 2009.
[20] M. Meila and T. Jaakkola. Tractable Bayesian learning of tree belief networks. In UAI, 2000.
[21] M. Meila and M. Jordan. Learning with a mixture of trees. JMLR, 1:1–48, 2000.
[22] T. Minka. Divergence measures and message passing. Technical report, Microsoft Research, 2005.
[23] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[24] S. Plotkin, D. Shmoys, and E. Tardos. Fast approximation algorithms for fractional packing and covering problems. Mathematics of Operations Research, 20:257–301, 1995.
[25] A. Renyi. On measures of information and entropy. In Berkeley Symposium on Mathematics, Statistics and Probability, pages 547–561, 1961.
[26] S. Rosset and E. Segal. Boosting density estimation. In NIPS, 2002.
[27] M. Wainwright, T. Jaakkola, and A. Willsky. A new class of upper bounds on the log partition function. IEEE Transactions on Information Theory, 51:2313–2335, 2005.