nips nips2012 nips2012-328 nips2012-328-reference knowledge-graph by maker-knowledge-mining

328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Source: pdf

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, deﬁned either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lov´ sz extension of a submodular function, which we a call the Lov´ sz-Bregman divergence, is a continuous extension of a submodular a Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm deﬁned through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lov´ sz Bregman divergence is natural in clustering scenarios where a ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efﬁcient unlike other order based distance measures. 1

reference text

[1] F. Bach. Learning with Submodular functions: A convex Optimization Perspective. Arxiv preprint, 2011.

[2] A. Banerjee, S. Meregu, I. S. Dhilon, and J. Ghosh. Clustering with Bregman divergences. JMLR, 6:1705–1749, Oct. 2005.

[3] E. Boros and P. L. Hammer. Pseudo-boolean optimization. Discrete Applied Math., 123(1–3):155 – 225, 2002.

[4] L. Bregman. The relaxation method of ﬁnding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math and Math Physics, 7, 1967.

[5] Y. Censor and S. Zenios. Parallel optimization: Theory, algorithms, and applications. Oxford University Press, USA, 1997.

[6] I. Dhillon and J. Tropp. Matrix nearness problems with Bregman divergences. SIAM Journal on Matrix Analysis and Applications, 29(4):1120–1146, 2007.

[7] J. Edmonds. Submodular functions, matroids and certain polyhedra. Combinatorial structures and their Applications, 1970.

[8] B. Frigyik, S. Srivastava, and M. Gupta. Functional Bregman divergence. In In Proc. ISIT, pages 1681–1685. IEEE, 2008.

[9] S. Fujishige. Submodular functions and optimization, volume 58. Elsevier Science, 2005.

[10] R. Iyer and J. Bilmes. A framework of mirror descent algorithms for submodular optimization. To Appear in NIPS Workshop on Discrete Optimization in Machine Learning (DISCML) 2012- Structure and Scalability, 2012.

[11] R. Iyer and J. Bilmes. Algorithms for approximate minimization of the difference between submodular functions, with applications. In Proc. UAI, 2012.

[12] R. Iyer and J. Bilmes. Submodular-Bregman and the Lov´ sz-Bregman Divergences with Applications: a Extended Version. 2012.

[13] R. Iyer and J. Bilmes. A uniﬁed theory on the generalized bregman divergences. Manuscript, 2012.

[14] S. Jegelka and J. Bilmes. Cooperative cuts: Graph cuts with submodular edge weights. Technical report, Technical Report TR-189, Max Planck Institute for Biological Cybernetics, 2010.

[15] S. Jegelka and J. Bilmes. Submodularity beyond submodular energies: coupling edges in graph cuts. In Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, June 2011.

[16] S. Jegelka, H. Lin, and J. Bilmes. Fast approximate submodular minimization. In Proc. NIPS, 2011.

[17] M. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.

[18] K. C. Kiwiel. Free-steering relaxation methods for problems with strictly convex costs and linear constraints. Mathematics of Operations Research, 22(2):326–349, 1997.

[19] S. Lloyd. Least squares quantization in pcm. IEEE Transactions on IT, 28(2):129–137, 1982.

[20] L. Lov´ sz. Submodular functions and convexity. Mathematical Programming, 1983. a

[21] J. MacQueen et al. Some methods for classiﬁcation and analysis of multivariate observations. In Proceedings of the ﬁfth Berkeley symposium on math. stats and probability, volume 1, pages 281–297. California, USA, 1967.

[22] M. Narasimhan and J. Bilmes. A submodular-supermodular procedure with applications to discriminative structure learning. In Uncertainty in Artiﬁcial Intelligence (UAI), Edinburgh, Scotland, July 2005.

[23] G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of approximations for maximizing submodular set functions—i. Mathematical Programming, 14(1):265–294, 1978.

[24] P. Stobbe and A. Krause. Efﬁcient minimization of decomposable submodular functions. In Proc. Neural Information Processing Systems (NIPS), 2010.

[25] M. Telgarsky and S. Dasgupta. Agglomerative bregman clustering. In Proc. ICML, 2012.

[26] K. Tsuda, G. Ratsch, and M. Warmuth. Matrix exponentiated gradient updates for on-line learning and Bregman projection. JMLR, 6(1):995, 2006.

[27] M. K. Warmuth. Online learning and Bregman divergences. Tutorial at the Machine Learning Summer School, 2006. 9