nips nips2011 nips2011-222 nips2011-222-reference knowledge-graph by maker-knowledge-mining

222 nips-2011-Prismatic Algorithm for Discrete D.C. Programming Problem

Source: pdf

Author: Yoshinobu Kawahara, Takashi Washio

Abstract: In this paper, we propose the ﬁrst exact algorithm for minimizing the difference of two submodular functions (D.S.), i.e., the discrete version of the D.C. programming problem. The developed algorithm is a branch-and-bound-based algorithm which responds to the structure of this problem through the relationship between submodularity and convexity. The D.S. programming problem covers a broad range of applications in machine learning. In fact, this generalizes any set-function optimization. We empirically investigate the performance of our algorithm, and illustrate the difference between exact and approximate solutions respectively obtained by the proposed and existing algorithms in feature selection and discriminative structure learning.

reference text

[1] F. Bach. Structured sparsity-inducing norms through submodular functions. In Advances in Neural Information Processing Systems 23, pages 118–126, 2010.

[2] J. A. Bilmes. Dynamic Bayesian multinets. In Proc. of the 16th Conf. on Uncertainty in Artiﬁcial Intelligence (UAI’00), pages 38–45, 2000.

[3] T. S. Caetano, J. J. McAuley, L. Cheng, Q. V. Le, and A. J. Smola. Learning graph matching. IEEE Trans. on Pattern Analysis and Machine Intelligence, 31(6):1048–1058, 2009.

[4] A. Das and D. Kempe. Algorithms for subset selection in linear regression. In Proc. of the 40th annual ACM symp. on Theory of computing (STOC’08), pages 45–54, 2008.

[5] J. Edmonds. Submodular functions, matroids, and certain polyhedra. In R. Guy, H. Hanani, N. Sauer, and J. Sch¨ nheim, editors, Combinatorial structures and their applications, pages 69–87, 1970. o

[6] N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classiﬁer. 29:131–163, 1997.

[7] S. Fujishige. Submodular Functions and Optimization. Elsevier, 2 edition, 2005.

[8] S. Fujishige, T. Hayashi, and S. Isotani. The minimum-norm-point algorithm applied submodular function minimization and linear programming. Technical report, Research Institute for Mathematical Sciences, Kyoto University, 2006.

[9] E. Hazan and S. Kale. Beyond convexity: online submodular minimization. In Advances in Neural Information Processing Systems 22, pages 700–708, 2009.

[10] S. Hoi, R. Jin, J. Zhu, and M. Lyu. Batch mode active learning and its application to medical image classiﬁcation. In Proc. of the 23rd Int’l Conf. on Machine learning (ICML’06), pages 417–424, 2006.

[11] R. Horst, T. Q. Phong, Ng. V. Thoai, and J. de Vries. On solving a D.C. programming problem by a sequence of linear programs. Journal of Global Optimization, 1:183–203, 1991.

[12] R. Horst and H. Tuy. Global Optimization (Deterministic Approaches). Springer, 3 edition, 1996.

[13] T. Ibaraki. Enumerative approaches to combinatorial optimization. In J.C. Baltzer and A.G. Basel, editors, Annals of Operations Research, volume 10 and 11. 1987.

[14] Y. Kawahara, K. Nagano, K. Tsuda, and J. A. Bilmes. Submodularity cuts and applications. In Advances in Neural Information Processing Systems 22, pages 916–924. MIT Press, 2009.

[15] A. Krause and V. Cevher. Submodular dictionary selection for sparse representation. In Proc. of the 27th Int’l Conf. on Machine learning (ICML’10), pages 567–574. Omnipress, 2010.

[16] A. Krause, H. B. McMahan, C. Guestrin, and A. Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9:2761–2801, 2008.

[17] L. Lov´ sz. Submodular functions and convexity. In A. Bachem, M. Gr¨ tschel, and B. Korte, editors, a o Mathematical Programming – The State of the Art, pages 235–257. 1983.

[18] K. Murota. Discrete Convex Analysis. Monographs on Discrete Math and Applications. SIAM, 2003.

[19] K. Nagano, Y. Kawahara, and S. Iwata. Minimum average cost clustering. In Advances in Neural Information Processing Systems 23, pages 1759–1767, 2010.

[20] M. Narasimhan and J. A. Bilmes. PAC-learning bounded tree-width graphical models. In Proc. of the 20th Ann. Conf. on Uncertainty in Artiﬁcial Intelligence (UAI’04), pages 410–417, 2004.

[21] M. Narasimhan and J. A. Bilmes. A submodular-supermodular procedure with applications to discriminative structure learning. In Proc. of the 21st Ann. Conf. on Uncertainty in Artiﬁcial Intelligence (UAI’05), pages 404–412, 2005.

[22] F. Pernkopf and J. A. Bilmes. Discriminative versus generative parameter and structure learning of bayesian network classiﬁers. In Proc. of the 22nd Int’l Conf. on Machine Learning (ICML’05), pages 657–664, 2005.

[23] M. Queyranne. Minimizing symmetric submodular functions. Math. Prog., 82(1):3–12, 1998.

[24] C. Rother, T. Minka, A. Blake, and V. Kolmogorov. Cosegmentation of image pairs by histogram matching-incorporating a global constraint into mrfs. In Proc. of the 2006 IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR’06), pages 993–1000, 2006.

[25] A. Shekhovtsov. Supermodular decomposition of structural labeling problem. Control Systems and Computers, 20(1):39–48, 2006.

[26] A. Shekhovtsov, V. Kolmogorov, P. Kohli, V. Hlav c, C. Rother, and P. Torr. Lp-relaxation of binarized energy minimization. Technical Report CTU-CMP-2007-27, Czech Technical University, 2007.

[27] M. Thoma, H. Cheng, A. Gretton, H. Han, H. P. Kriegel, A. J. Smola, S. Y. Le Song Philip, X. Yan, and K. Borgwardt. Near-optimal supervised feature selection among frequent subgraphs. In Proc. of the 2009 SIAM Conf. on Data Mining (SDM’09), pages 1076–1087, 2008. 9