nips nips2010 nips2010-210 nips2010-210-reference knowledge-graph by maker-knowledge-mining

210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization

Source: pdf

Author: Jason Lee, Ben Recht, Nathan Srebro, Joel Tropp, Ruslan Salakhutdinov

Abstract: The max-norm was proposed as a convex matrix regularizer in [1] and was shown to be empirically superior to the trace-norm for collaborative ﬁltering problems. Although the max-norm can be computed in polynomial time, there are currently no practical algorithms for solving large-scale optimization problems that incorporate the max-norm. The present work uses a factorization technique of Burer and Monteiro [2] to devise scalable ﬁrst-order algorithms for convex programs involving the max-norm. These algorithms are applied to solve huge collaborative ﬁltering, graph cut, and clustering problems. Empirically, the new methods outperform mature techniques from all three areas. 1

reference text

[1] Nathan Srebro, Jason Rennie, and Tommi Jaakkola. Maximum margin matrix factorization. In Advances in Neural Information Processing Systems, 2004.

[2] Samuel Burer and R. D. C. Monteiro. A nonlinear programming algorithm for solving semideﬁnite programs via low-rank factorization. Mathematical Programming (Series B), 95:329–357, 2003.

[3] Benjamin Recht, Maryam Fazel, and Pablo Parrilo. Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Review, 2007. To appear. Preprint Available at http://pages.cs.wisc.edu/˜brecht/publications.html.

[4] Francis R. Bach, Julien Marial, and Jean Ponce. Convex sparse matrix factorizations. Preprint available at arxiv.org/abs/0812.1869, 2008.

[5] Nathan Srebro and Adi Shraibman. Rank, trace-norm and max-norm. In 18th Annual Conference on Learning Theory (COLT), 2005.

[6] G. J. O. Jameson. Summing and Nuclear Norms in Banach Space Theory. Number 8 in London Mathematical Society Student Texts. Cambridge University Press, Cambridge, UK, 1987.

[7] Ruslan Salakhutdinov and Nathan Srebro. Collaborative ﬁltering in a non-uniform world: Learning with the weighted trace norm. Preprint available at arxiv.org/abs/1002.2780, 2010.

[8] Masao Fukushima and Hisashi Mine. A generalized proximal point algorithm for certain non-convex minimization problems. International Journal of Systems Science, 12(8):989–1000, 1981.

[9] Samuel Burer and Changhui Choi. Computational enhancements in low-rank semideﬁnite programming. Optimization Methods and Software, 21(3):493–512, 2006.

[10] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientiﬁc, Belmont, MA, 2nd edition, 1999.

[11] T Hale, W Yin, and Y Zhang. A ﬁxed-point continuation method for l 1-regularized minimization with applications to compressed sensing. Dept. Computat. Appl. Math., Rice Univ., Houston, TX, Tech. Rep. TR07-07, 2007.

[12] Stephen J. Wright, Robert Nowak, and M´ rio A. T. Figueiredo. Sparse reconstruction by separable apa proximation. Journal version, to appear in IEEE Transactions on Signal Processing. Preprint available at http:http://www.optimization-online.org/DB_HTML/2007/10/1813.html, 2007.

[13] Jian-Feng Cai, Emmanuel J. Cand` s, and Zuowei Shen. A singular value thresholding algorithm for e matrix completion. To appear in SIAM J. on Optimization. Preprint available at http://arxiv.org/ abs/0810.3286, 2008.

[14] Shiqian Ma, Donald Goldfarb, and Lifeng Chen. Fixed point and Bregman iterative methods for matrix rank minimization. Preprint available at http://www.optimization-online.org/DB_HTML/ 2008/11/2151.html, 2008.

[15] Yurii Nesterov. Gradient methods for minimizing composite objective function. To appear. Preprint Available at http://www.optimization-online.org/DB_HTML/2007/09/1784.html, September 2007.

[16] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisﬁability problems using semideﬁnite programming. Journal of the ACM, 42:1115–1145, 1995.

[17] The Gset is available for download at http://www.stanford.edu/˜yyye/yyye/Gset/.

[18] Samuel Burer. Sdplr. Software available at http://dollar.biz.uiowa.edu/˜sburer/www/ doku.php?id=software#sdplr.

[19] Arthur Szlam and Xavier Bresson. A total variation-based graph clustering algorithm for cheeger ratio cuts. To appear in ICML 2010. Preprint available at ftp://ftp.math.ucla.edu/pub/ camreport/cam09-68.pdf, 2010.

[20] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000. 9