nips nips2012 nips2012-290 nips2012-290-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mert Pilanci, Laurent E. Ghaoui, Venkat Chandrasekaran
Abstract: We consider the problem of cardinality penalized optimization of a convex function over the probability simplex with additional convex constraints. The classical 1 regularizer fails to promote sparsity on the probability simplex since 1 norm on the probability simplex is trivially constant. We propose a direct relaxation of the minimum cardinality problem and show that it can be efficiently solved using convex programming. As a first application we consider recovering a sparse probability measure given moment constraints, in which our formulation becomes linear programming, hence can be solved very efficiently. A sufficient condition for exact recovery of the minimum cardinality solution is derived for arbitrary affine constraints. We then develop a penalized version for the noisy setting which can be solved using second order cone programs. The proposed method outperforms known rescaling heuristics based on 1 norm. As a second application we consider convex clustering using a sparse Gaussian mixture and compare our results with the well known soft k-means algorithm. 1
[1] E.J. Cand´ s, T. Tao, ”Decoding by linear programming”. IEEE Trans. Inform. Theory 51 e (2005), 4203-4215.
[2] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit” SIAM Review, 43(1):129-159, 2001.
[3] A. Bruckstein, D. Donoho, and M. Elad. ”From sparse solutions of systems of equations to sparse modeling of signals and images”. SIAM Review, 2007.
[4] V. Chandrasekaran, B. Recht, P.A. Parrilo, and A.S. Willsky. ”The convex algebraic geometry of linear inverse problems”. In Communication, Control, and Computing (Allerton), 2010 48th Annual Allerton Conference on, pages 699-703, 2010.
[5] S. Boyd and L. Vandenberghe, ”Convex Optimization”. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[6] A. Cohen and A. Yeredor, ”On the use of sparsity for recovering discrete probability distributions from their moments”. Statistical Signal Processing Workshop (SSP), 2011 IEEE
[7] J. Kivinen and M. Warmuth. ”Exponentiated gradient versus gradient descent for linear predictors”. Information and Computation, 132(1):1-63, 1997.
[8] D. Lashkari and P. Golland, ”Convex clustering with exemplar-based models”, in NIPS, 2008.