
319 nips-2012-Sparse Prediction with the $k$-Support Norm


Source: pdf

Author: Andreas Argyriou, Rina Foygel, Nathan Srebro

Abstract: We derive a novel norm that corresponds to the tightest convex relaxation of sparsity combined with an $\ell_2$ penalty. We show that this new $k$-support norm provides a tighter relaxation than the elastic net and can thus be advantageous in sparse prediction problems. We also bound the looseness of the elastic net, thus shedding new light on it and providing justification for its use.
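A brief sketch of the construction the abstract describes (stated here as a plausible reading, with the notation $\|\cdot\|_k^{sp}$ and the set $C_k$ introduced for illustration): the tightest convex relaxation of "$k$-sparse with bounded $\ell_2$ norm" is the norm whose unit ball is the convex hull of that set,

$$
C_k = \operatorname{conv}\{\, w \in \mathbb{R}^d : \|w\|_0 \le k,\ \|w\|_2 \le 1 \,\},
\qquad
\|w\|_k^{sp} = \inf\{\, \lambda > 0 : w \in \lambda C_k \,\},
$$

whose dual norm takes the simple form $\|u\|_k^{sp*} = \max_{|I| \le k} \|u_I\|_2$, the $\ell_2$ norm of the $k$ largest-magnitude entries of $u$. Under this reading, the elastic net ball (an intersection of an $\ell_1$ ball and an $\ell_2$ ball) also contains $C_k$ but is strictly larger, which is the sense in which the $k$-support norm is the tighter relaxation.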


reference text

[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[2] R. Bhatia. Matrix Analysis. Graduate Texts in Mathematics. Springer, 1997.

Footnote 4: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
Footnote 5: Regarding other sparse prediction methods, we did not manage to compare with OSCAR, due to memory limitations, or to PEN or trace Lasso, which do not have code available online.

[3] H.D. Bondell and B.J. Reich. Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics, 64(1):115–123, 2008.

[4] P.L. Combettes and V.R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4(4):1168–1200, 2006.

[5] C. De Mol, E. De Vito, and L. Rosasco. Elastic-net regularization in learning theory. Journal of Complexity, 25(2):201–230, 2009.

[6] E. Grave, G. R. Obozinski, and F. Bach. Trace lasso: a trace norm regularization for correlated designs. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, 2011.

[7] G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. Cambridge University Press, 1934.

[8] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Series in Statistics. Springer, 2001.

[9] A.E. Hoerl and R.W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.

[10] L. Jacob, G. Obozinski, and J.P. Vert. Group Lasso with overlap and graph Lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 433–440. ACM, 2009.

[11] S.M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Advances in Neural Information Processing Systems 21, 2008.

[12] S. S. Keerthi and D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005.

[13] A. Lorbert, D. Eis, V. Kostina, D.M. Blei, and P.J. Ramadge. Exploiting covariate similarity in sparse regression via the pairwise elastic net. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010.

[14] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76, Université catholique de Louvain, 2007.

[15] N. Srebro, K. Sridharan, and A. Tewari. Smoothness, low-noise and fast rates. In Advances in Neural Information Processing Systems 23, 2010.

[16] T. Suzuki and R. Tomioka. SpicyMKL: a fast algorithm for multiple kernel learning with thousands of kernels. Machine Learning, pages 1–32, 2011.

[17] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.

[18] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Preprint, 2008.

[19] P. Tseng. Approximation accuracy, gradient methods, and error bound for structured convex optimization. Mathematical Programming, 125(2):263–295, 2010.

[20] T. Zhang. Covering number bounds of certain regularized linear function classes. Journal of Machine Learning Research, 2:527–550, 2002.

[21] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.