
78 nips-2011-Efficient Methods for Overlapping Group Lasso


Source: pdf

Author: Lei Yuan, Jun Liu, Jieping Ye

Abstract: The group Lasso is an extension of the Lasso for feature selection on (predefined) non-overlapping groups of features. The non-overlapping group structure limits its applicability in practice. There have been several recent attempts to study a more general formulation where groups of features are given, potentially with overlaps between the groups. The resulting optimization is, however, much more challenging to solve due to the group overlaps. In this paper, we consider the efficient optimization of the overlapping group Lasso penalized problem. We reveal several key properties of the proximal operator associated with the overlapping group Lasso, and compute the proximal operator by solving the smooth and convex dual problem, which allows the use of gradient-descent-type algorithms for the optimization. We have performed empirical evaluations using both synthetic data and a breast cancer gene expression data set, which consists of 8,141 genes organized into (overlapping) gene sets. Experimental results show that the proposed algorithm is more efficient than existing state-of-the-art algorithms.
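The dual route described in the abstract can be made concrete. For the proximal problem min_x ½||x − v||² + Σ_g λ_g ||x_g||₂, the dual is the smooth ball-constrained problem min ½||v − Σ_g Y^g||² over blocks Y^g supported on group g with ||Y^g||₂ ≤ λ_g, and the prox value is the optimal residual v − Σ_g Y^g. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function name, the encoding of groups as index arrays, the per-group weights `lams`, and the plain projected-gradient inner solver are all our assumptions (the paper uses accelerated gradient methods and exploits further properties of the prox).

```python
import numpy as np

def prox_overlapping_group_lasso(v, groups, lams, n_iter=500, tol=1e-10):
    """Prox of  x -> sum_g lams[g] * ||x[groups[g]]||_2  at the point v,
    where the index groups may overlap.

    Solves the smooth dual  min_Y 0.5 * ||v - sum_g Y^g||^2  over blocks
    Y^g supported on group g with ||Y^g||_2 <= lams[g], by projected
    gradient; the prox value is the residual at the optimal dual point.
    """
    Y = [np.zeros(len(g)) for g in groups]

    # The dual gradient is Lipschitz with constant at most the maximum
    # number of groups covering any single coordinate, so 1/L is a safe step.
    cover = np.zeros(v.shape[0])
    for g in groups:
        cover[g] += 1.0
    step = 1.0 / max(cover.max(), 1.0)

    r_prev = None
    for _ in range(n_iter):
        # residual r = v - sum_g Y^g; the dual gradient for block g is -r[g]
        r = v.copy()
        for g, y in zip(groups, Y):
            r[g] -= y
        # synchronous projected gradient step on every dual block
        for i, (g, lam) in enumerate(zip(groups, lams)):
            y = Y[i] + step * r[g]
            nrm = np.linalg.norm(y)
            if nrm > lam:  # project onto the l2 ball of radius lam
                y *= lam / nrm
            Y[i] = y
        if r_prev is not None and np.linalg.norm(r - r_prev) <= tol:
            break
        r_prev = r

    # primal prox value recovered from the final dual point
    x = v.copy()
    for g, y in zip(groups, Y):
        x[g] -= y
    return x

if __name__ == "__main__":
    v = np.array([3.0, -2.0, 1.5, 0.2, -0.1])
    groups = [np.array([0, 1, 2]), np.array([2, 3, 4])]  # overlap at index 2
    print(prox_overlapping_group_lasso(v, groups, lams=[1.0, 1.0]))
```

The step size 1/L follows from bounding the dual Hessian: each coordinate is counted once per group containing it, so L is at most the maximum overlap depth. In an outer solver, this prox would be invoked once per iteration of a FISTA-style proximal gradient method on the data-fitting loss.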


reference text

[1] A. Argyriou, C.A. Micchelli, M. Pontil, L. Shen, and Y. Xu. Efficient first order methods for linear composite regularizers. arXiv preprint arXiv:1104.1436, 2011.

[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[3] H. D. Bondell and B. J. Reich. Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR. Biometrics, 64:115–123, 2008.

[4] J. F. Bonnans and A. Shapiro. Optimization problems with perturbations: A guided tour. SIAM Review, 40(2):228–264, 1998.

[5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

[6] X. Chen, Q. Lin, S. Kim, J.G. Carbonell, and E.P. Xing. An efficient proximal gradient method for general structured sparse learning. arXiv preprint arXiv:1005.4717, 2010.

[7] H. Y. Chuang, E. Lee, Y. T. Liu, D. Lee, and T. Ideker. Network-based classification of breast cancer metastasis. Molecular Systems Biology, 3(140), 2007.

[8] P.L. Combettes and J.C. Pesquet. Proximal splitting methods in signal processing. arXiv preprint arXiv:0912.3522, 2009.

[9] J. M. Danskin. The theory of max-min and its applications to weapons allocation problems. Springer-Verlag, New York, 1967.

[10] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1(2):302–332, 2007.

[11] B. He and X. Yuan. An accelerated inexact proximal point algorithm for convex minimization. Preprint, 2010.

[12] L. Jacob, G. Obozinski, and J. Vert. Group lasso with overlap and graph lasso. In ICML, 2009.

[13] R. Jenatton, J.-Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. arXiv preprint arXiv:0904.3523, 2009.

[14] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In ICML, 2010.

[15] S. Kim and E. P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, 2010.

[16] H. Liu, M. Palatucci, and J. Zhang. Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery. In ICML, 2009.

[17] J. Liu, S. Ji, and J. Ye. Multi-task feature learning via efficient ℓ2,1-norm minimization. In UAI, 2009.

[18] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Network flow algorithms for structured sparsity. In NIPS, 2010.

[19] L. Meier, S. van de Geer, and P. Bühlmann. The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B, 70:53–71, 2008.

[20] J.-J. Moreau. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France, 93:273–299, 1965.

[21] A. Nemirovski. Efficient methods in convex programming. Lecture Notes, 1994.

[22] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.

[23] R.T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898, 1976.

[24] A. Subramanian et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43):15545–15550, 2005.

[25] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1):267–288, 1996.

[26] M. J. Van de Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. The New England Journal of Medicine, 347(25):1999–2009, 2002.

[27] Y. Ying, C. Campbell, and M. Girolami. Analysis of SVM with indefinite kernels. In NIPS, 2009.

[28] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49–67, 2006.

[29] P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468–3497, 2009.