
NIPS 2013, Paper 147: Lasso Screening Rules via Dual Polytope Projection


Source: pdf

Author: Jie Wang, Jiayu Zhou, Peter Wonka, Jieping Ye

Abstract: Lasso is a widely used regression technique to find sparse representations. When the dimension of the feature space and the number of samples are extremely large, solving the Lasso problem remains challenging. To improve the efficiency of solving large-scale Lasso problems, El Ghaoui and his colleagues have proposed the SAFE rules which are able to quickly identify the inactive predictors, i.e., predictors that have zero components in the solution vector. Then, the inactive predictors or features can be removed from the optimization problem to reduce its scale. By transforming the standard Lasso to its dual form, it can be shown that the inactive predictors include the set of inactive constraints on the optimal dual solution. In this paper, we propose an efficient and effective screening rule via Dual Polytope Projections (DPP), which is mainly based on the uniqueness and nonexpansiveness of the optimal dual solution due to the fact that the feasible set in the dual space is a convex and closed polytope. Moreover, we show that our screening rule can be extended to identify inactive groups in group Lasso. To the best of our knowledge, there is currently no “exact” screening rule for group Lasso. We have evaluated our screening rule using many real data sets. Results show that our rule is more effective in identifying inactive predictors than existing state-of-the-art screening rules for Lasso.
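
To make the idea concrete, below is a minimal NumPy sketch of a DPP-style screening check in the spirit of the abstract, not a reproduction of the exact rule or code from the paper: given the Lasso solution at a larger regularization value lam_prev, the dual optimum theta_prev = (y - X beta_prev) / lam_prev is formed, and a feature is kept at a smaller lam only if its correlation with theta_prev, inflated by a worst-case bound on how far the projected dual point can move, can still reach 1. The function name dpp_screen and the exact bound used here are illustrative assumptions.

import numpy as np

def dpp_screen(X, y, beta_prev, lam_prev, lam):
    """Sketch of a DPP-style screening rule for the Lasso
    (1/2)||y - X beta||^2 + lam * ||beta||_1, with 0 < lam < lam_prev.

    Returns a boolean mask over features:
      True  -> feature may still be active at lam, keep its column;
      False -> feature is certified inactive (coefficient 0), drop it.
    Assumes beta_prev is the Lasso solution at lam_prev.
    """
    # Dual optimum at lam_prev, from the KKT conditions of the Lasso dual.
    theta_prev = (y - X @ beta_prev) / lam_prev

    # Nonexpansiveness of projection onto the dual polytope bounds how far
    # the dual optimum can move when lam_prev shrinks to lam.
    radius = np.linalg.norm(y) * abs(1.0 / lam - 1.0 / lam_prev)

    corr = np.abs(X.T @ theta_prev)         # |x_i^T theta*(lam_prev)|
    col_norms = np.linalg.norm(X, axis=0)   # ||x_i||_2

    # Keep feature i unless the screening inequality certifies beta_i = 0.
    return corr + col_norms * radius >= 1.0

In a path computation one would solve the reduced Lasso problem on the kept columns with any standard solver, then move to the next, smaller lam, reusing the new solution as beta_prev for the following screening step.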


References

[1] S. R. Becker, E. Candès, and M. Grant. Templates for convex cone problems with applications to sparse signal recovery. Technical report, Stanford University, 2010.

[2] D. P. Bertsekas. Convex Analysis and Optimization. Athena Scientific, 2003.

[3] A. Bruckstein, D. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51:34–81, 2009.

[4] E. Candès. Compressive sampling. In Proceedings of the International Congress of Mathematicians, 2006.

[5] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43:129–159, 2001.

[6] D. L. Donoho and Y. Tsaig. Fast solution of l1-norm minimization problems when the solution may be sparse. IEEE Transactions on Information Theory, 54:4789–4812, 2008.

[7] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2004.

[8] L. El Ghaoui, V. Viallon, and T. Rabbani. Safe feature elimination in sparse supervised learning. Pacific Journal of Optimization, 8:667–698, 2012.

[9] J. Fan and J. Lv. Sure independence screening for ultrahigh dimensional feature spaces. Journal of the Royal Statistical Society Series B, 70:849–911, 2008.

[10] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Annals of Applied Statistics, 1:302–332, 2007.

[11] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33:1–22, 2010.

[12] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An interior-point method for large-scale l1-regularized least squares. IEEE Journal of Selected Topics in Signal Processing, 1:606–617, 2007.

[13] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Proceedings of the IEEE, 1998.

[14] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.

[15] J. Mairal and B. Yu. Complexity analysis of the lasso regularization path. In ICML, 2012.

[16] S. A. Nene, S. K. Nayar, and H. Murase. Columbia Object Image Library (COIL). Technical Report CUCS-006-96, Dept. of Computer Science, Columbia University, 1996.

[17] M. R. Osborne, B. Presnell, and B. A. Turlach. A new approach to variable selection in least squares problems. IMA Journal of Numerical Analysis, 20:389–404, 2000.

[18] M. Y. Park and T. Hastie. L1-regularized path algorithm for generalized linear models. Journal of the Royal Statistical Society Series B, 69:659–677, 2007.

[19] F. Samaria and A. Harter. Parameterisation of a stochastic model for human face identification. In Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, 1994.

[20] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 58:267–288, 1996.

[21] R. Tibshirani, J. Bien, J. Friedman, T. Hastie, N. Simon, J. Taylor, and R. Tibshirani. Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society Series B, 74:245–266, 2012.

[22] N. Ueda and K. Saito. Parametric mixture models for multi-labeled text. Advances in Neural Information Processing Systems, 15:721–728, 2002.

[23] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan. Sparse representation for computer vision and pattern recognition. In Proceedings of the IEEE, 2010.

[24] T. T. Wu, Y. F. Chen, T. Hastie, E. Sobel, and K. Lange. Genomewide association analysis by lasso penalized logistic regression. Bioinformatics, 25:714–721, 2009.

[25] Z. J. Xiang and P. J. Ramadge. Fast lasso screening tests based on correlations. In IEEE ICASSP, 2012.

[26] Z. J. Xiang, H. Xu, and P. J. Ramadge. Learning sparse representation of high dimensional data on large scale dictionaries. In NIPS, 2011.

[27] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68:49–67, 2006.

[28] P. Zhao and B. Yu. On model selection consistency of lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.

[29] J. Zhou, L. Yuan, J. Liu, and J. Ye. A multi-task learning formulation for predicting disease progression. In KDD, pages 814–822. ACM, 2011.