
51 jmlr-2013-Greedy Sparsity-Constrained Optimization


Source: pdf

Author: Sohail Bahmani, Bhiksha Raj, Petros T. Boufounos

Abstract: Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and Compressed Sensing. A vast body of work has studied sparsity-constrained optimization from theoretical, algorithmic, and application aspects in the context of sparse estimation in linear models, where the fidelity of the estimate is measured by the squared error. In contrast, relatively little effort has been made to study sparsity-constrained optimization in cases where nonlinear models are involved or the cost function is not quadratic. In this paper we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. Should a cost function have a Stable Restricted Hessian (SRH) or a Stable Restricted Linearization (SRL), both of which are introduced in this paper, our algorithm is guaranteed to produce a sparse vector within a bounded distance from the true sparse optimum. Our approach generalizes known results for quadratic cost functions that arise in sparse linear regression and Compressed Sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic and real data, where the algorithm is employed for sparse logistic regression with and without ℓ2-regularization.

Keywords: sparsity, optimization, compressed sensing, greedy algorithm
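The abstract describes a gradient-based greedy method at a high level; the short sketch below illustrates what one such iteration can look like. It is only a plausible illustration under simplifying assumptions, not the paper's exact algorithm: the names greedy_sparse_step and grad_fn, and the use of plain gradient descent for the restricted minimization, are all hypothetical, and the toy problem is ordinary sparse least squares (the quadratic special case that the abstract says GraSP generalizes).

```python
# Hypothetical sketch of one gradient-based greedy iteration in the spirit of
# GraSP: identify candidate coordinates from the gradient, minimize over the
# merged support, then prune back to s entries. The restricted minimization
# here is a crude gradient-descent loop; the paper's procedure may differ.
import numpy as np


def greedy_sparse_step(x, s, grad_fn, inner_steps=50, lr=0.5):
    """One iteration: support identification, restricted minimization, pruning."""
    g = grad_fn(x)
    # Indices of the 2s largest-magnitude gradient entries.
    z_support = np.argsort(np.abs(g))[-2 * s:]
    # Merge with the support of the current estimate.
    merged = np.union1d(z_support, np.flatnonzero(x)).astype(int)
    # Restricted minimization: gradient descent confined to `merged`.
    b = x.copy()
    for _ in range(inner_steps):
        step = np.zeros_like(b)
        step[merged] = grad_fn(b)[merged]
        b -= lr * step
    # Prune to the s largest-magnitude entries.
    keep = np.argsort(np.abs(b))[-s:]
    x_new = np.zeros_like(b)
    x_new[keep] = b[keep]
    return x_new


if __name__ == "__main__":
    # Toy quadratic cost f(x) = 0.5 * ||Ax - y||^2 with an s-sparse ground
    # truth, i.e., the classical Compressed Sensing setting.
    rng = np.random.default_rng(0)
    n, p, s = 100, 300, 5
    A = rng.standard_normal((n, p)) / np.sqrt(n)
    x_true = np.zeros(p)
    x_true[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)
    y = A @ x_true

    grad_fn = lambda x: A.T @ (A @ x - y)

    x_hat = np.zeros(p)
    for _ in range(20):
        x_hat = greedy_sparse_step(x_hat, s, grad_fn)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

In this sketch, adapting to a nonlinear model such as sparse logistic regression would only require swapping grad_fn for the gradient of the new cost; the support-identification and pruning steps are unchanged.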


reference text

A. Agarwal, S. Negahban, and M. Wainwright. Fast global convergence rates of gradient methods for high-dimensional statistical recovery. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 37–45, 2010. Long version available at arXiv:1104.4824v1 [stat.ML].
S. Bahmani, P. Boufounos, and B. Raj. Greedy sparsity-constrained optimization. In Conference Record of the Forty-Fifth Asilomar Conference on Signals, Systems, and Computers, pages 1148–1152, 2011.
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
P. Bickel, Y. Ritov, and A. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, 37(4):1705–1732, 2009.
T. Blumensath. Compressed sensing with nonlinear observations. Preprint, 2010. URL http://users.fmrib.ox.ac.uk/∼tblumens/papers/B Nonlinear.pdf.
T. Blumensath and M. E. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3):265–274, Nov. 2009.
F. Bunea. Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1 + ℓ2 penalization. Electronic Journal of Statistics, 2:1153–1194, 2008.
E. J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9-10):589–592, 2008.
E. J. Candès, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805–849, Dec. 2012.
A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1):211–231, 2009.
W. Dai and O. Milenkovic. Subspace pursuit for compressive sensing signal reconstruction. IEEE Transactions on Information Theory, 55(5):2230–2249, 2009.
A. J. Dobson and A. Barnett. An Introduction to Generalized Linear Models. Chapman and Hall/CRC, 3rd edition, May 2008.
D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
D. L. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197–2202, 2003.
D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Transactions on Information Theory, 47(7):2845–2862, 2001.
S. Foucart. Sparse recovery algorithms: sufficient conditions in terms of restricted isometry constants. In Approximation Theory XIII: San Antonio 2010, volume 13 of Springer Proceedings in Mathematics, pages 65–77. Springer New York, 2012.
J. H. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010. ISSN 1548-7660. Software available at http://www-stat.stanford.edu/∼tibs/glmnet-matlab/.
I. Guyon, S. Gunn, A. Ben-Hur, and G. Dror. Result analysis of the NIPS 2003 feature selection challenge. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 545–552. MIT Press, Cambridge, MA, 2005.
J. D. Hamilton. Time Series Analysis. Princeton University Press, Princeton, NJ, 1994.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Verlag, 2009.
A. Jalali, C. C. Johnson, and P. K. Ravikumar. On learning discrete graphical models using greedy methods. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 1935–1943, 2011.
S. M. Kakade, O. Shamir, K. Sridharan, and A. Tewari. Learning exponential families in high-dimensions: strong convexity and sparsity. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, volume 9 of JMLR W&CP, pages 381–388, 2010.
J. Liu, J. Chen, and J. Ye. Large-scale sparse logistic regression. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pages 547–556, New York, NY, USA, 2009. ACM.
A. Lozano, G. Swirszcz, and N. Abe. Group orthogonal matching pursuit for logistic regression. In G. Gordon, D. Dunson, and M. Dudik, editors, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of JMLR W&CP, pages 452–460, 2011.
B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2):227–234, 1995.
D. Needell and J. A. Tropp. CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301–321, 2009.
S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1348–1356, 2009. Long version available at arXiv:1010.2731v1 [math.ST].
Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, Jan. 2012.
Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition. In Conference Record of the Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, volume 1, pages 40–44, 1993.
S. Shalev-Shwartz, N. Srebro, and T. Zhang. Trading accuracy for sparsity in optimization problems with sparsity constraints. SIAM Journal on Optimization, 20(6):2807–2832, 2010.
A. Tewari, P. K. Ravikumar, and I. S. Dhillon. Greedy algorithms for structurally constrained high dimensional problems. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 882–890, 2011.
J. A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, Aug. 2012.
J. A. Tropp and A. C. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12):4655–4666, 2007.
S. A. van de Geer. Empirical Processes in M-estimation. Cambridge University Press, Cambridge, UK, 2000.
S. A. van de Geer. High-dimensional generalized linear models and the Lasso. The Annals of Statistics, 36(2):614–645, 2008.
V. Vapnik. Statistical Learning Theory. Wiley, New York, NY, 1998.
T. Zhang. Sparse recovery with orthogonal matching pursuit under RIP. IEEE Transactions on Information Theory, 57(9):6215–6221, Sept. 2011.