
249 nips-2013-Polar Operators for Structured Sparse Estimation


Source: pdf

Author: Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans

Abstract: Structured sparse estimation has become an important technique in many areas of data analysis. Unfortunately, these estimators typically pose computational challenges that require sophisticated algorithms. Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently. With such an operator, a simple conditional gradient method can then be developed that, when combined with smoothing and local optimization, significantly reduces training time compared to the state of the art. We also demonstrate a new reduction of polar to proximal maps that enables a more efficient latent fused lasso.
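The abstract's core idea, using an efficiently computable polar operator as the linear-minimization oracle inside a conditional gradient (Frank-Wolfe) loop, can be illustrated on the simplest case: the l1 norm, whose polar operator just selects the largest-magnitude gradient coordinate. This is a generic sketch under a least-squares objective, not the paper's algorithm (which handles richer structured regularizers plus smoothing and local optimization); the function names and problem setup are illustrative.

```python
import numpy as np

def polar_atom_l1(g, tau):
    """Linear-minimization oracle for the l1 ball of radius tau.

    The polar operator of the l1 norm evaluates max_i |g_i|; the
    maximizing coordinate yields the signed, scaled basis vector
    that minimizes <g, s> over the ball.
    """
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -tau * np.sign(g[i])
    return s

def conditional_gradient(A, b, tau, iters=200):
    """Frank-Wolfe for min 0.5*||A w - b||^2 s.t. ||w||_1 <= tau."""
    w = np.zeros(A.shape[1])
    for t in range(iters):
        grad = A.T @ (A @ w - b)       # gradient of the smooth loss
        s = polar_atom_l1(grad, tau)   # atom returned by the polar oracle
        gamma = 2.0 / (t + 2)          # standard O(1/t) step size
        w = (1 - gamma) * w + gamma * s
    return w
```

Each iteration touches only one atom, so the per-step cost is dominated by the gradient evaluation; the paper's contribution is making the oracle step equally cheap for structured regularizers beyond the l1 norm.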


reference text

[1] P. Bühlmann and S. van de Geer. Statistics for High-Dimensional Data. Springer, 2011.

[2] Y. Eldar and G. Kutyniok, editors. Compressed Sensing: Theory and Applications. Cambridge, 2012.

[3] J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. JMLR, 12:3371–3412, 2011.

[4] S. Kim and E. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In ICML, 2010.

[5] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. JMLR, 12:2297–2334, 2011.

[6] G. Obozinski and F. Bach. Convex relaxation for combinatorial penalties. Technical Report HAL 00694765, 2012.

[7] P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468–3497, 2009.

[8] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1–106, 2012.

[9] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[10] Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140:125–161, 2013.

[11] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Convex and network flow optimization for structured sparsity. JMLR, 12:2681–2720, 2011.

[12] J. Mairal and B. Yu. Supervised feature selection in graphs with path coding penalties and network flows. JMLR, 14:2449–2485, 2013.

[13] M. Dudik, Z. Harchaoui, and J. Malick. Lifted coordinate descent for learning with trace-norm regularizations. In AISTATS, 2012.

[14] X. Zhang, Y. Yu, and D. Schuurmans. Accelerated training for matrix-norm regularization: A boosting approach. In NIPS, 2012.

[15] S. Laue. A hybrid algorithm for convex semidefinite optimization. In ICML, 2012.

[16] B. Mishra, G. Meyer, F. Bach, and R. Sepulchre. Low-rank optimization with trace norm penalty. Technical report, 2011. http://arxiv.org/abs/1112.2318.

[17] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.

[18] G. Nowak, T. Hastie, J. R. Pollack, and R. Tibshirani. A fused lasso latent feature model for analyzing multi-sample aCGH data. Biostatistics, 12(4):776–791, 2011.

[19] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.

[20] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805–849, 2012.

[21] F. Bach. Convex analysis and optimization with submodular functions: a tutorial. Technical Report HAL 00527714, 2010.

[22] W. Dinkelbach. On nonlinear fractional programming. Management Science, 13(7), 1967.

[23] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1st edition, 1986.

[24] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B, 67:91–108, 2005.

[25] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2):302–332, 2007.

[26] J. Liu, L. Yuan, and J. Ye. An efficient algorithm for a class of fused lasso problems. In Conference on Knowledge Discovery and Data Mining, 2010.

[27] M. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697–702, 2009.

[28] URL http://www.gems-system.or.

[29] URL http://spams-devel.gforge.inria.fr.

[30] URL http://drwn.anu.edu.au/index.html.

[31] M. Van De Vijver et al. A gene-expression signature as a predictor of survival in breast cancer. The New England Journal of Medicine, 347(25):1999–2009, 2002.

[32] H. Chuang, E. Lee, Y. Liu, D. Lee, and T. Ideker. Network-based classification of breast cancer metastasis. Molecular Systems Biology, 3(140), 2007.