Paper: nips2010-181 (NIPS 2010)
Title: Network Flow Algorithms for Structured Sparsity
Source: pdf
Authors: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski
Abstract: We consider a class of learning problems that involve a structured sparsity-inducing norm defined as the sum of ℓ∞-norms over groups of variables. Whereas a lot of effort has been put into developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems.
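To make the abstract's objects concrete, here is a minimal sketch, not the authors' solver: it evaluates the structured sparsity-inducing norm Omega(w) = sum over groups g of ||w_g||_inf for possibly overlapping groups, and the objective of the proximal problem, 0.5 * ||u - w||^2 + lam * Omega(w), whose exact minimizer the paper computes through a quadratic min-cost flow formulation. The function names, the encoding of groups as index tuples, and the uniform (unit) group weights are illustrative assumptions, not notation fixed by the abstract.

    import numpy as np

    def omega(w, groups):
        # Structured norm from the abstract: sum of l_inf norms of w
        # restricted to each group; groups may overlap arbitrarily.
        return sum(np.max(np.abs(w[list(g)])) for g in groups)

    def prox_objective(w, u, lam, groups):
        # Objective of the proximal problem the paper solves exactly:
        #   0.5 * ||u - w||_2^2 + lam * Omega(w).
        # This only evaluates the objective at a candidate w; the paper's
        # contribution is an exact polynomial-time minimizer based on
        # network flows, which is not reproduced here.
        return 0.5 * np.sum((u - w) ** 2) + lam * omega(w, groups)

    u = np.array([0.9, -0.2, 0.4, 0.0, 1.5])
    groups = [(0, 1, 2), (2, 3, 4)]  # overlapping: index 2 is shared
    print(omega(u, groups))                                  # 0.9 + 1.5 = 2.4
    print(prox_objective(np.zeros_like(u), u, 0.1, groups))  # 0.5 * ||u||^2 = 1.63

Because the penalty paid by a group is its largest absolute entry, driving that entry to zero zeroes the entire group at once, which is what makes this norm sparsity-inducing at the group level.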