nips nips2010 nips2010-258 nips2010-258-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Francis R. Bach
Abstract: Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the ℓ1-norm. In this paper, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovász extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as non-factorial priors for supervised learning.
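The norm construction described in the abstract can be illustrated concretely: the Lovász extension of a submodular function is evaluated by the greedy algorithm of Edmonds [13], and the induced norm is the extension applied to the absolute values of the coefficients. The sketch below is a minimal illustration under these assumptions, not the paper's implementation; the function names lovasz_extension and submodular_norm are ours, and the user-supplied set function F is assumed to satisfy F(∅) = 0.

```python
import numpy as np

def lovasz_extension(F, w):
    """Evaluate the Lovasz extension of a set function F at a nonnegative
    vector w, using the greedy algorithm (Edmonds [13]).

    F: callable on Python sets of indices, with F(set()) == 0 (assumption).
    w: 1-D array of nonnegative weights.
    """
    w = np.asarray(w, dtype=float)
    order = np.argsort(-w)                   # components in decreasing order
    value, prefix, F_prev = 0.0, set(), 0.0
    for j in order:
        prefix.add(int(j))
        F_curr = F(prefix)
        value += w[j] * (F_curr - F_prev)    # weight times marginal gain
        F_prev = F_curr
    return value

def submodular_norm(F, w):
    """Norm induced by a nondecreasing submodular F: Omega(w) = f(|w|)."""
    return lovasz_extension(F, np.abs(w))

# Sanity check: the cardinality function recovers the l1-norm.
w = np.array([0.5, -2.0, 1.5])
card = lambda A: float(len(A))
assert np.isclose(submodular_norm(card, w), np.abs(w).sum())
```

Plugging in other nondecreasing submodular functions (e.g., coverage functions defined from overlapping groups) yields the structured norms discussed in the paper.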
[1] P. Zhao, G. Rocha, and B. Yu. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468–3497, 2009.
[2] R. Jenatton, J.-Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Technical report, arXiv:0904.3523, 2009.
[3] J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. In Proc. ICML, 2009.
[4] L. Jacob, G. Obozinski, and J.-P. Vert. Group Lasso with overlaps and graph Lasso. In Proc. ICML, 2009.
[5] S. Kim and E. Xing. Tree-guided group Lasso for multi-task regression with structured sparsity. In Proc. ICML, 2010.
[6] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In Proc. ICML, 2010.
[7] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Network flow algorithms for structured sparsity. In Adv. NIPS, 2010.
[8] J. Haupt and R. Nowak. Signal reconstruction from noisy random projections. IEEE Transactions on Information Theory, 52(9):4036–4048, 2006.
[9] F. Bach. Convex analysis and optimization with submodular functions: a tutorial. Technical Report 00527714, HAL, 2010.
[10] A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. In Proc. UAI, 2005.
[11] Y. Kawahara, K. Nagano, K. Tsuda, and J. A. Bilmes. Submodularity cuts and applications. In Adv. NIPS, 2009.
[12] S. Fujishige. Submodular Functions and Optimization. Elsevier, 2005.
[13] J. Edmonds. Submodular functions, matroids, and certain polyhedra. In Combinatorial Optimization - Eureka, You Shrink!, pages 11–26. Springer, 2003.
[14] S. Negahban and M. J. Wainwright. Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ1-ℓ∞-regularization. In Adv. NIPS, 2008.
[15] F. Bach. Structured sparsity-inducing norms through submodular functions. Technical Report 00511310, HAL, 2010.
[16] S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[17] R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. In Proc. AISTATS, 2009.
[18] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused Lasso. J. Roy. Stat. Soc. B, 67(1):91–108, 2005.
[19] R. A. Horn and C. R. Johnson. Matrix analysis. Cambridge Univ. Press, 1990.
[20] T. Ando. Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra and its Applications, 26:203–241, 1979.
[21] C. L. Mallows. Some comments on Cp. Technometrics, 15(4):661–675, 1973.
[22] D. Wipf and S. Nagarajan. Sparse estimation using general likelihoods and non-factorial priors. In Adv. NIPS, 2009.
[23] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[24] A. Chambolle and J. Darbon. On total variation minimization and surface evolution using parametric maximum flows. International Journal of Computer Vision, 84(3):288–307, 2009.
[25] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Adv. NIPS, 2009.
[26] P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.
[27] A. Krause and V. Cevher. Submodular dictionary selection for sparse representation. In Proc. ICML, 2010.