nips2011-251: Shaping level sets with submodular functions (NIPS 2011)
Source: pdf
Author: Francis R. Bach
Abstract: We consider a class of sparsity-inducing regularization terms based on submodular functions. While previous work has focused on non-decreasing functions, we explore symmetric submodular functions and their Lovász extensions. We show that the Lovász extension may be seen as the convex envelope of a function that depends on level sets (i.e., the set of indices whose corresponding components of the underlying predictor are greater than a given constant): this leads to a class of convex structured regularization terms that impose prior knowledge on the level sets, and not only on the supports of the underlying predictors. We provide unified optimization algorithms, such as proximal operators, and theoretical guarantees (allowed level sets and recovery conditions). By selecting specific submodular functions, we give a new interpretation to known norms, such as the total variation; we also define new norms, in particular ones that are based on order statistics with application to clustering and outlier detection, and on noisy cuts in graphs with application to change point detection in the presence of outliers.
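The regularizers above all rest on the greedy (Choquet) formula for the Lovász extension: sort the components of w in decreasing order and weight each component by the marginal gain of F along the resulting chain of suplevel sets. As a minimal illustrative sketch (not the paper's code; the names lovasz_extension and chain_cut are ours), the Python snippet below evaluates the extension for the cut function of a chain graph and checks the classical identity, echoed in the abstract, that this recovers the total variation sum_k |w[k+1] - w[k]|.

    import numpy as np

    def lovasz_extension(F, w):
        # Greedy (Choquet) evaluation: visit components of w in decreasing
        # order and weight each by the marginal gain of F along the chain
        # of suplevel sets. Assumes F(frozenset()) == 0.
        order = np.argsort(-w)
        value, prev, selected = 0.0, 0.0, set()
        for j in order:
            selected.add(j)
            cur = F(frozenset(selected))
            value += w[j] * (cur - prev)
            prev = cur
        return value

    def chain_cut(A, p=5):
        # Symmetric submodular cut function of the chain graph on p nodes
        # (p must match len(w)): counts edges (k, k+1) separating A from
        # its complement.
        return sum(1 for k in range(p - 1) if (k in A) != (k + 1 in A))

    w = np.array([0.3, -1.2, 0.5, 0.5, 2.0])
    # For the chain cut, the Lovász extension is the total variation.
    assert np.isclose(lovasz_extension(chain_cut, w),
                      np.abs(np.diff(w)).sum())

Since a symmetric F satisfies F(V) = F(empty set) = 0, this extension is invariant to adding constants to w and so penalizes only the level-set structure, which is the behavior the abstract describes; specializing F to graph cuts yields the total variation of [4, 5].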
[1] P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.
[2] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Adv. NIPS, 2009.
[3] F. Bach. Structured sparsity-inducing norms through submodular functions. In Adv. NIPS, 2010.
[4] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight. Sparsity and smoothness via the fused Lasso. J. Roy. Stat. Soc. B, 67(1):91–108, 2005.
[5] A. Chambolle and J. Darbon. On total variation minimization and surface evolution using parametric maximum flows. International Journal of Computer Vision, 84(3):288–307, 2009.
[6] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Trans. PAMI, 23(11):1222–1239, 2001.
[7] Z. Harchaoui and C. Lévy-Leduc. Catching change-points with Lasso. In Adv. NIPS 20, 2008.
[8] J.-P. Vert and K. Bleakley. Fast detection of multiple change-points shared by many signals using group LARS. In Adv. NIPS 23, 2010.
[9] M. Kolar, L. Song, and E. Xing. Sparsistent learning of varying-coefficient models with structural changes. In Adv. NIPS 22, 2009.
[10] H. D. Bondell and B. J. Reich. Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics, 64(1):115–123, 2008.
[11] F. Bach. Convex analysis and optimization with submodular functions: a tutorial. Technical Report 00527714, HAL, 2010.
[12] S. Fujishige. Submodular Functions and Optimization. Elsevier, 2005.
[13] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1997.
[14] F. Bach. Shaping level sets with submodular functions. Technical Report 00542949-v2, HAL, 2011.
[15] T. Hocking, A. Joulin, F. Bach, and J.-P. Vert. Clusterpath: an algorithm for clustering using convex fusion penalties. In Proc. ICML, 2011.
[16] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[17] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Technical Report 00613125, HAL, 2011.
[18] H. Groenevelt. Two algorithms for maximizing a separable concave function over a polymatroid feasible region. European Journal of Operational Research, 54(2):227–236, 1991.
[19] J. B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. Mathematical Programming, 118(2):237–251, 2009.
[20] M. Queyranne. Minimizing symmetric submodular functions. Mathematical Programming, 82(1):3–12, 1998.
[21] G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing, 18(1):30–55, 1989.
[22] H. Hoefling. A path algorithm for the fused Lasso signal approximator. Technical Report 0910.0526v1, arXiv, 2009.
[23] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19–60, 2010.
[24] A. Rinaldo. Properties and refinements of the fused Lasso. Ann. Stat., 37(5):2922–2952, 2009.
[25] V. Duval, J.-F. Aujol, and Y. Gousseau. The TVL1 model: A geometric point of view. Multiscale Modeling and Simulation, 8(1):154–189, 2009.
[26] N. Srebro, J. D. M. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In Adv. NIPS 17, 2005.
[27] A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.