
170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning


Source: pdf

Author: Jun Liu, Jieping Ye

Abstract: We consider the tree structured group Lasso, where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover structured sparsity, which is desirable for applications with meaningful tree structures on the features. However, the tree structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper are that (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval of the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
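For intuition, the Moreau-Yosida regularization in question is the proximal problem min_x { (1/2)||x - v||^2 + λ Σ_i w_i ||x_{G_i}||_2 }, where each group G_i is a node of the tree. The analytical solution mentioned in the abstract amounts to a single bottom-up pass over the tree, applying group soft-thresholding at each node (children before their parents). The Python sketch below is a minimal illustration of that idea under these assumptions, not the authors' SLEP implementation; the names tree_group_prox and groups are ours.

import numpy as np

def tree_group_prox(v, groups, lam):
    # Solve min_x 0.5*||x - v||^2 + lam * sum_i w_i * ||x[G_i]||_2,
    # where the index sets G_i form a tree (a child's indices are a
    # subset of its parent's). `groups` lists (indices, weight) pairs
    # with every child before its parent, so one pass from the leaves
    # to the root yields the exact minimizer.
    x = np.asarray(v, dtype=float).copy()
    for idx, w in groups:
        norm = np.linalg.norm(x[idx])
        if norm <= lam * w:
            x[idx] = 0.0                      # the whole group is zeroed out
        else:
            x[idx] *= 1.0 - lam * w / norm    # group soft-thresholding
    return x

# Toy example: 4 features, leaf groups {0,1} and {2,3}, root {0,1,2,3}.
v = np.array([3.0, -1.0, 0.5, 0.2])
groups = [(np.array([0, 1]), 1.0),
          (np.array([2, 3]), 1.0),
          (np.array([0, 1, 2, 3]), 1.0)]
print(tree_group_prox(v, groups, lam=0.5))

With this child-before-parent ordering, each coefficient is touched once per ancestor group, so the pass costs on the order of Σ_i |G_i| operations.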


References

[1] F. Bach, G. Lanckriet, and M. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In International Conference on Machine Learning, 2004.

[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

[3] I. Guyon, A. Ben-Hur, S. Gunn, and G. Dror. Result analysis of the NIPS 2003 feature selection challenge. In Advances in Neural Information Processing Systems, pages 545–552, 2004.

[4] E. T. Hale, W. Yin, and Y. Zhang. Fixed-point continuation for ℓ1-minimization: Methodology and convergence. SIAM Journal on Optimization, 19(3):1107–1130, 2008.

[5] J. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms I & II. Springer-Verlag, Berlin, 1993.

[6] L. Jacob, G. Obozinski, and J. Vert. Group lasso with overlap and graph lasso. In International Conference on Machine Learning, 2009.

[7] R. Jenatton, J.-Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Technical report, arXiv:0904.3523v2, 2009.

[8] R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In International Conference on Machine Learning, 2010.

[9] S. Kim and E. P. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In International Conference on Machine Learning, 2010.

[10] C. Lemaréchal and C. Sagastizábal. Practical aspects of the Moreau-Yosida regularization I: Theoretical properties. SIAM Journal on Optimization, 7(2):367–385, 1997.

[11] J. Liu, S. Ji, and J. Ye. Multi-task feature learning via efficient ℓ2,1-norm minimization. In Uncertainty in Artificial Intelligence, 2009.

[12] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.

[13] J. Liu, L. Yuan, and J. Ye. An efficient algorithm for a class of fused lasso problems. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.

[14] M. J. Lyons, J. Budynek, and S. Akamatsu. Automatic classification of single facial images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(12):1357–1362, 1999.

[15] A. M. Martinez and R. Benavente. The AR face database. Technical report, 1998.

[16] L. Meier, S. van de Geer, and P. Bühlmann. The group lasso for logistic regression. Journal of the Royal Statistical Society: Series B, 70(1):53–71, 2008.

[17] J.-J. Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la Société Mathématique de France, 93:273–299, 1965.

[18] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2004.

[19] Y. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper, 2007.

[20] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1):267–288, 1996.

[21] K. Yosida. Functional Analysis. Springer-Verlag, Berlin, 1964.

[22] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49–67, 2006.

[23] P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468–3497, 2009.