
Structured Sparsity and Generalization


Source: pdf

Author: Andreas Maurer, Massimiliano Pontil

Abstract: We present a data-dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite-dimensional setting, such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels. Keywords: empirical processes, Rademacher average, sparse estimation.
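
For concreteness, here is a minimal LaTeX sketch of standard definitions for the objects named in the abstract. The symbols (weight vector $w \in \mathbb{R}^d$, coordinate groups $G_1, \dots, G_J$, sample $x_1, \dots, x_n$, function class $F$) are illustrative assumptions, not notation taken from the paper itself.

\[
\|w\|_{1} \;=\; \sum_{j=1}^{d} |w_j| \qquad \text{(Lasso penalty)}
\]
\[
\|w\|_{2,1} \;=\; \sum_{k=1}^{J} \big\| w_{G_k} \big\|_{2} \qquad \text{(group Lasso penalty)}
\]
\[
\hat{R}(F) \;=\; \mathbb{E}_{\sigma} \sup_{f \in F} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \qquad \text{(empirical Rademacher average)}
\]

Here the $\sigma_i$ are independent Rademacher variables, uniform on $\{-1, +1\}$; data-dependent bounds of the kind described above control the gap between empirical and expected risk through such averages (see Bartlett and Mendelson, 2002, in the references below).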


References

F.R. Bach, G.R.G. Lanckriet and M.I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the Twenty-first International Conference on Machine Learning (ICML 2004), pages 6–13, 2004.
L. Baldassarre, J. Morales, A. Argyriou, M. Pontil. A general framework for structured sparsity via proximal optimization. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2012), forthcoming.
R. Baraniuk, V. Cevher, M.F. Duarte, C. Hegde. Model-based compressive sensing. IEEE Transactions on Information Theory, 56:1982–2001, 2010.
P.L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
C. Cortes, M. Mohri, A. Rostamizadeh. Generalization bounds for learning kernels. In Proceedings of the Twenty-seventh International Conference on Machine Learning (ICML 2010), pages 247–254, 2010.
C. De Mol, E. De Vito, L. Rosasco. Elastic-net regularization in learning theory. Journal of Complexity, 25(2):201–230, 2009.
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.
J. Huang, T. Zhang, D. Metaxas. Learning with structured sparsity. In Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML 2009), pages 417–424, 2009.
L. Jacob, G. Obozinski, J.-P. Vert. Group Lasso with overlap and graph Lasso. In Proceedings of the Twenty-sixth International Conference on Machine Learning (ICML 2009), pages 433–440, 2009.
R. Jenatton, J.-Y. Audibert, F.R. Bach. Structured variable selection with sparsity-inducing norms. Journal of Machine Learning Research, 12:2777–2824, 2011.
S.M. Kakade, S. Shalev-Shwartz, A. Tewari. Regularization techniques for learning with matrices. arXiv preprint arXiv:0910.0610, 2010.
M. Kloft, U. Brefeld, S. Sonnenburg, A. Zien. ℓp-norm multiple kernel learning. Journal of Machine Learning Research, 12:953–997, 2011.
V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30(1):1–50, 2002.
G.R.G. Lanckriet, N. Cristianini, P.L. Bartlett, L. El Ghaoui, M.I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004.
M. Ledoux, M. Talagrand. Probability in Banach Spaces. Springer, 1991.
K. Lounici, M. Pontil, A.B. Tsybakov and S. van de Geer. Oracle inequalities and optimal inference under group sparsity. Annals of Statistics, 39(4):2164–2204, 2011.
C. McDiarmid. Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics, pages 195–248, Springer, 1998.
L. Meier, S.A. van de Geer, and P. Bühlmann. High-dimensional additive modeling. Annals of Statistics, 37(6B):3779–3821, 2009.
R. Meir and T. Zhang. Generalization error bounds for Bayesian mixture algorithms. Journal of Machine Learning Research, 4:839–860, 2003.
C.A. Micchelli, J.M. Morales, M. Pontil. A family of penalty functions for structured sparsity. In Advances in Neural Information Processing Systems 23, pages 1612–1623, 2010.
C.A. Micchelli, J.M. Morales, M. Pontil. Regularizers for structured sparsity. Advances in Computational Mathematics, forthcoming.
C.A. Micchelli and M. Pontil. Feature space perspectives for learning the kernel. Machine Learning, 66:297–319, 2007.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
T. Shimamura, S. Imoto, R. Yamaguchi and S. Miyano. Weighted Lasso in graphical Gaussian modeling for large gene network estimation based on microarray data. Genome Informatics, 19:142–153, 2007.
Y. Ying and C. Campbell. Generalization bounds for learning the kernel problem. In Proceedings of the 23rd Conference on Learning Theory (COLT 2010), pages 407–416, 2010.
M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 68(1):49–67, 2006.
P. Zhao, G. Rocha and B. Yu. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468–3497, 2009.