
Regularization Techniques for Learning with Matrices (JMLR 2012, paper 97)


Source: pdf

Authors: Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari

Abstract: There is a growing body of learning problems for which it is natural to organize the parameters into a matrix. As a result, it becomes easy to impose sophisticated prior knowledge by appropriately regularizing the parameters under some matrix norm. This work describes and analyzes a systematic method for constructing such matrix-based regularization techniques. In particular, we focus on how the underlying statistical properties of a given problem can help us decide which regularization function is appropriate. Our methodology is based on a known duality phenomenon: a function is strongly convex with respect to some norm if and only if its conjugate function is strongly smooth with respect to the dual norm. This result has already been found to be a key component in deriving and analyzing several learning algorithms. We demonstrate the potential of this framework by deriving novel generalization and regret bounds for multi-task learning, multi-class learning, and multiple kernel learning.

Keywords: regularization, strong convexity, regret bounds, generalization bounds, multi-task learning, multi-class learning, multiple kernel learning
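The duality the abstract invokes can be made precise. The following is a standard formulation of the strong-convexity/strong-smoothness duality; the symbols $f$, $f^\star$, $\beta$, $\|\cdot\|$, and $\|\cdot\|_\star$ are introduced here for illustration and are not taken from the page above.

A function $f$ is $\beta$-strongly convex with respect to a norm $\|\cdot\|$ if, for all $x, y$ in its domain and all $\alpha \in [0,1]$,
\[
f(\alpha x + (1-\alpha)y) \;\le\; \alpha f(x) + (1-\alpha) f(y) \;-\; \frac{\beta}{2}\,\alpha(1-\alpha)\,\|x-y\|^2 .
\]
Writing $f^\star(y) = \sup_x \big( \langle x, y \rangle - f(x) \big)$ for the Fenchel conjugate and $\|\cdot\|_\star$ for the dual norm, $f^\star$ is $\tfrac{1}{\beta}$-strongly smooth with respect to $\|\cdot\|_\star$ if, for all $x, y$,
\[
f^\star(x+y) \;\le\; f^\star(x) + \langle \nabla f^\star(x), y \rangle + \frac{1}{2\beta}\,\|y\|_\star^2 .
\]
The duality asserts that $f$ is $\beta$-strongly convex with respect to $\|\cdot\|$ if and only if $f^\star$ is $\tfrac{1}{\beta}$-strongly smooth with respect to $\|\cdot\|_\star$. Strong smoothness of the conjugate is the property that drives the regret and generalization bounds for matrix regularizers (such as group norms and Schatten norms) that the abstract refers to.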


References

A. Agarwal, A. Rakhlin, and P. Bartlett. Matrix regularization techniques for online multitask learning. Technical Report EECS-2008-138, EECS Department, University of California, Berkeley, 2008.
Y. Amit, M. Fink, N. Srebro, and S. Ullman. Uncovering shared structures in multiclass classification. In Proceedings of the 24th International Conference on Machine Learning, pages 17–24, 2007.
A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
A. Argyriou, C. A. Micchelli, and M. Pontil. On spectral learning. Journal of Machine Learning Research, 11:935–953, 2010.
F. Bach. Consistency of the group Lasso and multiple kernel learning. Journal of Machine Learning Research, 9:1179–1225, 2008.
K. Ball, E. A. Carlen, and E. H. Lieb. Sharp uniform convexity and smoothness inequalities for trace norms. Inventiones Mathematicae, 115:463–482, 1994.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
R. Bhatia. Matrix Analysis. Springer, 1997.
J. Borwein and A. Lewis. Convex Analysis and Nonlinear Optimization. Springer, 2006.
G. Cavallanti, N. Cesa-Bianchi, and C. Gentile. Linear algorithms for online multitask classification. In Proceedings of the Twenty-first Annual Conference on Computational Learning Theory, pages 251–262, 2008.
N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.
K. Crammer and Y. Singer. On the learnability and design of output codes for multiclass problems. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pages 35–46, 2000.
C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.
A. J. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43(3):173–210, 2001.
A. Juditsky and A. Nemirovski. Large deviations of vector-valued martingales in 2-smooth normed spaces. Submitted to Annals of Probability, 2008. Preprint available at arxiv.org/abs/0809.0813.
S. M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Advances in Neural Information Processing Systems 22, pages 793–800, 2008.
J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–64, January 1997.
J. Kivinen and M. Warmuth. Relative loss bounds for multidimensional regression problems. Machine Learning, 45(3):301–329, July 2001.
G. R. G. Lanckriet, N. Cristianini, P. L. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004.
A. S. Lewis. The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 2(2):173–183, 1995.
N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.
K. Lounici, M. Pontil, A. B. Tsybakov, and S. van de Geer. Taking advantage of sparsity in multi-task learning. In Proceedings of the Twenty-second Annual Conference on Computational Learning Theory, 2009.
A. Maurer. Bounds for linear multi-task learning. Journal of Machine Learning Research, 7:117–139, 2006.
R. Meir and T. Zhang. Generalization error bounds for Bayesian mixture algorithms. Journal of Machine Learning Research, 4:839–860, 2003.
A. Y. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance. In Proceedings of the Twenty-first International Conference on Machine Learning, pages 78–85, 2004.
G. Obozinski, B. Taskar, and M. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20:231–252, 2010.
I. Pinelis. Optimum bounds for the distributions of martingales in Banach spaces. Annals of Probability, 22(4):1679–1706, 1994.
G. Pisier. Martingales with values in uniformly convex spaces. Israel Journal of Mathematics, 20(3–4):326–350, 1975.
R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
S. Shalev-Shwartz. Online Learning: Theory, Algorithms, and Applications. PhD thesis, Hebrew University of Jerusalem, 2007.
S. Shalev-Shwartz and Y. Singer. Convex repeated games and Fenchel duality. In Advances in Neural Information Processing Systems 20, pages 1265–1272, 2006.
S. Shalev-Shwartz and Y. Singer. A primal-dual perspective of online learning algorithms. Machine Learning, 69(2–3):115–142, 2007.
S. Shalev-Shwartz and Y. Singer. On the equivalence of weak learnability and linear separability: New relaxations and efficient boosting algorithms. In Proceedings of the Twenty-first Annual Conference on Computational Learning Theory, pages 311–322, 2008.
N. Srebro and S. Ben-David. Learning bounds for support vector machines with learned kernels. In Proceedings of the Nineteenth Annual Conference on Computational Learning Theory, pages 169–183, 2006.
N. Srebro, J. Rennie, and T. Jaakkola. Maximum margin matrix factorization. In Advances in Neural Information Processing Systems 17, pages 1329–1336, 2005.
K. Sridharan and A. Tewari. Convex games in Banach spaces. In Proceedings of the 23rd Annual Conference on Learning Theory, pages 1–13. Omnipress, 2010.
M. Warmuth and D. Kuzmin. Online variance minimization. In Proceedings of the Nineteenth Annual Conference on Computational Learning Theory, pages 514–528, 2006.
X. Yang, S. Kim, and E. P. Xing. Heterogeneous multitask learning with joint sparsity constraints. In Advances in Neural Information Processing Systems 23, pages 2151–2159, 2009.
Y. Ying and C. Campbell. Generalization bounds for learning the kernel. In Proceedings of the Twenty-second Annual Conference on Computational Learning Theory, 2009.
M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49–67, 2006.
C. Zalinescu. Convex Analysis in General Vector Spaces. World Scientific Publishing Co. Inc., River Edge, NJ, 2002.