
On Spectral Learning


Source: pdf

Author: Andreas Argyriou, Charles A. Micchelli, Massimiliano Pontil

Abstract: In this paper, we study the problem of learning a matrix W from a set of linear measurements. Our formulation consists of solving an optimization problem which involves regularization with a spectral penalty term. That is, the penalty term is a function of the spectrum of the covariance of W. Instances of this problem in machine learning include multi-task learning, collaborative filtering and multi-view learning, among others. Our goal is to elucidate the form of the optimal solution of spectral learning. The theory of spectral learning relies on the von Neumann characterization of orthogonally invariant norms and their association with symmetric gauge functions. Using this tool, we formulate a representer theorem for spectral regularization and specialize it to several useful examples, such as the Schatten p-norms, the trace norm and the spectral norm, which should prove useful in applications.

Keywords: kernel methods, matrix learning, minimal norm interpolation, multi-task learning, orthogonally invariant norms, regularization
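To make the formulation concrete, the following is a minimal NumPy sketch, not code from the paper: schatten_norm evaluates a Schatten p-norm as the l_p norm of the singular values (p = 1 recovers the trace norm, p = infinity the spectral norm), and spectral_learning solves a trace-norm regularized version of the linear measurement problem by proximal gradient descent with singular value thresholding. The function names (schatten_norm, svt, spectral_learning) and the choice of an ISTA-style solver are illustrative assumptions, not the authors' method.

import numpy as np

def schatten_norm(W, p):
    # Schatten p-norm: the l_p norm of the singular values of W.
    # p = 1 -> trace (nuclear) norm, p = 2 -> Frobenius norm, p = inf -> spectral norm.
    s = np.linalg.svd(W, compute_uv=False)
    return s.max() if np.isinf(p) else (s ** p).sum() ** (1.0 / p)

def svt(W, tau):
    # Singular value thresholding: the proximity operator of tau * trace norm.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def spectral_learning(Xs, y, lam, n_iter=500):
    # Solve  min_W  0.5 * sum_i (<X_i, W> - y_i)^2 + lam * ||W||_trace
    # by proximal gradient descent (a gradient step on the data term,
    # then the prox of the trace-norm penalty).
    d1, d2 = Xs[0].shape
    A = np.stack([X.ravel() for X in Xs])      # m x (d1*d2) measurement operator
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    W = np.zeros((d1, d2))
    for _ in range(n_iter):
        grad = (A.T @ (A @ W.ravel() - y)).reshape(d1, d2)
        W = svt(W - step * grad, step * lam)
    return W

For instance, with Xs a list of rank-one measurement matrices and y their noisy inner products with a low-rank ground truth, the iterates converge to a low-rank solution, matching the trace-norm instance of spectral regularization discussed in the paper.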


Reference text

J. Abernethy, F. Bach, T. Evgeniou, and J.-P. Vert. A new approach to collaborative filtering: operator estimation with spectral regularization. Journal of Machine Learning Research, 10:803–826, 2009.
Y. Amit, M. Fink, N. Srebro, and S. Ullman. Uncovering shared structures in multiclass classification. In Proceedings of the Twenty-Fourth International Conference on Machine Learning, 2007.
A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
A. Argyriou, C. A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multi-task structure learning. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20. MIT Press, 2007b.
A. Argyriou, C. A. Micchelli, and M. Pontil. When is there a representer theorem? Vector versus matrix regularizers. Journal of Machine Learning Research, 10:2507–2529, 2009.
R. Bhatia. Matrix Analysis. Graduate Texts in Mathematics. Springer, 1997.
J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS Books in Mathematics. Springer, 2005.
E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9:717–772, 2008.
G. Cavallanti, N. Cesa-Bianchi, and C. Gentile. Linear algorithms for online multitask classification. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), 2008.
T. Evgeniou, C. A. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.
M. Fazel, H. Hindi, and S. P. Boyd. A rank minimization heuristic with application to minimum order system approximation. In Proceedings, American Control Conference, volume 6, pages 4734–4739, 2001.
G. H. Hardy, J. E. Littlewood, and G. Pólya. Inequalities. Cambridge University Press, 1988.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
A. J. Izenman. Reduced-rank regression for the multivariate linear model. Journal of Multivariate Analysis, 5:248–264, 1975.
A. S. Lewis. The convex analysis of unitarily invariant matrix functions. Journal of Convex Analysis, 2(1/2):173–183, 1995.
A. Maurer. Bounds for linear multi-task learning. Journal of Machine Learning Research, 7:117–139, 2006a.
A. Maurer. Learning similarity with operator-valued large-margin classifiers. Journal of Machine Learning Research, 9:1049–1082, 2008a.
C. A. Micchelli and A. Pinkus. Variational problems arising from balancing several error criteria. Rendiconti di Matematica, Serie VII, 14:37–86, 1994.
B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. Preprint, 2008.
R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
B. Schölkopf, R. Herbrich, and A. J. Smola. A generalized representer theorem. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory, 2001.
N. Srebro, J. D. M. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. In Advances in Neural Information Processing Systems 17, pages 1329–1336. MIT Press, 2005.
J. von Neumann. Some matrix inequalities and metrization of matric-space. In Tomsk University Review, 1:286–300, 1937, volume IV of Collected Works, pages 205–218. Pergamon, Oxford, 1962.
G. Wahba. Spline Models for Observational Data, volume 59 of Series in Applied Mathematics. SIAM, Philadelphia, 1990.
M. Yuan, A. Ekici, Z. Lu, and R. Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346, 2007.
G. Zames. Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms, and approximate inverses. IEEE Transactions on Automatic Control, 26(2):301–320, 1981.