
92 jmlr-2008-Universal Multi-Task Kernels

Source: pdf

Author: Andrea Caponnetto, Charles A. Micchelli, Massimiliano Pontil, Yiming Ying

Abstract: In this paper we are concerned with reproducing kernel Hilbert spaces H_K of functions from an input space into a Hilbert space Y, an environment appropriate for multi-task learning. The reproducing kernel K associated to H_K has its values as operators on Y. Our primary goal here is to derive conditions which ensure that the kernel K is universal. This means that on every compact subset of the input space, every continuous function with values in Y can be uniformly approximated by sections of the kernel. We provide various characterizations of universal kernels and highlight them with several concrete examples of some practical importance. Our analysis uses basic principles of functional analysis and especially the useful notion of vector measures, which we describe in sufficient detail to clarify our results.

Keywords: multi-task learning, multi-task kernels, universal approximation, vector-valued reproducing kernel Hilbert spaces
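The universality property described in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a separable matrix-valued kernel K(x, t) = k(x, t) B with a scalar Gaussian k and a symmetric 2x2 matrix B, takes Y = R^2, and fits a combination of kernel sections f(x) = sum_j k(x, x_j) B c_j to a continuous target on [0, 1]. All function names, the target function, and the parameter values (sigma, ridge, grid sizes) are hypothetical choices made for illustration. When B is singular, every section takes values in the range of B, so uniform approximation of a general target cannot succeed; when B is invertible, B y ranges over all of R^2 and the sections inherit the approximation power of the scalar Gaussian kernel.

```python
import numpy as np

def gaussian_gram(xs, ts, sigma):
    """Scalar Gaussian kernel matrix k(x, t) = exp(-(x - t)^2 / (2 sigma^2))."""
    diff = xs[:, None] - ts[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))

def fit_sections(x_train, y_train, B, sigma, ridge=1e-8):
    """Coefficients c_j of f(x) = sum_j k(x, x_j) B c_j fitted to the data.

    The block Gram matrix of K(x, t) = k(x, t) B is the Kronecker product
    of the scalar Gram matrix with B. A small ridge term (an assumption of
    this sketch) keeps the linear system well posed.
    """
    n, m = y_train.shape
    G = gaussian_gram(x_train, x_train, sigma)
    block_gram = np.kron(G, B)
    rhs = y_train.reshape(-1)
    coeffs = np.linalg.solve(block_gram + ridge * np.eye(n * m), rhs)
    return coeffs.reshape(n, m)

def evaluate(x_new, x_train, coeffs, B, sigma):
    """Evaluate f at new inputs; row i of the result is f(x_new[i])."""
    return gaussian_gram(x_new, x_train, sigma) @ coeffs @ B.T

def target(x):
    """Hypothetical continuous target with values in R^2."""
    return np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=1)

def sup_error(B, sigma=0.15, n_train=30):
    """Largest componentwise deviation from the target on a fine grid."""
    x_train = np.linspace(0.0, 1.0, n_train)
    x_test = np.linspace(0.0, 1.0, 201)
    coeffs = fit_sections(x_train, target(x_train), B, sigma)
    pred = evaluate(x_test, x_train, coeffs, B, sigma)
    return float(np.max(np.abs(pred - target(x_test))))

B_pd = np.array([[1.0, 0.5], [0.5, 1.0]])        # positive definite
B_singular = np.array([[1.0, 1.0], [1.0, 1.0]])  # rank one

err_pd = sup_error(B_pd)
err_singular = sup_error(B_singular)
print("sup error, positive definite B:", err_pd)
print("sup error, singular B:", err_singular)
```

With the rank-one B, both output components of every fitted function coincide, so the error against the (sin, cos) target stays large no matter how many sections are combined. This is only a finite-sample illustration on one compact set, not a proof of universality.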

reference text

L. Amodei. Reproducing kernels of vector-valued function spaces. In Proc. of Chamonix, A. Le Méhauté et al., Eds., pages 1–9, 1997.
N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.
S. K. Berberian. Notes on Spectral Theory. New York: Van Nostrand, 1966.
J. Burbea and P. Masani. Banach and Hilbert Spaces of Vector-Valued Functions. Pitman Research Notes in Mathematics Series, 90, 1984.
A. Caponnetto and E. De Vito. Optimal rates for regularized least-squares algorithm. Foundations of Computational Mathematics, 7:331–368, 2007.
C. Carmeli, E. De Vito, and A. Toigo. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 4:377–408, 2006.
F. Cucker and S. Smale. On the mathematical foundations of learning. Bull. Amer. Math. Soc., 39:1–49, 2001.
D. R. Chen, Q. Wu, Y. Ying, and D. X. Zhou. Support vector machine soft margin classifiers: error analysis. Journal of Machine Learning Research, 5:1143–1175, 2004.
A. Devinatz. On measurable positive definite operator functions. J. London Math. Soc., 35:417–424, 1960.
N. Dinculeanu. Vector Measures. Pergamon, Berlin, 1967.
J. Diestel and J. J. Uhl, Jr. Vector Measures. AMS, Providence (Math Surveys 15), 1977.
T. Evgeniou, C. A. Micchelli, and M. Pontil. Learning multiple tasks with kernel methods. J. Machine Learning Research, 6:615–637, 2005.
G. B. Folland. Real Analysis: Modern Techniques and Their Applications. 2nd edition, New York, John Wiley & Sons, 1999.
A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola. A kernel method for the two-sample problem. In Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hoffman, editors, pages 513–520, MIT Press, 2007.
P. Lax. Functional Analysis. John Wiley & Sons, 2002.
S. Lowitzsch. A density theorem for matrix-valued radial basis functions. Numerical Algorithms, 39:253–256, 2005.
C. A. Micchelli. Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constructive Approximation, 2:11–22, 1986.
C. A. Micchelli and M. Pontil. A function representation for learning in Banach spaces. In Proceedings of the 17th Annual Conference on Learning Theory (COLT'04), pages 255–269, 2004.
C. A. Micchelli and M. Pontil. On learning vector-valued functions. Neural Computation, 17:177–204, 2005.
C. A. Micchelli and M. Pontil. Feature space perspectives for learning the kernel. Machine Learning, 66:297–319, 2007.
C. A. Micchelli, Y. Xu, and P. Ye. Cucker Smale learning theory in Besov spaces. NATO Science Series sub Series III Computer and System Science, 190:47–68, 2003.
C. A. Micchelli, Y. Xu, and H. Zhang. Universal kernels. J. Machine Learning Research, 7:2651–2667, 2006.
S. Mukherjee and D. X. Zhou. Learning coordinate covariances via gradients. J. Machine Learning Research, 7:519–549, 2006.
T. Poggio, S. Mukherjee, R. Rifkin, A. Rakhlin, and A. Verri. b. In Uncertainty in Geometric Computations, J. Winkler and M. Niranjan (eds.), Kluwer, 131–141, 2002.
C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
M. Reisert and H. Burkhardt. Learning equivariant functions with matrix valued kernels. J. Machine Learning Research, 8:385–408, 2007.
B. Schölkopf and A. J. Smola. Learning with Kernels. The MIT Press, Cambridge, MA, USA, 2002.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
E. Solak, R. Murray-Smith, W. E. Leithead, D. J. Leith, and C. E. Rasmussen. Derivative observations in Gaussian Process models of dynamic systems. In Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, and K. Obermayer, editors, pages 1033–1040, MIT Press, 2003.
E. M. Stein. Singular Integrals and Differential Properties of Functions. Princeton University Press, Princeton, NJ, 1970.
I. Steinwart. On the influence of the kernel on the consistency of support vector machines. J. Machine Learning Research, 2:67–93, 2001.
I. Steinwart, D. Hush, and C. Scovel. Function classes that approximate the Bayes risk. In Proceedings of the 19th Annual Conference on Learning Theory, pages 79–93, 2006.
K. Yosida. Functional Analysis. 6th edition, Springer-Verlag, 1980.
E. Vazquez and E. Walter. Multi-output support vector regression. In Proceedings of the 13th IFAC Symposium on System Identification, 2003.
D. X. Zhou. Density problem and approximation error in learning theory. Preprint, 2003.