
98 jmlr-2011-Universality, Characteristic Kernels and RKHS Embedding of Measures



Author: Bharath K. Sriperumbudur, Kenji Fukumizu, Gert R.G. Lanckriet

Abstract: Over the last few years, two different notions of positive definite (pd) kernels, universal and characteristic, have been developing in parallel in machine learning: universal kernels are proposed in the context of achieving the Bayes risk by kernel-based classification/regression algorithms, while characteristic kernels are introduced in the context of distinguishing probability measures by embedding them into a reproducing kernel Hilbert space (RKHS). However, the relation between these two notions is not well understood. The main contribution of this paper is to clarify the relation between universal and characteristic kernels by presenting a unifying study relating them to RKHS embedding of measures, in addition to clarifying their relation to other common notions of strictly pd, conditionally strictly pd and integrally strictly pd kernels. For radial kernels on R^d, all these notions are shown to be equivalent.

Keywords: kernel methods, characteristic kernels, Hilbert space embeddings, universal kernels, strictly positive definite kernels, integrally strictly positive definite kernels, conditionally strictly positive definite kernels, translation invariant kernels, radial kernels, binary classification, homogeneity testing
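To illustrate what "distinguishing probability measures by embedding them into an RKHS" can look like numerically, here is a minimal sketch that is not taken from the paper. It estimates the squared RKHS distance between the empirical mean embeddings of two samples (often called the maximum mean discrepancy). The helper names `gaussian_kernel` and `mmd_squared`, the Gaussian kernel with bandwidth `sigma=1.0`, the sample size and the seed are all assumptions made for this example.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (radial) kernel on the real line; sigma is an assumed bandwidth."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased (V-statistic) estimate of the squared RKHS distance between
    the empirical mean embeddings of the samples xs and ys."""
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


rng = random.Random(0)  # fixed seed, for reproducibility of the illustration only
same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]

mmd_same = mmd_squared(same_a, same_b)   # two samples from the same distribution
mmd_diff = mmd_squared(same_a, shifted)  # samples from distributions with different means
```

With a characteristic kernel such as the Gaussian, the population version of this distance is zero only when the two distributions coincide, so `mmd_diff` should come out clearly larger than `mmd_same`. The estimate from finite samples is noisy, and a real homogeneity test would also need a threshold or a permutation procedure, which this sketch leaves out.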


reference text

N. Aronszajn. Theory of reproducing kernels. Trans. Amer. Math. Soc., 68:337–404, 1950.
C. Berg, J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer Verlag, New York, 1984.
A. Caponnetto, M. Pontil, C. Micchelli, and Y. Ying. Universal multi-task kernels. Journal of Machine Learning Research, 9:1615–1646, 2008.
C. Carmeli, E. De Vito, A. Toigo, and V. Umanità. Vector valued reproducing kernel Hilbert spaces and universality. Analysis and Applications, 8:19–61, 2010.
W. Dahmen and C. A. Micchelli. Some remarks on ridge functions. Approx. Theory Appl., 3:139–143, 1987.
R. M. Dudley. Real Analysis and Probability. Cambridge University Press, Cambridge, UK, 2002.
N. Dunford and J. T. Schwartz. Linear Operators. I: General Theory. Wiley-Interscience, New York, 1958.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000.
G. B. Folland. Real Analysis: Modern Techniques and Their Applications. Wiley-Interscience, New York, 1999.
K. Fukumizu, F. Bach, and M. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73–99, 2004.
K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 489–496, Cambridge, MA, 2008. MIT Press.
K. Fukumizu, B. K. Sriperumbudur, A. Gretton, and B. Schölkopf. Characteristic kernels on groups and semigroups. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 473–480, 2009.
A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two sample problem. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 513–520. MIT Press, 2007.
A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Schölkopf, and A. Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems 20, pages 585–592. MIT Press, 2008.
E. Hewitt. Linear functionals on spaces of continuous functions. Fundamenta Mathematicae, 37:161–189, 1950.
G. S. Kimeldorf and G. Wahba. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Annals of Mathematical Statistics, 41(2):495–502, 1970.
V. A. Menegatto. Strictly positive definite kernels on the circle. Rocky Mountain Journal of Mathematics, 25(3):1149–1163, 1995.
C. A. Micchelli, Y. Xu, and H. Zhang. Universal kernels. Journal of Machine Learning Research, 7:2651–2667, 2006.
A. Pinkus. Strictly positive definite functions on a real inner product space. Adv. Comput. Math., 20:263–271, 2004.
W. Rudin. Functional Analysis. McGraw-Hill, USA, 1991.
B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
B. Schölkopf, R. Herbrich, and A. J. Smola. A generalized representer theorem. In Proc. of the 14th Annual Conference on Learning Theory, pages 416–426, 2001.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, UK, 2004.
B. K. Sriperumbudur, A. Gretton, K. Fukumizu, G. R. G. Lanckriet, and B. Schölkopf. Injective Hilbert space embeddings of probability measures. In R. Servedio and T. Zhang, editors, Proc. of the 21st Annual Conference on Learning Theory, pages 111–122, 2008.
B. K. Sriperumbudur, K. Fukumizu, A. Gretton, G. R. G. Lanckriet, and B. Schölkopf. Kernel choice and classifiability for RKHS embeddings of probability distributions. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1750–1758. MIT Press, 2009.
B. K. Sriperumbudur, K. Fukumizu, and G. R. G. Lanckriet. On the relation between universality, characteristic kernels and RKHS embedding of measures. In JMLR Workshop and Conference Proceedings, volume 9, pages 781–788. AISTATS, 2010a.
B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010b.
I. Steinwart. On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2:67–93, 2001.
I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
J. Stewart. Positive definite functions and generalizations, an historical survey. Rocky Mountain Journal of Mathematics, 6(3):409–433, 1976.
Ch. Suquet. Reproducing kernel Hilbert spaces and random measures. In H. G. W. Begehr and F. Nicolosi, editors, Proc. of the 5th International ISAAC Congress, Catania, Italy, 25-30 July 2005, pages 143–152. World Scientific, 2009.
H. Wendland. Scattered Data Approximation. Cambridge University Press, Cambridge, UK, 2005.