
Semigroup Kernels on Measures (JMLR 2005, paper 64)


Source: pdf

Authors: Marco Cuturi, Kenji Fukumizu, Jean-Philippe Vert

Abstract: We present a family of positive definite kernels on measures, characterized by the fact that the value of the kernel between two measures is a function of their sum. These kernels can be used to derive kernels on structured objects, such as images and texts, by representing these objects as sets of components, such as pixels or words, or more generally as measures on the space of components. Several kernels studied in this work make use of common quantities defined on measures such as entropy or generalized variance to detect similarities. Given an a priori kernel on the space of components itself, the approach is further extended by restating the previous results in a more efficient and flexible framework using the “kernel trick”. Finally, a constructive approach to such positive definite kernels through an integral representation theorem is proved, before presenting experimental results on a benchmark experiment of handwritten digits classification to illustrate the validity of the approach.

Keywords: kernels on measures, semigroup theory, Jensen divergence, generalized variance, reproducing kernel Hilbert space
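The two ideas in the abstract, a kernel that depends on two measures only through their sum and the use of generalized variance as a similarity, can be illustrated with a minimal NumPy sketch. It assumes each object (for instance a digit image) has already been turned into a finite weighted point cloud, and computes a regularized inverse generalized variance of the mixture (μ+μ')/2, once directly and once from a Gram matrix on the components. The helper names, the ridge parameter `eta` and the RBF bandwidth are illustrative assumptions, not the paper's exact definitions or normalization, and positive definiteness is not checked here.

```python
import numpy as np


def mixture(x, wx, y, wy):
    """Atoms and weights of (mu + mu')/2, with mu and mu' each normalized to mass 1."""
    wx = np.asarray(wx, dtype=float)
    wy = np.asarray(wy, dtype=float)
    # Adding two weighted point clouds just pools their atoms and weights.
    pts = np.vstack([np.asarray(x, dtype=float), np.asarray(y, dtype=float)])
    w = np.concatenate([wx / wx.sum(), wy / wy.sum()]) / 2.0
    return pts, w


def igv_kernel(x, wx, y, wy, eta=0.1):
    """Regularized inverse generalized variance: 1 / det(Sigma + eta * I)."""
    pts, w = mixture(x, wx, y, wy)
    centered = pts - w @ pts                      # subtract the weighted mean
    sigma = (centered * w[:, None]).T @ centered  # weighted covariance matrix
    return 1.0 / np.linalg.det(sigma + eta * np.eye(pts.shape[1]))


def kernel_igv(x, wx, y, wy, gram, eta=0.1):
    """Same idea computed from a Gram matrix on the components ("kernel trick").

    Returns 1 / det(I + M / eta), where M = D^1/2 Kc D^1/2, D = diag(w) and Kc
    is the weighted-centered Gram matrix; the nonzero eigenvalues of M are those
    of the weighted covariance operator in the feature space of `gram`.
    """
    pts, w = mixture(x, wx, y, wy)
    n = len(w)
    c = np.eye(n) - np.outer(np.ones(n), w)       # I - 1 w^T
    kc = c @ gram(pts, pts) @ c.T                 # weighted centering of the Gram matrix
    s = np.sqrt(w)
    m = s[:, None] * kc * s[None, :]
    return 1.0 / np.linalg.det(np.eye(n) + m / eta)


def linear_gram(a, b):
    # Linear kernel on components: recovers the plain covariance computation.
    return a @ b.T


def rbf_gram(a, b, bandwidth=1.0):
    # Gaussian kernel on components (bandwidth is an arbitrary choice).
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * bandwidth ** 2))


if __name__ == "__main__":
    x = np.array([[0.0, 0.0], [1.0, 0.0]])
    y = np.array([[0.0, 1.0], [1.0, 1.0]])
    print(igv_kernel(x, [1, 1], y, [1, 1]))
    print(kernel_igv(x, [1, 1], y, [1, 1], rbf_gram))
```

Because both functions only ever see the pooled atoms and weights, swapping atoms between the two clouds while keeping the mixture unchanged leaves the value unchanged. With `linear_gram`, `kernel_igv` equals `igv_kernel` multiplied by `eta**d`, which is one way to sanity-check the Gram-matrix formulation before switching to a nonlinear kernel on components.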


reference text

Shotaro Akaho. A kernel method for canonical correlation analysis. In Proceedings of International Meeting on Psychometric Society (IMPS2001), 2001.
Shun-ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. AMS vol. 191, 2001.
Francis Bach and Michael Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.
Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, 1984.
Alain Berlinet and Christine Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers, 2003.
B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152. ACM Press, 1992.
Marco Cuturi and Jean-Philippe Vert. The context-tree kernel for strings. Neural Networks, 2005. In press.
Marco Cuturi and Jean-Philippe Vert. Semigroup kernels on finite sets. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Advances in Neural Information Processing Systems 17, pages 329–336. MIT Press, Cambridge, MA, 2005.
Jean Dieudonné. Calcul Infinitésimal. Hermann, Paris, 1968.
Dominik M. Endres and Johannes E. Schindelin. A new metric for probability distributions. IEEE Transactions on Information Theory, 49(7):1858–1860, 2003.
Bent Fuglede and Flemming Topsøe. Jensen-Shannon divergence and Hilbert space embedding. In Proceedings of the International Symposium on Information Theory, page 31, 2004.
Kenji Fukumizu, Francis Bach, and Michael Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. Journal of Machine Learning Research, 5:73–99, 2004.
M. Hein and O. Bousquet. Hilbertian metrics and positive definite kernels on probability measures. January 2005.
Tony Jebara, Risi Kondor, and Andrew Howard. Probability product kernels. Journal of Machine Learning Research, 5:819–844, 2004.
Thorsten Joachims. Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. Kluwer Academic Publishers, Dordrecht, 2002. ISBN 0-7923-7679-X.
Risi Kondor and Tony Jebara. A kernel between sets of vectors. In Proceedings of the International Conference on Machine Learning, 2003.
John Lafferty and Guy Lebanon. Information diffusion kernels. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
Christina Leslie, Eleazar Eskin, and William Stafford Noble. The spectrum kernel: a string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing 2002, pages 564–575. World Scientific, 2002.
Christina Leslie, Eleazar Eskin, Jason Weston, and William Stafford Noble. Mismatch string kernels for SVM protein classification. In Suzanna Becker, Sebastian Thrun, and Klaus Obermayer, editors, Advances in Neural Information Processing Systems 15, Cambridge, MA, 2003. MIT Press.
Thomas Melzer, Michael Reiter, and Horst Bischof. Nonlinear feature extraction using generalized canonical correlation analysis. In Proceedings of International Conference on Artificial Neural Networks (ICANN), pages 353–360, 2001.
Pedro J. Moreno, Purdy P. Ho, and Nuno Vasconcelos. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In Sebastian Thrun, Lawrence Saul, and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
Ferdinand Österreicher and Igor Vajda. A new class of metric divergences on probability spaces and its applicability in statistics. Annals of the Institute of Statistical Mathematics, 55:639–653, 2003.
C. R. Rao. Differential metrics in probability spaces. In S.-I. Amari, O. E. Barndorff-Nielsen, R. E. Kass, S. L. Lauritzen, and C. R. Rao, editors, Differential Geometry in Statistical Inference, Hayward, CA, 1987. Institute of Mathematical Statistics.
Walter Rudin. Fourier Analysis on Groups. John Wiley & Sons, 1962.
Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2002.
Matthias Seeger. Covariance kernels from Bayesian generative models. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 905–912, Cambridge, MA, 2002. MIT Press.
F. M. J. Willems, Y. M. Shtarkov, and Tj. J. Tjalkens. The context-tree weighting method: basic properties. IEEE Transactions on Information Theory, pages 653–664, 1995.
Lior Wolf and Amnon Shashua. Learning over sets using kernel principal angles. Journal of Machine Learning Research, 4:913–931, 2003.