nips nips2010 nips2010-279 nips2010-279-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd . Examples include SVMs for the analysis of histograms or colored images, SVMs for text classiÄ?Ĺš cation and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp (PX ) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions, if X ⊂ Rd . We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subset of Rd . We apply this technique for the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing. 1
[1] H. Bauer. Measure and Integration Theory. De Gruyter, Berlin, 2001.
[2] P. Billingsley. Convergence of probability measures. John Wiley & Sons, New York, 2nd edition, 1999.
[3] N. Bourbaki. Integration I. Chapters 1-6. Springer, Berlin, 2004. Translated from the 1959, 1965, and 1967 French originals by S.K. Berberian.
[4] A. Caponnetto, C.A. Micchelli, M. Pontil, and Y. Ying. Universal multi-task kernels. J. Mach. Learn. Res., 9:1615–1646, 2008.
[5] O. Chapelle, P. Haffner, and V. Vapnik. SVMs for histogram-based image classiÄ?Ĺš cation. IEEE Transactions on Neural Networks, 10:1055–1064, 1999.
[6] R. M. Dudley. Real Analysis and Probability. Cambridge University Press, Cambridge, 2002.
[7] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel hilbert spaces. J. Mach. Learn. Res., 5:73–99, 2005.
[8] K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel Dimension Reduction in Regression. Ann. Statist., 37:1871–1905, 2009.
[9] K. Fukumizu, B. K. Sriperumbudur, A. Gretton, and B. Sch¨ lkopf. Characteristic kernels on groups o and semigroups. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 473–480. 2009. 8
[10] R. Hable and A. Christmann. Qualitative robustness of support vector machines. arXiv:0912.0874v1, 2009.
[11] M. Hein and O. Bousquet. Kernels, associated structures and generalizations. Technical report, MaxPlanck-Institute for Biological Cybernetics, 2004.
[12] M. Hein and O. Bousquet. Hilbertian metrics and positive deÄ?Ĺš nite kernels on probability measures. In Z. Ghahramani and R. Cowell, editors, AISTATS, pages 136–143, 2005.
[13] M. Hein, O. Bousquet, and B. Sch¨ lkopf. Maximal margin classiÄ?Ĺš cation for metric spaces. Journal of o Computer and System Sciences, 71:333–359, 2005.
[14] M. Hein, T. N. Lal, and O. Bousquet. Hilbertian metrics on probability measures and their application in SVM’s. In C. E. Rasmussen, H. H. B¨ lthoff, M. Giese, and B. Sch¨ lkopf, editors, Pattern Recognition, u o Proceedings of the 26th DAGM Symposium, pages 270–277, Berlin, 2004. Springer.
[15] T. Joachims. Learning to Classify Text Using Support Vector Machines. Kluwer Academic Publishers, Boston, 2002.
[16] J. Lafferty and G. Lebanon. Diffusion kernels on statistical manifolds. J. Mach. Learn. Res., 6:129–163, 2005.
[17] A.F.T. Martins, N.A. Smith, E.P. Xing, P.M.Q. Aguiar, and M.A.T. Figueiredo. Nonextensive information theoretic kernels on measures. J. Mach. Learn. Res., 10:935–975, 2009.
[18] C. A. Micchelli, Y. Xu, and H. Zhang. Universal kernels. J. Mach. Learn. Res., 7:2651–2667, 2006.
[19] K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York, 1967.
[20] E. Parzen. On estimating of a probability density and mode. Ann. Math. Statist., 35:1065–1076, 1962.
[21] A. Pinkus. Strictly positive deÄ?Ĺš nite functions on a real inner product space. Adv. Comput. Math., 20:263– 271, 2004.
[22] B. Sch¨ lkopf, A. J. Smola, and K.-R. M¨ ller. Nonlinear component analysis as a kernel eigenvalue o u problem. Neural Comput., 10:1299–1319, 1998.
[23] B. Sch¨ lkopf, K. Tsuda, and J. P. Vert. Kernel Methods in Computational Biology. MIT Press, Cambridge, o MA, 2004.
[24] D. Scott. Averaged shifted histograms: Effective nonparametric density estimation in several dimensions. Ann. Statist., 13:1024–1040, 1985.
[25] C. Scovel, D. Hush, I. Steinwart, and J. Theiler. Radial kernels and their reproducing kernel Hilbert spaces. Journal of Complexity, 2010, to appear.
[26] A.J. Smola, A. Gretton, L. Song, and B. Sch¨ lkopf. A Hilbert Space Embedding for Distributions. In o E. Takimoto, editor, Algorithmic Learning Theory, Lecture Notes on Computer Science. Springer, 2007. Proceedings of the 10th International Conference on Discovery Science, 40-41.
[27] B. Sriperumbudur, K. Fukumizu, A. Gretton, G. Lanckriet, and B. Sch¨ lkopf. Kernel choice and claso siÄ?Ĺš ability for RKHS embeddings of probability distributions. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1750–1758. 2009.
[28] B. Sriperumbudur, K. Fukumizu, and G. Lanckriet. On the relation between universality, characteristic kernels and RKHS embeddings of measures. In Yee Whye Teh and M. Titterington, editors, AISTATS 2010, Proc. of the 13th International Conference on ArtiÄ?Ĺš cial Intelligence and Statistics, volume 9, pages 773–780. 2010.
[29] B. Sriperumbudur, K. Fukumizu, and G. Lanckriet. Universality, characteristic kernels and RKHS embeddings of measures. arXiv:1003.0887v1, 2010.
[30] I. Steinwart. On the inÄ?Ĺš‚uence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res., 2:67–93, 2001.
[31] I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York, 2008.
[32] I. Steinwart, D. Hush, and C. Scovel. Function classes that approximate the Bayes risk. In COLT’06, 19th Conference on Learning Theory, Pittsburgh, 2006. 9