nips nips2000 nips2000-134 nips2000-134-reference knowledge-graph by maker-knowledge-mining

134 nips-2000-The Kernel Trick for Distances

Source: pdf

Author: Bernhard Schölkopf

Abstract: A method is described which, like the kernel trick in support vector machines (SVMs), lets us generalize distance-based algorithms to operate in feature spaces, usually nonlinearly related to the input space. This is done by identifying a class of kernels which can be represented as norm-based distances in Hilbert spaces. It turns out that common kernel algorithms, such as SVMs and kernel PCA, are actually really distance based algorithms and can be run with that class of kernels, too. As well as providing a useful new insight into how these algorithms work, the present work can form the basis for conceiving new algorithms.

reference text

[1] M. A. Aizerman, E. M. Braverman, and L. 1. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Autom. and Remote Contr. , 25:821- 837, 1964.

[2] C. Berg, J.P.R. Christensen, and P. Ressel. Hannonic Analysis on Semigroups. Springer-Verlag, New York, 1984.

[3] B. E. Boser, 1. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144-152, Pittsburgh, PA, July 1992. ACM Press.

[4] F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures. Neural Computation, 7(2):219- 269, 1995.

[5] D. Haussler. Convolutional kernels on discrete structures. Technical Report UCSC-CRL-99-1O, Computer Science Department, University of California at Santa Cruz, 1999.

[6] J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London, A 209:415-446, 1909.

[7] I. J. Schoenberg. Metric spaces and positive definite functions. 44:522- 536, 1938. Trans. Amer. Math. Soc.,

[8] B. Sch61kopf, C. J. C. Burges, and A. J. Smola. Advances in Kernel Methods - Support Vector Learning. MIT Press, Cambridge, MA, 1999.

[9] B. SchDlkopf, A. Smola, and K-R. Miiller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299- 1319, 1998.

[10] A. Smola, T. FrieB, and B. ScMlkopf. Semiparametric support vector and linear programming machines. In M.S. Keams, S.A. Solla, and D.A. Cohn, editors, Advances in Neural Infonnation Processing Systems 11 , pages 585 - 591, Cambridge, MA, 1999. MIT Press.

[11] A. Smola, B. SchDlkopf, and K-R. Miiller. The connection between regularization operators and support vector kernels. Neural Networks , 11:637- 649, 1998.

[12] W.S . Torgerson. Theory and Methods of Scaling. Wiley, New York, 1958.

[13] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995.

[14] G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.

[15] C. Watkins, 2000. personal communication.