
NIPS 2001, Paper 164: Sampling Techniques for Kernel Methods


Source: pdf

Authors: Dimitris Achlioptas, Frank McSherry, Bernhard Schölkopf

Abstract: We propose randomized techniques for speeding up Kernel Principal Component Analysis on three levels: sampling and quantization of the Gram matrix in training, randomized rounding in evaluating the kernel expansions, and random projections in evaluating the kernel itself. In all three cases, we give sharp bounds on the accuracy of the obtained approximations. Rather intriguingly, all three techniques can be viewed as instantiations of the following idea: replace the kernel function k by a “randomized kernel” which behaves like k in expectation.
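The unifying idea is easiest to see in the first technique, sampling the Gram matrix. The sketch below is not code from the paper; the names rbf_kernel and sparsify_gram and the sampling probability p are illustrative. It keeps each Gram-matrix entry independently with probability p, rescaled by 1/p, so the sparsified matrix equals the original entrywise in expectation:

import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def sparsify_gram(K, p, rng):
    # Keep each entry of K independently with probability p and
    # rescale by 1/p, so E[K_sparse] = K entrywise. Sampling the
    # upper triangle and mirroring keeps K_sparse symmetric.
    n = K.shape[0]
    mask = rng.random((n, n)) < p
    mask = np.triu(mask)          # sample upper triangle (incl. diagonal)
    mask = mask | mask.T          # mirror below the diagonal
    return np.where(mask, K / p, 0.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
K = rbf_kernel(X)
K_sparse = sparsify_gram(K, p=0.2, rng=rng)

# The leading eigenvalues, the quantities kernel PCA relies on,
# are approximately preserved despite discarding ~80% of the entries.
print(np.linalg.eigvalsh(K)[-3:])
print(np.linalg.eigvalsh(K_sparse)[-3:])

Since E[K_sparse] = K entrywise, the sampled matrix is a “randomized kernel” in the abstract's sense; the paper's bounds quantify how closely the top eigenvectors of such a perturbed matrix, and hence the kernel PCA solution, track those of K.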


Reference text

[1] D. Achlioptas, Database-friendly random projections, Proc. of the 20th Symposium on Principles of Database Systems (Santa Barbara, California), 2001, pp. 274–281.

[2] C. J. C. Burges, Simplified support vector decision rules, Proc. of the 13th International Conference on Machine Learning, Morgan Kaufmann, 1996, pp. 71–77.

[3] N. Cristianini, J. Shawe-Taylor, and H. Lodhi, Latent semantic kernels, Proc. of the 18th International Conference on Machine Learning, Morgan Kaufmann, 2001.

[4] C. Davis and W. Kahan, The rotation of eigenvectors by a perturbation. III, SIAM Journal on Numerical Analysis 7 (1970), 1–46.

[5] Z. Füredi and J. Komlós, The eigenvalues of random symmetric matrices, Combinatorica 1 (1981), no. 3, 233–241.

[6] N. I. M. Gould, An algorithm for large-scale quadratic programming, IMA Journal of Numerical Analysis 11 (1991), no. 3, 299–324.

[7] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58 (1963), 13–30.

[8] W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Conference in modern analysis and probability (New Haven, Conn., 1982), American Mathematical Society, 1984, pp. 189–206.

[9] R. H. Nickel and J. W. Tolle, A sparse sequential quadratic programming algorithm, Journal of Optimization Theory and Applications 60 (1989), no. 3, 453–473.

[10] E. Osuna, R. Freund, and F. Girosi, An improved training algorithm for support vector machines, Neural Networks for Signal Processing VII, 1997, pp. 276–285.

[11] B. Schölkopf, A. J. Smola, and K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (1998), 1299–1319.

[12] A. J. Smola and B. Schölkopf, Sparse greedy matrix approximation for machine learning, Proc. of the 17th International Conference on Machine Learning, Morgan Kaufmann, 2000, pp. 911–918.

[13] V. Vapnik, The nature of statistical learning theory, Springer, NY, 1995.

[14] C. K. I. Williams and M. Seeger, Using the Nyström method to speed up kernel machines, Advances in Neural Information Processing Systems 13, MIT Press, 2001.