
106 nips-2002-Hyperkernels


Source: pdf

Author: Cheng S. Ong, Robert C. Williamson, Alex J. Smola

Abstract: We consider the problem of choosing a kernel suitable for estimation with a Gaussian Process estimator or a Support Vector Machine. A novel solution is presented which involves defining a Reproducing Kernel Hilbert Space on the space of kernels itself. By utilizing an analog of the classical representer theorem, the problem of choosing a kernel from a parameterized family of kernels (e.g., of varying width) is reduced to a statistical estimation problem akin to minimizing a regularized risk functional. Various classical settings for model or kernel selection are special cases of our framework.
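
To make the representer-theorem analog concrete, here is a sketch of the kind of expansion it yields (the notation below is assumed for illustration, not quoted from this excerpt): with a hyperkernel \underline{k} defined on pairs of inputs, the optimal kernel is a finite combination over pairs of training points,

    k(x, x') = \sum_{i,j=1}^{m} \beta_{ij} \, \underline{k}\big( (x_i, x_j), (x, x') \big),

where m is the number of training examples and the coefficients \beta_{ij} play the same role as the expansion weights in the classical representer theorem, so that estimating the kernel becomes a finite-dimensional optimization problem.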

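The reduction of kernel choice to regularized risk minimization can also be sketched in code. The following minimal Python example uses assumed names and toy data and is not the paper's algorithm: it selects the width of an RBF kernel for kernel ridge regression by minimizing an empirical regularized risk over a parameterized family. The paper's framework additionally regularizes the kernel itself via its norm in the hyper-RKHS, which this sketch omits.

import numpy as np

def rbf_gram(X, sigma):
    # Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def regularized_risk(K, y, lam):
    # Fit kernel ridge regression and return empirical risk plus regularizer.
    n = len(y)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)  # representer coefficients
    f = K @ alpha                                        # fitted values
    risk = np.mean((f - y) ** 2)                         # empirical risk
    reg = lam * alpha @ K @ alpha                        # lam * ||f||^2 in the RKHS
    return risk + reg

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Kernel selection reduced to a one-dimensional search over the family of widths.
widths = [0.1, 0.3, 1.0, 3.0]
best = min(widths, key=lambda s: regularized_risk(rbf_gram(X, s), y, lam=1e-2))
print("selected width:", best)
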

reference text

[1] G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. Jordan. Learning the kernel matrix with semidefinite programming. In ICML. Morgan Kaufmann, 2002.

[2] C. K. I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In M. I. Jordan, editor, Learning and Inference in Graphical Models. Kluwer Academic, 1998.

[3] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing kernel parameters for support vector machines. Machine Learning, 2002. Forthcoming.

[4] G. Wahba. Spline Models for Observational Data, volume 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, 1990.

[5] K. Crammer, J. Keshet, and Y. Singer. Kernel design using boosting. In Advances in Neural Information Processing Systems 15, 2002. In press.

[6] O. Bousquet and D. Herrmann. On the complexity of learning the kernel matrix. In Advances in Neural Information Processing Systems 15, 2002. In press.

[7] N. Cristianini, A. Elisseeff, and J. Shawe-Taylor. On optimizing kernel alignment. Technical Report NC2-TR-2001-087, NeuroCOLT, http://www.neurocolt.com, 2001.

[8] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.

[9] S. Fine and K. Scheinberg. Efficient SVM training using low-rank kernel representation. Technical report, IBM Watson Research Center, New York, 2000.

[10] Y. Freund and R. E. Schapire. Experiments with a new boosting algorithm. In ICML, pages 148–156. Morgan Kaufmann, 1996.

[11] G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, 2001.