Learning Non-Linear Combinations of Kernels (NIPS 2009, paper 128)


Source: pdf

Author: Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh

Abstract: This paper studies the general problem of learning kernels based on a polynomial combination of base kernels. We analyze this problem in the case of regression and the kernel ridge regression algorithm. We examine the corresponding learning kernel optimization problem, show how that minimax problem can be reduced to a simpler minimization problem, and prove that the global solution of this problem always lies on the boundary. We give a projection-based gradient descent algorithm for solving the optimization problem, shown empirically to converge in few iterations. Finally, we report the results of extensive experiments with this algorithm using several publicly available datasets demonstrating the effectiveness of our technique.
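The pipeline summarized in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a degree-2 combination written as the elementwise square of a weighted sum of base kernel matrices, the kernel ridge regression objective F(mu) = y^T (K_mu + lam I)^{-1} y, and a feasible set of nonnegative weights inside an L2 ball centered at the origin. The function names, the Gaussian base kernels, the step size, and the initial point are all hypothetical choices made for the example.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Gaussian base kernel matrix on the rows of X (illustrative choice of base kernel).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def weighted_sum(mu, kernels):
    # S = sum_k mu_k K_k
    return sum(m * K for m, K in zip(mu, kernels))

def combined_kernel(mu, kernels):
    # Assumed degree-2 polynomial combination: elementwise square of S.
    S = weighted_sum(mu, kernels)
    return S * S

def krr_objective(mu, kernels, y, lam):
    # F(mu) = y^T (K_mu + lam I)^{-1} y, together with the dual solution alpha.
    K = combined_kernel(mu, kernels)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return y @ alpha, alpha

def gradient(mu, kernels, alpha):
    # dF/dmu_k = -2 alpha^T (S o K_k) alpha, where o is the elementwise product.
    S = weighted_sum(mu, kernels)
    return np.array([-2.0 * alpha @ ((S * K) @ alpha) for K in kernels])

def project(mu, radius):
    # Projection onto {mu >= 0, ||mu||_2 <= radius}: clip, then rescale if needed.
    mu = np.maximum(mu, 0.0)
    norm = np.linalg.norm(mu)
    return mu if norm <= radius else mu * (radius / norm)

def learn_mu(kernels, y, lam=1.0, radius=1.0, step=0.1, iters=50):
    # Projected gradient descent on F, started strictly inside the feasible set.
    p = len(kernels)
    mu = np.full(p, radius / (2.0 * np.sqrt(p)))
    for _ in range(iters):
        _, alpha = krr_objective(mu, kernels, y, lam)
        mu = project(mu - step * gradient(mu, kernels, alpha), radius)
    return mu
```

With positive semidefinite base kernels and nonnegative weights, every gradient component in this sketch is nonpositive, so each step pushes the weights outward until the projection holds them on the norm constraint. That behavior is consistent with the abstract's statement that the solution lies on the boundary.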


reference text

[1] A. Argyriou, R. Hauser, C. Micchelli, and M. Pontil. A DC-programming algorithm for kernel selection. In International Conference on Machine Learning, 2006.

[2] A. Argyriou, C. Micchelli, and M. Pontil. Learning convex combinations of continuously parameterized basic kernels. In Conference on Learning Theory, 2005.

[3] F. Bach. Exploring large feature spaces with hierarchical multiple kernel learning. In Advances in Neural Information Processing Systems, 2008.

[4] C. Berg, J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer-Verlag: Berlin-New York, 1984.

[5] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Association for Computational Linguistics, 2007.

[6] B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Conference on Learning Theory, 1992.

[7] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3), 2002.

[8] C. Cortes, M. Mohri, and A. Rostamizadeh. Learning sequence kernels. In Machine Learning for Signal Processing, 2008.

[9] C. Cortes, M. Mohri, and A. Rostamizadeh. L2 regularization for learning kernels. In Uncertainty in Artificial Intelligence, 2009.

[10] C. Cortes and V. Vapnik. Support-Vector Networks. Machine Learning, 20(3), 1995.

[11] T. Jebara. Multi-task feature and kernel selection for SVMs. In International Conference on Machine Learning, 2004.

[12] G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 2004.

[13] C. Micchelli and M. Pontil. Learning the kernel function via regularization. Journal of Machine Learning Research, 6, 2005.

[14] C. S. Ong, A. Smola, and R. Williamson. Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6, 2005.

[15] A. Rakotomamonjy, F. Bach, Y. Grandvalet, and S. Canu. SimpleMKL. Journal of Machine Learning Research, 9, 2008.

[16] C. Saunders, A. Gammerman, and V. Vovk. Ridge Regression Learning Algorithm in Dual Variables. In International Conference on Machine Learning, 1998.

[17] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press: Cambridge, MA, 2002.

[18] B. Schölkopf, A. Smola, and K. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1998.

[19] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

[20] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7, 2006.

[21] N. Srebro and S. Ben-David. Learning bounds for support vector machines with learned kernels. In Conference on Learning Theory, 2006.

[22] V. N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.

[23] M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In International Conference on Machine Learning, 2009.