NIPS 2010, paper 145 — reference knowledge graph (maker-knowledge-mining)
Source: pdf
Author: Kun Gai, Guangyun Chen, Chang-shui Zhang
Abstract: In this paper, we point out that there exist scaling and initialization problems in most existing multiple kernel learning (MKL) approaches, which employ the large margin principle to jointly learn both a kernel and an SVM classifier. The reason is that the margin itself cannot describe how good a kernel is, because it neglects the scaling of the kernel. We use the ratio between the margin and the radius of the minimum enclosing ball to measure the goodness of a kernel, and present a new minimization formulation for kernel learning. This formulation is invariant to scalings of learned kernels, and when learning a linear combination of basis kernels it is also invariant to scalings of the basis kernels and to the type (e.g., L1 or L2) of norm constraint on the combination coefficients. We establish the differentiability of our formulation, and propose a gradient projection algorithm for kernel learning. Experiments show that our method significantly outperforms both SVM with the uniform combination of basis kernels and other state-of-the-art MKL approaches.
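The scaling argument in the abstract can be checked numerically. The sketch below is an illustration of that argument, not the authors' algorithm: it estimates the squared radius R^2 of the minimum enclosing ball and the squared hard margin gamma^2 of a bias-free SVM by projected gradient ascent on their standard duals, using a simplex projection in the style of [19]. Scaling a kernel K by a factor c scales both R^2 and gamma^2 by c, so the margin alone changes under rescaling while the ratio R^2/gamma^2 stays invariant. The step sizes, iteration counts, and the bias-free simplification are assumptions made for this sketch.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (in the style of [19])."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def meb_radius_sq(K, iters=20000):
    """Squared radius of the minimum enclosing ball in feature space, via
    projected gradient ascent on the dual
        max_b  diag(K) @ b - b @ K @ b   s.t.  b on the simplex."""
    n = K.shape[0]
    b = np.full(n, 1.0 / n)
    lr = 1.0 / (2.0 * np.linalg.eigvalsh(K)[-1])  # step from the curvature of K
    for _ in range(iters):
        b = project_simplex(b + lr * (np.diag(K) - 2.0 * K @ b))
    return float(np.diag(K) @ b - b @ K @ b)

def margin_sq(K, y, iters=20000):
    """Squared hard margin of a bias-free SVM on separable data: the dual
        max_{a >= 0}  sum(a) - 0.5 * a @ Q @ a,   Q_ij = y_i y_j K_ij
    attains the value 1 / (2 * gamma^2)."""
    Q = (y[:, None] * y[None, :]) * K
    a = np.zeros(K.shape[0])
    lr = 1.0 / np.linalg.eigvalsh(Q)[-1]
    for _ in range(iters):
        a = np.maximum(a + lr * (1.0 - Q @ a), 0.0)  # project onto a >= 0
    return 1.0 / (2.0 * (a.sum() - 0.5 * a @ Q @ a))
```

On separable toy data one can verify that replacing K by c*K multiplies both meb_radius_sq and margin_sq by c, so their ratio is unchanged; a margin-only objective, by contrast, can be inflated arbitrarily by rescaling the kernel, which is the initialization/scaling problem the paper addresses.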
[1] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1):131–159, 2002.
[2] G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L.E. Ghaoui, and M.I. Jordan. Learning the kernel matrix with semidefinite programming. The Journal of Machine Learning Research, 5:27–72, 2004.
[3] F.R. Bach, G.R.G. Lanckriet, and M.I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the 21st International Conference on Machine Learning (ICML 2004), 2004.
[4] S. Sonnenburg, G. Rätsch, and C. Schäfer. A general and efficient multiple kernel learning algorithm. In Adv. Neural. Inform. Process Syst. (NIPS 2005), 2006.
[5] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491–2521, 2008.
[6] O. Chapelle and A. Rakotomamonjy. Second order optimization of kernel parameters. In Proc. of the NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels, 2008.
[7] M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K. Müller, and A. Zien. Efficient and Accurate lp-Norm Multiple Kernel Learning. In Adv. Neural. Inform. Process Syst. (NIPS 2009), 2009.
[8] C. Cortes, M. Mohri, and A. Rostamizadeh. L2 regularization for learning kernels. In Uncertainty in Artificial Intelligence, 2009.
[9] J. Saketha Nath, G. Dinesh, S. Raman, Chiranjib Bhattacharyya, Aharon Ben-Tal, and K. R. Ramakrishnan. On the algorithmics and applications of a mixed-norm based kernel learning formulation. In Adv. Neural. Inform. Process Syst. (NIPS 2009), 2009.
[10] F. Bach. Exploring large feature spaces with hierarchical multiple kernel learning. In Adv. Neural. Inform. Process Syst. (NIPS 2008), 2008.
[11] M. Gönen and E. Alpaydin. Localized multiple kernel learning. In Proceedings of the 25th International Conference on Machine Learning (ICML 2008), 2008.
[12] M. Varma and B.R. Babu. More generality in efficient multiple kernel learning. In Proceedings of the 26th International Conference on Machine Learning (ICML 2009), 2009.
[13] C. Cortes, M. Mohri, and A. Rostamizadeh. Learning Non-Linear Combinations of Kernels. In Adv. Neural. Inform. Process Syst. (NIPS 2009), 2009.
[14] N. Srebro and S. Ben-David. Learning bounds for support vector machines with learned kernels. In Proceedings of the International Conference on Learning Theory (COLT 2006), pages 169–183. Springer, 2006.
[15] Yiming Ying and Colin Campbell. Generalization bounds for learning the kernel. In Proceedings of the International Conference on Learning Theory (COLT 2009), 2009.
[16] H. Do, A. Kalousis, A. Woznica, and M. Hilario. Margin and Radius Based Multiple Kernel Learning. In Proceedings of the European Conference on Machine Learning (ECML 2009), 2009.
[17] J.M. Danskin. The theory of max-min, with applications. SIAM Journal on Applied Mathematics, pages 641–664, 1966.
[18] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, September 1999.
[19] John Duchi, Shai Shalev-Shwartz, Yoram Singer, and Tushar Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning (ICML 2008), 2008.
[20] A. Asuncion and D.J. Newman. UCI machine learning repository, 2007. Available at http://www.ics.uci.edu/∼mlearn/MLRepository.html.
[21] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/∼cjlin/libsvm.