nips nips2010 nips2010-176 nips2010-176-reference knowledge-graph by maker-knowledge-mining

176 nips-2010-Multiple Kernel Learning and the SMO Algorithm

Source: pdf

Author: Zhaonan Sun, Nawanol Ampornpunt, Manik Varma, S.v.n. Vishwanathan

Abstract: Our objective is to train p-norm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and efﬁciently scales to large problems. As a result, it has gained widespread acceptance and SVMs are routinely trained using SMO in diverse real world applications. Training using SMO has been a long standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable, and therefore can not be optimised using SMO style co-ordinate ascent. In this paper, we demonstrate that linear MKL regularised with the p-norm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efﬁciency and is signiﬁcantly faster than state-of-the-art specialised p-norm MKL solvers. We show that we can train on a hundred thousand kernels in approximately seven minutes and on ﬁfty thousand points in less than half an hour on a single core. 1

reference text

[1] http://www.csie.ntu.edu.tw/ cjlin/libsvmtools/datasets/binary.html. 8

[2] F. R. Bach. Exploring large feature spaces with hierarchical multiple kernel learning. In NIPS, pages 105–112, 2008.

[3] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML, pages 6–13, 2004.

[4] A. Ben-Tal, T. Margalit, and A. Nemirovski. The ordered subsets mirror descent optimization method with applications to tomography. SIAM Journal of Opimization, 12(1):79–108, 2001.

[5] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/˜cjlin/libsvm.

[6] C. Cortes, M. Mohri, and A. Rostamizadeh. L2 regularization for learning kernels. In UAI, 2009.

[7] C. Cortes, M. Mohri, and A. Rostamizadeh. Learning non-linear combinations of kernels. In NIPS, 2009.

[8] R. E. Fan, P. H. Chen, and C. J. Lin. Working set selection using second order information for training SVM. JMLR, 6:1889–1918, 2005.

[9] C. Gentile. Robustness of the p-norm algorithms. ML, 53(3):265–299, 2003.

[10] M. Gonen and E. Alpaydin. Localized multiple kernel learning. In ICML, 2008.

[11] J. Kivinen, M. K. Warmuth, and B. Hassibi. The p-norm generaliziation of the LMS algorithm for adaptive ﬁltering. IEEE Trans. Signal Processing, 54(5):1782–1793, 2006.

[12] M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.-R. Muller, and A. Zien. Efﬁcient and accurate lp -norm Multiple Kernel Learning. In NIPS, 2009.

[13] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semideﬁnite programming. JMLR, 5:27–72, 2004.

[14] C. J. Lin, S. Lucidi, L. Palagi, A. Risi, and M. Sciandrone. Decomposition algorithm model for singly linearly-constrained problems subject to lower and upper bounds. JOTA, 141(1):107–126, 2009.

[15] J. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods – Support Vector Learning, pages 185–208, 1999.

[16] A. Rakotomamonjy, F. Bach, Y. Grandvalet, and S. Canu. SimpleMKL. JMLR, 9:2491–2521, 2008.

[17] S. Sonnenburg, G. Raetsch, C. Schaefer, and B. Schoelkopf. Large scale multiple kernel learning. JMLR, 7:1531–1565, 2006.

[18] M. Varma and B. R. Babu. More generality in efﬁcient multiple kernel learning. In ICML, 2009.

[19] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009.

[20] S. V. N. Vishwanathan, Z. Sun, N. Theera-Ampornpunt, and M. Varma, 2010. The SMO-MKL code http://research.microsoft.com/˜manik/code/SMO-MKL/download.html.

[21] J. Yang, Y. Li, Y. Tian, L. Duan, and W. Gao. Group-sensitive multiple kernel learning for object categorization. In ICCV, 2009. 9