nips nips2010 nips2010-176 nips2010-176-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhaonan Sun, Nawanol Ampornpunt, Manik Varma, S.v.n. Vishwanathan
Abstract: Our objective is to train p-norm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and efficiently scales to large problems. As a result, it has gained widespread acceptance and SVMs are routinely trained using SMO in diverse real world applications. Training using SMO has been a long standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable, and therefore can not be optimised using SMO style co-ordinate ascent. In this paper, we demonstrate that linear MKL regularised with the p-norm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efficiency and is significantly faster than state-of-the-art specialised p-norm MKL solvers. We show that we can train on a hundred thousand kernels in approximately seven minutes and on fifty thousand points in less than half an hour on a single core. 1
[1] http://www.csie.ntu.edu.tw/ cjlin/libsvmtools/datasets/binary.html. 8
[2] F. R. Bach. Exploring large feature spaces with hierarchical multiple kernel learning. In NIPS, pages 105–112, 2008.
[3] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML, pages 6–13, 2004.
[4] A. Ben-Tal, T. Margalit, and A. Nemirovski. The ordered subsets mirror descent optimization method with applications to tomography. SIAM Journal of Opimization, 12(1):79–108, 2001.
[5] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/˜cjlin/libsvm.
[6] C. Cortes, M. Mohri, and A. Rostamizadeh. L2 regularization for learning kernels. In UAI, 2009.
[7] C. Cortes, M. Mohri, and A. Rostamizadeh. Learning non-linear combinations of kernels. In NIPS, 2009.
[8] R. E. Fan, P. H. Chen, and C. J. Lin. Working set selection using second order information for training SVM. JMLR, 6:1889–1918, 2005.
[9] C. Gentile. Robustness of the p-norm algorithms. ML, 53(3):265–299, 2003.
[10] M. Gonen and E. Alpaydin. Localized multiple kernel learning. In ICML, 2008.
[11] J. Kivinen, M. K. Warmuth, and B. Hassibi. The p-norm generaliziation of the LMS algorithm for adaptive filtering. IEEE Trans. Signal Processing, 54(5):1782–1793, 2006.
[12] M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.-R. Muller, and A. Zien. Efficient and accurate lp -norm Multiple Kernel Learning. In NIPS, 2009.
[13] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. JMLR, 5:27–72, 2004.
[14] C. J. Lin, S. Lucidi, L. Palagi, A. Risi, and M. Sciandrone. Decomposition algorithm model for singly linearly-constrained problems subject to lower and upper bounds. JOTA, 141(1):107–126, 2009.
[15] J. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods – Support Vector Learning, pages 185–208, 1999.
[16] A. Rakotomamonjy, F. Bach, Y. Grandvalet, and S. Canu. SimpleMKL. JMLR, 9:2491–2521, 2008.
[17] S. Sonnenburg, G. Raetsch, C. Schaefer, and B. Schoelkopf. Large scale multiple kernel learning. JMLR, 7:1531–1565, 2006.
[18] M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In ICML, 2009.
[19] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009.
[20] S. V. N. Vishwanathan, Z. Sun, N. Theera-Ampornpunt, and M. Varma, 2010. The SMO-MKL code http://research.microsoft.com/˜manik/code/SMO-MKL/download.html.
[21] J. Yang, Y. Li, Y. Tian, L. Duan, and W. Gao. Group-sensitive multiple kernel learning for object categorization. In ICCV, 2009. 9