jmlr jmlr2012 jmlr2012-74 jmlr2012-74-reference knowledge-graph by maker-knowledge-mining

74 jmlr-2012-Multi Kernel Learning with Online-Batch Optimization

Source: pdf

Author: Francesco Orabona, Luo Jie, Barbara Caputo

Abstract: In recent years there has been a lot of interest in designing principled classiﬁcation algorithms over multiple cues, based on the intuitive notion that using more features should lead to better performance. In the domain of kernel methods, a principled way to use multiple features is the Multi Kernel Learning (MKL) approach. Here we present a MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate. We directly solve the MKL problem in the primal formulation. By having a p-norm formulation of MKL, we introduce a parameter that controls the level of sparsity of the solution, while leading to an easier optimization problem. We prove theoretically and experimentally that 1) our algorithm has a faster convergence rate as the number of kernels grows; 2) the training complexity is linear in the number of training examples; 3) very few iterations are sufﬁcient to reach good solutions. Experiments on standard benchmark databases support our claims. Keywords: multiple kernel learning, learning kernels, online optimization, stochastic subgradient descent, convergence bounds, large scale

reference text

F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO, algorithm. In Proc. of the International Conference on Machine Learning, 2004. P. Bartlett, E. Hazan, and A. Rakhlin. Adaptive online gradient descent. In Advances in Neural Information Processing Systems 20, pages 65–72. MIT Press, Cambridge, MA, 2008. 249 O RABONA , J IE AND C APUTO A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003. A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In Proc. of the 6th ACM International Conference on Image and Video Retrieval, pages 401–408. ACM, July 2007. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. G. Cavallanti, N. Cesa-Bianchi, and C. Gentile. Linear algorithms for online multitask classiﬁcation. In Proc. of the 21st Conference on Learning Theory, 2008. C. C. Chang and C. J. Lin. LIBSVM: A Library for Support Vector Machines, 2001. Software available at www.csie.ntu.edu.tw/˜cjlin/libsvm. C. Cortes, M. Mohri, and A. Rostamizadeh. Two-stage learning kernel algorithms. In Proc. of the 27th International Conference on Machine Learning, 2010. K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. The Journal of Machine Learning Research, 2:265–292, 2002. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other KernelBased Learning Methods. Cambridge University Press, 2000. N. Cristianini, J. Kandola, A. Elisseeff, and J. Shawe-Taylor. On kernel-target alignment. In Advances in Neural Information Processing Systems 14, volume 14, pages 367–373, 2002. C. B. Do, Q. V. Le, and Chuan-Sheng Foo. Proximal regularization for online and batch learning. In Proc. of the International Conference on Machine Learning, 2009. L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, 2004. P. Gehler and S. Nowozin. Let the kernel ﬁgure it out: Principled learning of pre-processing for kernel classiﬁers. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2009a. P. Gehler and S. Nowozin. On feature combination for multiclass object classiﬁcation. In Proc. of the International Conference on Computer Vision, 2009b. M. G¨ nen and E. Alpaydin. Cost-conscious multiple kernel learning. Pattern Recognition Letters, o 31:959–965, July 2010. D. Hush, P. Kelly, C. Scovel, and I. Steinwart. QP algorithms with guaranteed accuracy and run time for support vector machines. Journal of Machine Learning Research, 7:733–769, 2006. L. Jie, F. Orabona, and B. Caputo. An online framework for learning novel concepts over multiple cues. In H. Zha, R. Taniguchi, and S. J. Maybank, editors, Computer Vision - ACCV 2009, 9th Asian Conference on Computer Vision, Xi’an, China, September 23-27, 2009, Revised Selected Papers, Part I, volume 5994 of Lecture Notes in Computer Science, Berlin / Heidelberg, 2010a. Springer. 250 M ULTI K ERNEL L EARNING WITH O NLINE -BATCH O PTIMIZATION L. Jie, F. Orabona, M. Fornoni, B. Caputo, and N. Cesa-Bianchi. OM-2: An online multi-class multi-kernel learning algorithm. In 4th IEEE Online Learning for Computer Vision Workshop (in CVPR10). IEEE Computer Society, June 2010b. R. Jin, S. C. H. Hoi, and T. Yang. Online multiple kernel learning: Algorithms and mistake bounds. In Proc. of the 21st International Conference on Algorithmic Learning Theory, pages 390–404, 2010. T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods – Support Vector Learning, pages 169–185. MIT Press, 1999. S. Kakade, S. Shalev-Shwartz, and A. Tewari. On the duality of strong convexity and strong smoothness: Learning applications and matrix regularization. Technical report, TTI, 2009. M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.-R. M¨ ller, and A. Zien. Efﬁcient and accurate u ℓ p -norm multiple kernel learning. In Advances in Neural Information Processing Systems 22, pages 997–1005, 2009. G. Lanckriet, N. Cristianini, P. Bartlett, and L. E. Ghaoui. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004. S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2006. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86:2278–2324, 1998. D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004. C. A. Micchelli and M. Pontil. Learning the kernel function via regularization. Journal of Machine Learning Research, 6:1099–1125, December 2005. J. S. Nath, G. Dinesh, S. Ramanand, C. Bhattacharyya, A. Ben-Tal, and K. R. Ramakrishnan. On the algorithmics and applications of a mixed-norm based kernel learning formulation. In Advances in Neural Information Processing Systems 22, pages 844–852, 2009. M. E. Nilsback and B. Caputo. Cue integration through discriminative accumulation. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2004. M.-E. Nilsback and A. Zisserman. A visual vocabulary for ﬂower classiﬁcation. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2006. T. Ojala, M. Pietika¨ inen, and T. Ma¨ enpa¨ a¨ . Multiresolution gray-scale and rotation invariant a a a a texture classiﬁcation with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002. F. Orabona. DOGMA: a MATLAB toolbox for Online Learning, 2009. Software available at http: //dogma.sourceforge.net. 251 O RABONA , J IE AND C APUTO F. Orabona and K. Crammer. New adaptive algorithms for online classiﬁcation. In Advances in Neural Information Processing Systems, December 2010. F. Orabona, L. Jie, and B. Caputo. Online-batch strongly convex multi kernel learning. In Proc. of the 23rd IEEE Conference on Computer Vision and Pattern Recognition, June 2010. N. Pinto, D. D. Cox, and J. J. Dicarlo. Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1), January 2008. J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods – Support Vector Learning, pages 185–208. MIT Press, 1999. A. Pronobis, J. Luo, and B. Caputo. The more you learn, the less you store: Memory-controlled incremental SVM for visual place recognition. Image and Vision Computing (IMAVIS), Special Issue on Online Pattern Recognition and Machine Learning Techniques for Computer-Vision: Theory and Applications, 28(7):1080–1097, July 2010. A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491–2521, November 2008. E. Rubinstein. Support vector machines via advanced optimization techniques. Technical report, Masters thesis, Faculty of Electrical Engineering, Technion, Nov 2005. C. Sanderson and K. K. Paliwal. Identity veriﬁcation using speech and face information. Digital Signal Processing, 14(5):449–480, 2004. S. Shalev-Shwartz and Y. Singer. Logarithmic regret algorithms for strongly convex repeated games. Technical Report 2007-42, The Hebrew University, 2007. S. Shalev-Shwartz and N. Srebro. SVM, optimization: inverse dependence on training set size. In Proc. of the International Conference on Machine Learning, 2008. S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In Proc. of the International Conference on Machine Learning, 2007. S. Sonnenburg, G. R¨ tsch, and C. Sch¨ fer. Learning interpretable SVMs for biological sequence a a classiﬁcation. In Research in Computational Molecular Biology, 9th Annual International Conference, RECOMB 2005, Cambridge, MA, USA, May 14-18, 2005, Proceedings, volume 3500 of Lecture Notes in Computer Science, pages 389–407. Springer, 2005. S. Sonnenburg, G. R¨ tsch, C. Sch¨ fer, and B. Sch¨ lkopf. Large scale multiple kernel learning. a a o Journal of Machine Learning Research, 7:1531–1565, December 2006. R. Tomioka and T. Suzuki. Sparsity-accuracy trade-off in MKL, 2010. URL http://arxiv.org/ abs/1001.2615. I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proc. of the International Conference on Machine Learning, 2004. 252 M ULTI K ERNEL L EARNING WITH O NLINE -BATCH O PTIMIZATION O. Tuzel, F. Porikli, and P. Meer. Human detection via classiﬁcation on Riemannian manifolds. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2007. M. Varma and B. R. Babu. More generality in efﬁcient multiple kernel learning. In Proc. of the International Conference on Machine Learning, 2009. S. V. N. Vishwanathan, Z. Sun, N. Theera-Ampornpunt, and M. Varma. Multiple kernel learning and the SMO algorithm. In Advances in Neural Information Processing Systems, December 2010. D. H. Wolpert. Stacked generalization. Neural Networks, 5(2):241–259, 1992. Z. Xu, R. Jin, I. King, and M. R. Lyu. An extended level method for efﬁcient multiple kernel learning. In Advances in Neural Information Processing Systems 21, pages 1825–1832, 2008. M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, 68:49–67, 2006. A. Zien and C. S. Ong. Multiclass multiple kernel learning. In Proc. of the International Conference on Machine Learning, 2007. 253