jmlr jmlr2008 jmlr2008-86 jmlr2008-86-reference knowledge-graph by maker-knowledge-mining

86 jmlr-2008-SimpleMKL

Source: pdf

Author: Alain Rakotomamonjy, Francis R. Bach, Stéphane Canu, Yves Grandvalet

Abstract: Multiple kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. For the support vector machine, an efﬁcient and general multiple kernel learning algorithm, based on semi-inﬁnite linear programming, has been recently proposed. This approach has opened new perspectives since it makes MKL tractable for large-scale problems, by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs numerous iterations for converging towards a reasonable solution. In this paper, we address the MKL problem through a weighted 2-norm regularization formulation with an additional constraint on the weights that encourages sparse kernel combinations. Apart from learning the combination, we solve a standard SVM optimization problem, where the kernel is deﬁned as a linear combination of multiple kernels. We propose an algorithm, named SimpleMKL, for solving this MKL problem and provide a new insight on MKL algorithms based on mixed-norm regularization by showing that the two approaches are equivalent. We show how SimpleMKL can be applied beyond binary classiﬁcation, for problems like regression, clustering (one-class classiﬁcation) or multiclass classiﬁcation. Experimental results show that the proposed algorithm converges rapidly and that its efﬁciency compares favorably to other MKL algorithms. Finally, we illustrate the usefulness of MKL for some regressors based on wavelet kernels and on some model selection problems related to multiclass classiﬁcation problems. Keywords: multiple kernel learning, support vector machines, support vector regression, multiclass SVM, gradient descent ∗. Also at Heudiasyc, CNRS/Universit´ de Technologie de Compi` gne (UMR 6599), 60205 Compi` gne, France. e e e c 2008 Alain Rakotomamonjy, Francis R. Bach, St´ phane Canu and Yves Grandvalet. e R AKOTOMAMONJY, BACH , C ANU AND G RANDVALET

reference text

A. Antoniadis and J. Fan. Regularization by wavelet approximations. J. American Statistical Association, 96:939–967, 2001. A. Argyriou, T. Evgeniou, and M. Pontil. Convex multi-task feature learning. Machine Learning, to appear, 2008. N. Aronszajn. Theory of reproducing kernels. Trans. Am. Math. Soc., (68):337–404, 1950. 2518 S IMPLE MKL F. Bach. Consistency of the group Lasso and multiple kernel learning. Journal of Machine Learning Research, 9:1179–1225, 2008. F. Bach, G. Lanckriet, and M. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the 21st International Conference on Machine Learning, pages 41–48, 2004a. F. Bach, R. Thibaux, and M. Jordan. Computing regularization paths for learning multiple kernels. In Advances in Neural Information Processing Systems, volume 17, pages 41–48, 2004b. M. Berkelaar, K. Eikland, and P. Notebaert. //lpsolve.sourceforge.net/5.5/. Lpsolve, Version 5.1.0.0, 2004. URL http: D. Bertsekas. Nonlinear Programming. Athena scientiﬁc, 1999. C. Blake and C. Merz. UCI repository of machine learning databases. University of California, Irvine, Dept. of Information and Computer Sciences, 1998. URL http://www.ics.uci.edu/ ˜mlearn/MLRepository.html. F. Bonnans. Optimisation Continue. Dunod, 2006. J.F. Bonnans and A. Shapiro. Optimization problems with pertubation : A guided tour. SIAM Review, 40(2):202–227, 1998. J.F. Bonnans, J.C Gilbert, C. Lemar´ chal, and C.A Sagastizbal. Numerical Optimization Theoretical e and Practical Aspects. Springer, 2003. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy. SVM and kernel methods Matlab toolbox. LITIS EA4108, INSA de Rouen, Rouen, France, 2003. URL http://asi.insa-rouen. fr/enseignants/˜arakotom/toolbox/index.html. C-C. Chang and C-J. Lin. LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/˜cjlin/libsvm. O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukerjhee. Choosing multiple parameters for SVM. Machine Learning, 46(1-3):131–159, 2002. K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001. A. D’Amato, A. Antoniadis, and M. Pensky. Wavelet kernel penalized estimation for nonequispaced design regression. Statistics and Computing, 16:37–56, 2006. A. D’Aspremont. Smooth optimization with approximate gradient. SIAM Journal on Optimization, To appear, 2008. D. DeCoste and K. Wagstaff. Alpha seeding for support vector machines. In International Conference on Knowledge Discovery and Data Mining, 2000. 2519 R AKOTOMAMONJY, BACH , C ANU AND G RANDVALET K. Duan and S. Keerthi. Which is the best multiclass svm method? an empirical study. In Multiple Classiﬁer Systems, pages 278–285, 2005. URL http://www.keerthis.com/multiclass. html. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression (with discussion). Annals of statistics, 32(2):407–499, 2004. G. Fung, M. Dundar, J. Bi, and B. Rao. A fast iterative algorithm for Fisher discriminant using heterogeneous kernels. In Proceeedins of the 21th International Conference on Machine Learning, 2004. Y. Grandvalet. Least absolute shrinkage is equivalent to quadratic penalization. In L. Niklasson, M. Bod´ n, and T. Ziemske, editors, ICANN’98, volume 1 of Perspectives in Neural Computing, e pages 201–206. Springer, 1998. Y. Grandvalet and S. Canu. Adaptive scaling for feature selection in SVMs. In Advances in Neural Information Processing Systems, volume 15. MIT Press, 2003. Y. Grandvalet and S. Canu. Outcomes of the equivalence of adaptive ridge with least absolute shrinkage. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, pages 445–451. MIT Press, 1999. Z. Harchaoui and F. Bach. Image classiﬁcation with segmentation graph kernels. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007. T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu. The entire regularization path for the support vector machine. Journal of Machine Learning Research, 5:1391–1415, 2004. C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13:415–425, 2002. T. Joachims. Making large-scale SVM learning practical. In B. Scholkopf, C. Burges, and A. Smola, editors, Advanced in Kernel Methods - Support Vector Learning, pages 169–184. MIT Press, 1999. S.-J. Kim, A. Magnani, and S. Boyd. Optimal kernel selection in kernel Fisher discriminant analysis. In Proceedings of the 23rd International Conference on Machine Learning (ICML), 2006. K. Koh, S.-J. Kim, and S. Boyd. An interior-point method for large-scale regression. Journal of Machine Learning Research, 8:1519–1555, 2007. 1 -regularized logistic G. Lanckriet, T. De Bie, N. Cristianini, M. Jordan, and W. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20:2626–2635, 2004a. G. Lanckriet, N. Cristianini, L. El Ghaoui, P. Bartlett, and M. Jordan. Learning the kernel matrix with semi-deﬁnite programming. Journal of Machine Learning Research, 5:27–72, 2004b. C. Lemar´ chal and C. Sagastizabal. Practical aspects of moreau-yosida regularization : theoretical e preliminaries. SIAM Journal of Optimization, 7:867–895, 1997. 2520 S IMPLE MKL G. Loosli and S. Canu. Comments on the ”Core vector machines: Fast SVM training on very large data sets”. Journal of Machine Learning Research, 8:291–301, February 2007. G. Loosli, S. Canu, S. Vishwanathan, A. Smola, and M. Chattopadhyay. Boˆte a outils SVM simple ı ` et rapide. Revue d’Intelligence Artiﬁcielle, 19(4-5):741–767, 2005. D. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 1984. C. Micchelli and M. Pontil. Learning the kernel function via regularization. Journal of Machine Learning Research, 6:1099–1125, 2005. A. Rakotomamonjy and S. Canu. Frames, reproducing kernels, regularization and learning. Journal of Machine Learning Research, 6:1485–1515, 2005. A. Rakotomamonjy, X. Mary, and S. Canu. Non parametric regression with wavelet kernels. Applied Stochastics Model for Business and Industry, 21(2):153–163, 2005. A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. More efﬁciency in multiple kernel learning. In Zoubin Ghahramani, editor, Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007), pages 775–782. Omnipress, 2007. R. Rifkin and A. Klautau. In defense of one-vs-all classiﬁcation. Journal of Machine Learning Research, 5:101–141, 2004. S. Rosset. Tracking curved regularized optimization solution paths. In Advances in Neural Information Processing Systems, 2004. B. Sch¨ lkopf and A. Smola. Learning with Kernels. MIT Press, 2001. o S. Sonnenburg, G. R¨ tsch, and C. Sch¨ fer. A general and efﬁcient algorithm for multiple kernel a a learning. In Advances in Neural Information Processing Systems, volume 17, pages 1–8, 2005. S. Sonnenburg, G. R¨ tsch, C. Sch¨ fer, and B. Sch¨ lkopf. Large scale multiple kernel learning. a a o Journal of Machine Learning Research, 7(1):1531–1565, 2006. I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, 2005. V. Vapnik, S. Golowich, and A. Smola. Support vector method for function estimation, regression estimation and signal processing. volume Vol. 9. MIT Press, Cambridge, MA, neural information processing systems, edition, 1997. S. V. N. Vishwanathan, A. J. Smola, and M. Murty. SimpleSVM. In International Conference on Machine Learning, 2003. G. Wahba. Spline Models for Observational Data. Series in Applied Mathematics, Vol. 59, SIAM, 1990. J. Weston and C. Watkins. Multiclass support vector machines. In Proceedings of ESANN99, Brussels. D. Facto Press, 1999. A. Zien and C.S. Ong. Multiclass multiple kernel learning. In Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pages 1191–1198, 2007. 2521