jmlr jmlr2011 jmlr2011-101 jmlr2011-101-reference knowledge-graph by maker-knowledge-mining

101 jmlr-2011-Variable Sparsity Kernel Learning

Source: pdf

Author: Jonathan Aflalo, Aharon Ben-Tal, Chiranjib Bhattacharyya, Jagarlapudi Saketha Nath, Sankaran Raman

Abstract: paper1 This presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1 norm regularization for promoting sparsity within RKHS norms of each group and ls , s ≥ 2 norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels—hence we name the formulations as Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efﬁcient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over product of simplices and has a computational complexity of O m2 ntot log nmax /ε2 where m is no. training data points, nmax , ntot are the maximum no. kernels in any group, total no. kernels respectively and ε is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efﬁciency. Keywords: multiple kernel learning, mirror descent, mixed-norm, object categorization, scalability 1. All authors contributed equally. The author names appear in alphabetical order. c 2011 Jonathan Aﬂalo, Aharon Ben-Tal, Chiranjib Bhattacharyya, Jagarlapudi Saketha Nath and Sankaran Raman. A FLALO , B EN -TAL , B HATTACHARYYA , NATH AND R AMAN

reference text

F. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the International Conference on Machine Learning, 2004. A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31:167–175, 2003. 590 VARIABLE S PARSITY K ERNEL L EARNING A. Ben-Tal and A. Nemirovski. Lectures on modern convex optimization: Analysis, algorithms and engineering applications. MPS/ SIAM Series on Optimization, 1, 2001. A. Ben-Tal, T. Margalit, and A. Nemirovski. The ordered subsets mirror descent optimization method with applications to tomography. SIAM Journal of Optimization, 12(1):79–108, 2001. A. C. Berg, T. L. Berg, and J. Malik. Shape matching and object recognition using low distortion correspondences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005. D. P. Bertsekas. Nonlinear Programming. Athena Scientiﬁc, second edition, 1999. C. Blake and C. Merz. Uci repository of machine learning databases, 1998. URL http://www. ics.uci.edu/˜mlearn/MLRepository.html. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 2006. A. D’Aspermont. Smooth optimization with approximate gradient. SIAM Journal on Optimization, 19(3):1171–1183, 2008. L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental bayesian approach tested on 101 object categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Workshop on Generative-Model Based Vision, 2004. G. Grifﬁn, A. Holub, and P. Perona. Caltech-256 object category dataset. Technical Report CNSTR-2007-001, California Institute of Technology, 2007. M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. Non-sparse regularization and efﬁcient training with multiple kernels. Technical Report UCB/EECS-2010-21, University of California, Berkeley, 2010. G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M.I. Jordan. Learning the kernel matrix with semideﬁnite programming. Journal of Machine Learning Research, 5:27–72, 2004. S. Lazebnik, C. Schmid, and J. Ponce. Beyond bag of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006. C. A. Micchelli and M. Pontil. Learning the kernel function via regularization. Journal of Machine Learning Research, 6:1099–1125, 2005. J. S. Nath, G. Dinesh, S. Raman, C. Bhattacharyya, A. Ben-Tal, and K. R. Ramakrishnan. On the algorithmics and applications of a mixed-norm regularization based kernel learning formulation. In Advances in Neural Information Processing Systems (NIPS), 2009. M-E. Nilsback and A. Zisserman. A visual vocabulary for ﬂower classiﬁcation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 1447–1454, 2006. 591 A FLALO , B EN -TAL , B HATTACHARYYA , NATH AND R AMAN M-E. Nilsback and A. Zisserman. Automated ﬂower classiﬁcation over a large number of classes. In Proceedings of the Sixth Indian Conference on Computer Vision, Graphics & Image Processing, 2008. A. Quattoni, X. Carreras, M. Collins, and T. Darrell. An efﬁcient projection for l1,∞ regularization. In Proceedings of the International Conference on Machine Learning, 2009. A. Rakotomamonjy, F. Bach, S. Canu, and Y Grandvalet. Simplemkl. Journal of Machine Learning Research, 9:2491–2521, 2008. R. T. Rockafellar. Minimax theorems and conjugate saddle-functions. Mathematica Scandinava, 14:151–173, 1964. R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970. E. Shechtman and M. Irani. Matching local self-similarities across images and videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007. M. Sion. On general minimax theorem. Paciﬁc Journal of Mathematics, 1958. S. Sonnenburg, G. Ratsch, C. Schafer, and B. Scholkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531–1565, 2006. M. Szafranski, Y. Grandvalet, and A. Rakotomamonjy. Composite kernel learning. In Proceedings of the International Conference on Machine Learning, 2008. P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimisation. Journal of Optimization Theory and Applications, 2001. V. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998. M. Varma and D. Ray. Learning the discriminative power invariance trade-off. In Proceedings of the International Conference on Computer Vision, 2007. A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In Proceedings of the International Conference on Computer Vision, 2009. H. Zhang, A. Berg, M. Maire, and J. Malik. Svm-knn: Discriminative nearest neighbor classication for visual category recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2126–2136, 2006. 592