jmlr jmlr2012 jmlr2012-77 jmlr2012-77-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fei Yan, Josef Kittler, Krystian Mikolajczyk, Atif Tahir
Abstract: Sparsity-inducing multiple kernel Fisher discriminant analysis (MK-FDA) has been studied in the literature. Building on recent advances in non-sparse multiple kernel learning (MKL), we propose a non-sparse version of MK-FDA, which imposes a general ℓp norm regularisation on the kernel weights. We formulate the associated optimisation problem as a semi-infinite program (SIP), and adapt an iterative wrapper algorithm to solve it. We then discuss, in light of the latest advances in MKL optimisation techniques, several reformulations and optimisation strategies that can potentially lead to significant improvements in the efficiency and scalability of MK-FDA. We carry out extensive experiments on six datasets from various application areas, and closely compare the performance of ℓp MK-FDA, fixed norm MK-FDA, and several variants of SVM-based MKL (MK-SVM). Our results demonstrate that ℓp MK-FDA improves upon sparse MK-FDA in many practical situations. The results also show that on image categorisation problems, ℓp MK-FDA tends to outperform its SVM counterpart. Finally, we discuss the connection between (MK-)FDA and (MK-)SVM under the unified framework of regularised kernel machines.
Keywords: multiple kernel learning, kernel Fisher discriminant analysis, regularised least squares, support vector machines
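As an illustration of the ℓp norm constraint on kernel weights mentioned in the abstract, the following Python sketch rescales a non-negative weight vector to unit ℓp norm and forms the weighted sum of base kernel matrices. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function names `lp_normalise` and `combine_kernels` and the toy 2×2 kernel matrices are hypothetical, and the SIP-based optimisation that would actually choose the weights is not shown.

```python
import math


def lp_normalise(weights, p):
    """Rescale non-negative kernel weights so that their l_p norm equals 1.

    Hypothetical helper for illustration; p >= 1 is assumed.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    norm = sum(w ** p for w in weights) ** (1.0 / p)
    if norm == 0:
        raise ValueError("at least one weight must be positive")
    return [w / norm for w in weights]


def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(b * K[i][j] for b, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Two toy base kernels on two samples (made-up values, for illustration only).
K1 = [[1.0, 0.5], [0.5, 1.0]]
K2 = [[1.0, 0.0], [0.0, 1.0]]

# The same raw weights, rescaled under an l_1 and an l_2 norm constraint.
beta_l1 = lp_normalise([1.0, 3.0], p=1)
beta_l2 = lp_normalise([1.0, 3.0], p=2)

K_l1 = combine_kernels([K1, K2], beta_l1)
K_l2 = combine_kernels([K1, K2], beta_l2)
```

With p = 1 the rescaled weights sum to one, while with p = 2 their squares sum to one. The sketch only shows how the norm constraint scales a given weight vector, not how the weights are learned.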
F. Bach and G. Lanckriet. Multiple kernel learning, conic duality, and the smo algorithm. In International Conference on Machine Learning, 2004.
G. Baudat and F. Anouar. Generalized discriminant analysis using a kernel approach. Neural Computation, 12:2385–2404, 2000.
O. Bousquet and D. Herrmann. On the complexity of learning the kernel matrix. In Advances in Neural Information Processing Systems, 2003.
M. Braun, J. Buhmann, and K. Müller. On relevant dimensions in kernel feature spaces. Journal of Machine Learning Research, 9:1875–1908, 2008.
D. Cai, X. He, and J. Han. Efficient kernel discriminant analysis via spectral regression. In International Conference on Data Mining, 2007.
H. Cai, K. Mikolajczyk, and J. Matas. Learning linear discriminant projections for dimensionality reduction of image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):338–352, 2011.
O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46:131–159, 2002.
C. Cortes, M. Mohri, and A. Rostamizadeh. L2 regularization for learning kernels. In Uncertainty in Artificial Intelligence, 2009.
N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola. On kernel-target alignment. In Advances in Neural Information Processing Systems, 2002.
G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV workshop on Statistical Learning in Computer Vision, 2004.
R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley, 2000.
M. Everingham, L. van Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascalnetwork.org/challenges/VOC/voc2007/workshop/index.html, 2007.
M. Everingham, L. van Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2008 (VOC2008) Results. http://www.pascalnetwork.org/challenges/VOC/voc2008/workshop/index.html, 2008.
T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13:1–50, 2000.
L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):594–611, 2006.
R. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7:179–188, 1936.
G. Fung and O. L. Mangasarian. Proximal support vector machine classifier. In International Conference on Knowledge Discovery and Data Mining, 2001.
P. Gehler and S. Nowozin. On feature combination for multiclass object classification. In International Conference on Computer Vision, 2009.
J. Gemert, J. Geusebroek, C. Veenman, and A. Smeulders. Kernel codebooks for scene categorization. In European Conference on Computer Vision, 2008.
T. Gestel, J. Suykens, G. Lanckriet, A. Lambrechts, B. Moor, and J. Vandewalle. Bayesian framework for least-squares support vector machine classifiers, gaussian processes, and kernel fisher discriminant analysis. Machine Learning, 14(5):1115–1147, 2002.
F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural networks architectures. Neural Computation, 7:219–269, 1995.
G. Golub and C. van Loan. Matrix Computations. Johns Hopkins University Press, third edition, 1996.
K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research, 8:725–760, 2007.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, second edition, 2002.
R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines. Journal of Machine Learning Research, 1:245–279, 2001.
R. Hettich and K. Kortanek. Semi-infinite programming: Theory, methods, and applications. SIAM Review, 35(3):380–429, 1993.
T. Joachims. Making Large-Scale Support Vector Machine Learning Practical. MIT Press, Cambridge, MA, 1988.
S. Keerthi and S. Shevade. Smo algorithm for least squares svm formulations. Neural Computation, 15(2):487–507, 2003.
S. Kim, A. Magnani, and S. Boyd. Optimal kernel selection in kernel fisher discriminant analysis. In International Conference on Machine Learning, 2006.
M. Kloft, U. Brefeld, P. Laskov, and S. Sonnenburg. Non-sparse multiple kernel learning. In NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels, 2008.
M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. Efficient and accurate lp-norm mkl. In Advances in Neural Information Processing Systems, 2009.
M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. Lp norm multiple kernel learning. Journal of Machine Learning Research, 12:953–997, 2011.
G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. Jordan. Learning the kernel matrix with semi-definite programming. In International Conference on Machine Learning, 2002.
G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004.
S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In International Conference on Computer Vision and Pattern Recognition, 2006.
D. Liu and J. Nocedal. On the limited memory method for large scale optimization. Mathematical Programming B, 45(3):503–528, 1989.
J. Lopez and J. Suykens. First and second order smo algorithms for ls-svm classifiers. Neural Processing Letters, 33(1):31–44, 2011.
D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
S. Mika. Kernel fisher discriminants. PhD Thesis, University of Technology, Berlin, Germany, 2002.
S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K. Müller. Fisher discriminant analysis with kernels. In IEEE Signal Processing Society Workshop: Neural Networks for Signal Processing, 1999.
K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615–1630, 2005.
M. Momma, K. Hatano, and H. Nakayama. Ellipsoidal support vector machines. In Asian Conference on Machine Learning, 2010.
S. Nakajima, A. Binder, C. Muller, W. Wojcikiewicz, M. Kloft, U. Brefeld, K. Müller, and M. Kawanabe. Multiple kernel learning for object classification. Technical Report on Information-Based Induction Sciences, 2009.
M. Nilsback and A. Zisserman. Automated flower classification over a large number of classes. In Indian Conference on Computer Vision, Graphics and Image Processing, 2008.
C. Ong and A. Zien. An automated combination of kernels for predicting protein subcellular localization. In Workshop on Algorithms in Bioinformatics, 2008.
C. Ong, A. Smola, and R. C. Williamson. Hyperkernels. In Advances in Neural Information Processing Systems, 2003.
F. Orabona and L. Jie. Ultra-fast optimization algorithm for sparse multi kernel learning. In International Conference on Machine Learning, 2011.
F. Orabona, L. Jie, and B. Caputo. Online-batch strongly convex multi kernel learning. In International Conference on Computer Vision and Pattern Recognition, 2010.
T. Poggio, S. Mukherjee, R. Rifkin, A. Rakhlin, and A. Verri. B. In Conference on Uncertainty in Geometric Computations, 2004.
A. Rakotomamonjy, F. Bach, Y. Grandvalet, and S. Canu. Simplemkl. Journal of Machine Learning Research, 9:2491–2521, 2008.
G. Rätsch. Robust boosting via convex optimization. PhD Thesis, University of Potsdam, Potsdam, Germany, 2001.
R. Rifkin. Everything old is new again: a fresh look at historical approaches in machine learning. PhD Thesis, Massachusetts Institute of Technology, Boston, USA, 2002.
P. Rujan. Playing billiard in version space. Neural Computation, 9:99–122, 1997.
K. Sande, T. Gevers, and C. Snoek. Evaluation of color descriptors for object and scene recognition. In International Conference on Computer Vision and Pattern Recognition, 2008.
C. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In International Conference on Machine Learning, 1998.
B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, 2002.
B. Schölkopf, A. Smola, and K. Müller. Kernel principal component analysis. Advances in Kernel Methods: Support Vector Learning, pages 327–352, 1999.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
C. Snoek, M. Worring, J. Gemert, J. Geusebroek, and A. Smeulders. The challenge problem for automated detection of 101 semantic concepts in multimedia. In ACM Multimedia Conference, 2006.
S. Sonnenburg, G. Rätsch, C. Schafer, and B. Schölkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531–1565, 2006.
S. Sonnenburg, G. Rätsch, S. Henschel, C. Widmer, J. Behr, A. Zien, F. Bona, A. Binder, C. Gehl, and V. Franc. The shogun machine learning toolbox. Journal of Machine Learning Research, 11:1799–1802, 2010.
J. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9:293–300, 1999.
M. Szafranski, Y. Grandvalet, and A. Rakotomamonjy. Composite kernel learning. In International Conference on Machine Learning, 2008.
A. Tahir, J. Kittler, K. Mikolajczyk, F. Yan, K. Sande, and T. Gevers. Visual category recognition using spectral regression and kernel discriminant analysis. In International Workshop on Subspace Methods, 2009.
A. Tikhonov and V. Arsenin. Solutions of Ill-Posed Problems. Winston, Washington DC, 1977.
V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1999.
S. Vishwanathan, Z. Sun, and N. Theera-Ampornpunt. Multiple kernel learning and the smo algorithm. In Advances in Neural Information Processing Systems, 2010.
F. Yan, J. Kittler, K. Mikolajczyk, and A. Tahir. Non-sparse multiple kernel learning for fisher discriminant analysis. In International Conference on Data Mining, 2009a.
F. Yan, K. Mikolajczyk, J. Kittler, and A. Tahir. A comparison of l1 norm and l2 norm multiple kernel svms in image and video classification. In International Workshop on Content-Based Multimedia Indexing, 2009b.
F. Yan, K. Mikolajczyk, M. Barnard, H. Cai, and J. Kittler. Lp norm multiple kernel fisher discriminant analysis for object and image categorisation. In International Conference on Computer Vision and Pattern Recognition, 2010.
J. Ye, S. Ji, and J. Chen. Multi-class discriminant kernel learning via convex programming. Journal of Machine Learning Research, 9:719–758, 2008.
J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2):213–238, 2007.
A. Zien and C. Ong. Multiclass multiple kernel learning. In International Conference on Machine Learning, 2007.