jmlr jmlr2011 jmlr2011-4 jmlr2011-4-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jinfeng Zhuang, Ivor W. Tsang, Steven C.H. Hoi
Abstract: Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interiorpoint SDP solver could be as high as O(N 6.5 ), which prohibits NPKL methods applicable to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen decomposition technique. Another one is SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, square loss) that can be re-formulated as a saddle-point optimization problem, which can be further resolved by a fast iterative algorithm. In contrast to the previous NPKL approaches, our empirical results show that the proposed new technique, maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed new technique is also applicable to speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding. Keywords: non-parametric kernel learning, semi-definite programming, semi-supervised learning, side information, pairwise constraints
F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidefinite programming. Mathematical Programming, 77:111–128, 1997. A. Argyriou, C. A. Micchelli, and M. Pontil. Learning convex combinations of continuously parameterized basic kernels. In Proceedings of Annual Conference on Learning Theory, pages 338–352, 2005. F. R. Bach and Z. Harchaoui. DIFFRAC: a discriminative and flexible framework for clustering. In Advances in Neural Information Processing Systems, pages 49–56, 2008. F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the smo algorithm. In Proceedings of International Conference on Machine Learning, 2004. A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In Proceedings of International Conference on Machine Learning, pages 97–104, 2006. J. F. Bonnans and E. Shapiro. Optimization problems with perturbations, a guided tour. SIAM Review, 40:228–264, 1996. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. S. Boyd and L. Xiao. Least-squares covariance matrix adjustment. SIAM Journal on Matrix Analysis and Applications, 27(2):532–546, 2005. O. Chapelle, J. Weston, and B. Sch¨ lkopf. Cluster kernels for semi-supervised learning. In Advances o in Neural Information Processing Systems 15, pages 585–592, 2003. J. Chen and J. Ye. Training SVM with indefinite kernels. In International Conference on Machine Learning, pages 136–143, 2008. N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. S. Kandola. On kernel-target alignment. In Advances in Neural Information Processing Systems 14, pages 367–373, 2002. L. Duan, I.W. Tsang, D. Xu, and S.J. Maybank. Domain transfer SVM for video concept detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009. P. V. Gehler and S. Nowozin. Infinite kernel learning. In TECHNICAL REPORT NO. TR-178,Max Planck Institute for Biological Cybernetics, 2008. M. G¨ nen and E. Alpaydin. Localized multiple kernel learning. In Proceedings of International o Conference on Machine Learning, pages 352–359, 2008. A. Gretton, O. Bousquet, A. J. Smola, and B. Sch¨ lkopf. Measuring statistical dependence with o hilbert-schmidt norms. In Proceedings of International Conference on Algorithmic Learning Theory, pages 63–77, 2005. 1344 A FAMILY OF S IMPLE N ON -PARAMETRIC K ERNEL L EARNING A LGORITHMS T. Hofmann, B. Sch¨ lkopf, and A. J. Smola. Kernel methods in machine learning. Annals of o Statistics, 36(3):1171–1220, 2008. S. C. H. Hoi and R. Jin. Active kernel learning. In Proceedings of International Conference on Machine Learning, pages 400–407, 2008. S. C. H. Hoi, M. R. Lyu, and E. Y. Chang. Learning the unified kernel machines for classification. In Proceedings of ACM SIGKDD conference on Knowledge Discovery and Data Mining, pages 187–196, 2006. S. C. H. Hoi, R. Jin, and M. R. Lyu. Learning nonparametric kernel matrices from pairwise constraints. In Proceedings of International Conference on Machine Learning, pages 361–368, 2007. T. Jebara and V. Shchogolev. B-matching for spectral clustering. In European Conference on Machine Learning, pages 679–686, September 2006. R. Johnson and T. Zhang. Graph-based semi-supervised learning and spectral kernel design. IEEE Transactions on Information Theory, 54(1):275–288, 2008. K. Krishnan and J. E. Mitchell. A unifying framework for several cutting plane methods for semidefinite programming. Optimization Methods and Software, 21(1):57–74, 2006. B. Kulis, M. Sustik, and I. S. Dhillon. Learning low-rank kernel matrices. In Proceedings of International Conference on Machine Learning, pages 505–512, 2006. B. Kulis, M. A. Sustik, and I. S. Dhillon. Low-rank kernel learning with bregman matrix divergences. Journal of Machine Learning Research, 10:341–376, 2009. J. Kwok and I. W. Tsang. Learning with idealized kernels. In Proceedings of International Conference on Machine Learning, pages 400–407, 2003. G. R. G. Lanckriet, N. Cristianini, P. L. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004. R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK users guide: Solution of large scale eigenvalue problems with implicitly restarted arnoldi methods. Technical report, 1998. D. P. Lewis, T. Jebara, and W. S. Noble. Nonstationary kernel combination. In Proceedings of International Conference on Machine Learning, pages 553–560, 2006. F. Li, Y. Fu, Y.-H. Dai, C. Sminchisescu, and J. Wang. Kernel learning by unconstrained optimization. In Proceedings of International Conference on Artificial Intelligence and Statistics, 2009. M.S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra Applications, 284:193–228, 1998. R. Luss and A. d’Aspremont. Support vector machine classification with indefinite kernels. In Advances in Neural Information Processing Systems 20, 2008. 1345 Z HUANG , T SANG AND H OI Q. Mao and I. W. Tsang. Parameter-free spectral kernel learning. In Conference on Uncertainty in Artificial Intelligence, 2010. Q. Mao and I. W. Tsang. Multiple template learning for structured prediction. arXiv CoRR 1103.0890, Mar 2011. C. A. Micchelli and M. Pontil. Learning the kernel function via regularization. Journal of Machine Learning Research, 6:1099–1125, 2005. Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103: 127–152, 2005. Y. Nesterov and A. Nemirovskii. Interior-point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1994. G. Pataki. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Technical Report MSRR-604, Carnegie Mellon University, August 1995. F. R. Rakotomamonjy, A.and Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491–2521, 2008. B. Shaw and T. Jebara. Minimum volume embedding. In Proceedings of International Conference on Artificial Intelligence and Statistics, 2007. B. Shaw and T. Jebara. Structure preserving embedding. In Proceedings of Interational Conference on Machine Learning, page 118, 2009. V. Sindhwani, P. Niyogi, and M. Belkin. Beyond the point cloud: from transductive to semisupervised learning. In Proceedings of International Conference on Machine Learning, pages 824–831, 2005. L. Song, A. J. Smola, K. M. Borgwardt, and A. Gretton. Colored maximum variance unfolding. In Advances in Neural Information Processing Systems 20, 2008. S. Sonnenburg, G. R¨ tsch, and C. Sch¨ fer. A general and efficient multiple kernel learning algoa a rithm. In Advances in Neural Information Processing Systems 18, 2006a. S. Sonnenburg, G. R¨ tsch, C. Sch¨ fer, and B. Sch¨ lkopf. Large scale multiple kernel learning. a a o Journal of Machine Learning Research, 7:1531–1565, 2006b. J. Sun, X. Wu, S. Yan, L.-F. Cheong, T.-S. Chua, and J. Li. Hierarchical spatio-temporal context modeling for action recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 2009. M. Varma and B. R. Babu. More generality in efficient multiple kernel learning. In Proceedings of Interational Conference on Machine Learning, page 134, 2009. A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In Proceedings of IEEE International Conference on Computer Vision, 2009. 1346 A FAMILY OF S IMPLE N ON -PARAMETRIC K ERNEL L EARNING A LGORITHMS K. Q. Weinberger and L. K. Saul. Fast solvers and efficient implementations for distance metric learning. In Proceedings of International Conference on Machine Learning, pages 1160–1167, 2008. K. Q. Weinberger, F. Sha, and L. K. Saul. Learning a kernel matrix for nonlinear dimensionality reduction. In Proceedings of International Conference on Machine Learning, 2004. E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems 15, 2003. Z. Xu, R. Jin, I. King, and M. R. Lyu. An extended level method for efficient multiple kernel learning. In Advances in Neural Information Processing Systems, pages 1825–1832, 2008. Y. Ying, C. Campbell, and M. Girolami. Analysis of SVM with indefinite kernels. In Advances in Neural Information Processing Systems, 2010. T. Zhang and R. Ando. Analysis of spectral kernel design based semi-supervised learning. In Advances in Neural Information Processing Systems 18, 2006. X. Zhu, J. S. Kandola, Z. Ghahramani, and J. D. Lafferty. Nonparametric transforms of graph kernels for semi-supervised learning. In Advances in Neural Information Processing Systems, 2005. J. Zhuang and S. C. H. Hoi. Non-parametric kernel ranking approach for social image retrieval. In Proceedings of the 9th ACM International Conference on Image and Video Retrieval, pages 26–33, 2010. J. Zhuang, I. W. Tsang, and S. C. H. Hoi. SimpleNPKL: simple non-parametric kernel learning. In Proceedings of Interational Conference on Machine Learning, 2009. J. Zhuang, I. W. Tsang, and S. C. H. Hoi. Two-layer multiple kernel learning. In Proceedings of Interational Conference on Artificial Intelligence and Statistics, 2011. 1347