jmlr jmlr2009 jmlr2009-51 jmlr2009-51-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Title: Low-Rank Kernel Learning with Bregman Matrix Divergences
Author: Brian Kulis, Mátyás A. Sustik, Inderjit S. Dhillon
Abstract: In this paper, we study low-rank matrix nearness problems, with a focus on learning low-rank positive semidefinite (kernel) matrices for machine learning applications. We propose efficient algorithms that scale linearly in the number of data points and quadratically in the rank of the input matrix. Existing algorithms for learning kernel matrices often scale poorly, with running times that are cubic in the number of data points. We employ Bregman matrix divergences as the measures of nearness; these divergences are natural for learning low-rank kernels since they preserve rank as well as positive semidefiniteness. Special cases of our framework yield faster algorithms for various existing learning problems, and experimental results demonstrate that our algorithms can effectively learn both low-rank and full-rank kernel matrices.

Keywords: kernel methods, Bregman divergences, convex optimization, kernel learning, matrix nearness
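For context, here is a brief sketch of the divergences the abstract refers to. The general definition below is the standard one for Bregman matrix divergences; the two instances shown (the von Neumann and LogDet divergences) are the usual examples in this setting and are given as illustration, not as a summary of the paper's full framework. For a strictly convex, differentiable function \varphi over symmetric matrices,

    D_\varphi(X, Y) = \varphi(X) - \varphi(Y) - \mathrm{tr}\big(\nabla\varphi(Y)^{\top}(X - Y)\big).

Taking \varphi(X) = \mathrm{tr}(X \log X - X) gives the von Neumann divergence,

    D_{\mathrm{vN}}(X, Y) = \mathrm{tr}(X \log X - X \log Y - X + Y),

while \varphi(X) = -\log\det X gives the LogDet (Burg) divergence over n x n positive definite matrices,

    D_{\ell d}(X, Y) = \mathrm{tr}(X Y^{-1}) - \log\det(X Y^{-1}) - n.

As a concrete, deliberately naive illustration, the following NumPy sketch evaluates the LogDet divergence for dense positive definite matrices. The function name and the O(n^3) dense implementation are ours, not the paper's; the paper's contribution is precisely to avoid this cubic cost by exploiting low-rank structure.

    import numpy as np

    def logdet_divergence(X, Y):
        """D_ld(X, Y) = tr(X Y^{-1}) - log det(X Y^{-1}) - n,
        for symmetric positive definite X and Y (dense, O(n^3));
        illustrative sketch only, not the paper's algorithm."""
        n = X.shape[0]
        W = np.linalg.solve(Y, X)            # W = Y^{-1} X, so tr(W) = tr(X Y^{-1})
        sign, logdet = np.linalg.slogdet(W)  # log det(Y^{-1} X)
        if sign <= 0:
            raise ValueError("X and Y must be positive definite")
        return np.trace(W) - logdet - n

    # Example: identical matrices have zero divergence.
    K = np.array([[2.0, 0.5], [0.5, 1.0]])
    assert abs(logdet_divergence(K, K)) < 1e-12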