jmlr jmlr2012 jmlr2012-66 jmlr2012-66-reference knowledge-graph by maker-knowledge-mining

66 jmlr-2012-Metric and Kernel Learning Using a Linear Transformation

Source: pdf

Author: Prateek Jain, Brian Kulis, Jason V. Davis, Inderjit S. Dhillon

Abstract: Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet

reference text

A. Argyriou, C. A. Micchelli, and M. Pontil. On spectral learning. Journal of Machine Learning Research (JMLR), 11:935–953, 2010. S. Arora and S. Kale. A combinatorial, primal-dual approach to semideﬁnite programs. In Proceedings of the ACM Symposium on Theory of Computing (STOC), pages 227–236, 2007. A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a mahalanobis metric from equivalence constraints. Journal of Machine Learning Research (JMLR), 6:937–965, 2005. S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In Proceedings of the ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), 2004. 544 M ETRIC AND K ERNEL L EARNING U SING A L INEAR T RANSFORMATION Y. Bengio, O. Delalleau, N. Le Roux, J. Paiement, P. Vincent, and M. Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197–2219, 2004. A. C. Berg and J. Malik. Geometric blur for template matching. In Proccedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pages 607–614, 2001. J. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. Arxiv:0810.3286, 2008. Caltech-101 Data Set. http://www.vision.caltech.edu/Image Datasets/Caltech101/, 2004. R. Chatpatanasiri, T. Korsrilabutr, P. Tangchanachaianan, and B. Kijsirikul. A new kernelization framework for Mahalanobis distance learning algorithms. Neurocomputing, 73(10–12):1570– 1579, 2010. S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face veriﬁcation. In Proccedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2005. Classic3 Data Set. ftp.cs.cornell.edu/pub/smart, 2008. CMU 20-Newsgroups Data Set. 20/www/data/news20.html, 2008. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo- N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola. On kernel-target alignment. In Proccedings of Advances in Neural Information Processing Systems (NIPS), 2001. J. V. Davis and I. S. Dhillon. Structured metric learning for high dimensional problems. In Proceedings of the ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), pages 195–203, 2008. J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In Proccedings of the International Conference on Machine Learning (ICML), pages 209–216, 2007. S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391–407, 1990. R. Fletcher. A new variational result for quasi-Newton formulae. SIAM Journal on Optimization, 1 (1), 1991. A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In Proccedings of Advances in Neural Information Processing Systems (NIPS), 2005. J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood component analysis. In Proccedings of Advances in Neural Information Processing Systems (NIPS), 2004. 545 JAIN , K ULIS , DAVIS AND D HILLON K. Grauman and T. Darrell. The Pyramid Match Kernel: Efﬁcient learning with sets of features. Journal of Machine Learning Research (JMLR), 8:725–760, April 2007. M. Groschel, L. Lovasz, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer-Verlag, 1988. J. Ha, C. J. Rossbach, J. V. Davis, I. Roy, H. E. Ramadan, D. E. Porter, D. L. Chen, and E. Witchel. Improved error reporting for software that uses black-box components. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 101– 111, 2007. T. Hastie and R. Tibshirani. Discriminant adaptive nearest neighbor classiﬁcation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 18:607–616, 1996. P. Jain, B. Kulis, I. S. Dhillon, and K. Grauman. Online metric learning and fast similarity search. In Proccedings of Advances in Neural Information Processing Systems (NIPS), pages 761–768, 2008. P. Jain, B. Kulis, and I. S. Dhillon. Inductive regularized learning of kernel functions. In Proccedings of Advances in Neural Information Processing Systems (NIPS), 2010. W. James and C. Stein. Estimation with quadratic loss. In Fourth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 361–379. Univ. of California Press, 1961. B. Kulis, M. Sustik, and I. S. Dhillon. Learning low-rank kernel matrices. In Proccedings of the International Conference on Machine Learning (ICML), pages 505–512, 2006. B. Kulis, M. Sustik, and I. Dhillon. Low-rank kernel learning with Bregman matrix divergences. Journal of Machine Learning Research, 2008. B. Kulis, P. Jain, and K. Grauman. Fast similarity search for learned metrics. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 31(12):2143–2157, 2009a. B. Kulis, S. Sra, and I. S. Dhillon. Convex perturbations for scalable semideﬁnite programming. In Proceedings of the International Conference on Artiﬁcial Intelligence and Statistics (AISTATS), 2009b. J. T. Kwok and I. W. Tsang. Learning with idealized kernels. In Proccedings of the International Conference on Machine Learning (ICML), 2003. G. R. G. Lanckriet, N. Cristianini, P. L. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semideﬁnite programming. Journal of Machine Learning Research (JMLR), 5:27–72, 2004. S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proccedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pages 2169–2178, 2006. G. Lebanon. Metric learning for text documents. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 28(4):497–508, 2006. ISSN 0162-8828. 546 M ETRIC AND K ERNEL L EARNING U SING A L INEAR T RANSFORMATION M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000. C. S. Ong, A. J. Smola, and R. C. Williamson. Learning the kernel with hyperkernels. Journal of Machine Learning Research (JMLR), 6:1043–1071, 2005. B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010. M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In Proccedings of Advances in Neural Information Processing Systems (NIPS), 2003. M. Seeger. Cross-validation optimization for large scale hierarchical classiﬁcation kernel methods. In Proccedings of Advances in Neural Information Processing Systems (NIPS), pages 1233–1240, 2006. S. Shalev-Shwartz, Y. Singer, and A. Y. Ng. Online and batch learning of pseudo-metrics. In Proccedings of the International Conference on Machine Learning (ICML), 2004. V. Sindhwani, P. Niyogi, and M. Belkin. Beyond the point cloud: from transductive to semisupervised learning. In Proccedings of the International Conference on Machine Learning (ICML), pages 824–831, 2005. K. Tsuda, G. R¨ tsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line learna ing and Bregman projection. Journal of Machine Learning Research (JMLR), 6:995–1018, 2005. M. K. Warmuth and D. Kuzmin. Randomized online PCA algorithms with regret bounds that are logarithmic in the dimension. Journal of Machine Learning Research, 9:2287–2320, 2008. K. Q. Weinberger and L. K. Saul. Fast solvers and efﬁcient implementations for distance metric learning. In Proccedings of the International Conference on Machine Learning (ICML), 2008. K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classiﬁcation. In Proccedings of Advances in Neural Information Processing Systems (NIPS), 2005. E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In Proccedings of Advances in Neural Information Processing Systems (NIPS), pages 505–512, 2002. H. Zhang, A. C. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative nearest neighbor classiﬁcation for visual category recognition. In Proccedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pages 2126–2136, 2006. X. Zhu, J. Kandola, Z. Ghahramani, and J. Lafferty. Nonparametric transforms of graph kernels for semi-supervised learning. In Lawrence K. Saul, Yair Weiss, and L´ on Bottou, editors, Proce cedings of Advances in Neural Information Processing Systems (NIPS), volume 17, pages 1641– 1648, 2005. 547