nips nips2010 nips2010-124 nips2010-124-reference knowledge-graph by maker-knowledge-mining

124 nips-2010-Inductive Regularized Learning of Kernel Functions

Source: pdf

Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon

Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We ﬁrst propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classiﬁcation accuracy signiﬁcantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique signiﬁcantly reduces the dimensionality of the feature space while achieving competitive classiﬁcation accuracies. 1

reference text

[1] K. Tsuda, G. R¨ tsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line learning and a Bregman projection. JMLR, 6:995–1018, 2005.

[2] J. T. Kwok and I. W. Tsang. Learning with idealized kernels. In ICML, 2003.

[3] N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola. On kernel-target alignment. In NIPS, 2001.

[4] C. S. Ong, A. J. Smola, and R. C. Williamson. Learning the kernel with hyperkernels. JMLR, 6:1043– 1071, 2005.

[5] G. R. G. Lanckriet, N. Cristianini, P. L. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semideﬁnite programming. JMLR, 5:27–72, 2004.

[6] Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, and John Lafferty. Nonparametric transforms of graph kernels for semi-supervised learning. In Lawrence K. Saul, Yair Weiss, and L´ on Bottou, editors, NIPS, e volume 17, pages 1641–1648, 2005.

[7] Peter V. Gehler and Sebastian Nowozin. Let the kernel ﬁgure it out; principled learning of pre-processing for kernel classiﬁers. In CVPR, pages 2836–2843, 2009.

[8] Matthias Seeger. Cross-validation optimization for large scale hierarchical classiﬁcation kernel methods. In NIPS, pages 1233–1240, 2006.

[9] Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-Francois Paiement, Pascal Vincent, and Marie Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197–2219, 2004.

[10] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In NIPS, pages 505–512, 2002.

[11] K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classiﬁcation. In NIPS, 2005.

[12] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In ICML, pages 209–216, 2007.

[13] S. Shalev-Shwartz, Y. Singer, and A. Y. Ng. Online and batch learning of pseudo-metrics. In ICML, 2004.

[14] A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In NIPS, 2005.

[15] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood component analysis. In NIPS, 2004.

[16] B. Kulis, S. Sra, and I. S. Dhillon. Convex perturbations for scalable semideﬁnite programming. In AISTATS, 2009.

[17] R. Chatpatanasiri, T. Korsrilabutr, P. Tangchanachaianan, and B. Kijsirikul. On kernelization of supervised Mahalanobis distance learners, 2008.

[18] Andreas Argyriou, Charles A. Micchelli, and Massimiliano Pontil. On spectral learning. JMLR, 11:935– 953, 2010.

[19] F. R. Bach and M. I. Jordan. Predictive low-rank decomposition for kernel methods. In ICML, pages 33–40, 2005.

[20] L. Song, A. Smola, K. M. Borgwardt, and A. Gretton. Colored maximum variance unfolding. In NIPS, pages 1385–1392, 2007.

[21] Y. Song, F. Nie, C. Zhang, and S. Xiang. A uniﬁed framework for semi-supervised dimensionality reduction. Pattern Recognition, 41(9):2789–2799, 2008.

[22] K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel dimensionality reduction for supervised learning. In NIPS, 2003.

[23] R. Urtasun and T. Darrell. Discriminative gaussian process latent variable model for classiﬁcation. In ICML, pages 927–934, 2007.

[24] S. Mika, B. Sch¨ lkopf, A. J. Smola, K. M¨ ller, M. Scholz, and G. R¨ tsch. Kernel pca and de-noising in o u a feature spaces. In NIPS, pages 536–542, 1998.

[25] V. Sindhwani, P. Niyogi, and M. Belkin. Beyond the point cloud: from transductive to semi-supervised learning. In ICML, pages 824–831, 2005.

[26] Brian Kulis, M´ ty´ s Sustik, and Inderjit S. Dhillon. Learning low-rank kernel matrices. In ICML, pages a a 505–512, 2006.

[27] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In NIPS, 2003.

[28] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a mahalanobis metric from equivalence constraints. JMLR, 6:937–965, 2005.

[29] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. In CVPR, 2008.

[30] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, 2007.

[31] J. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion, 2008.

[32] K. Grauman and T. Darrell. The Pyramid Match Kernel: Efﬁcient learning with sets of features. Journal of Machine Learning Research (JMLR), 8:725–760, April 2007.

[33] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, pages 2169–2178, 2006.

[34] A. C. Berg and J. Malik. Geometric blur for template matching. In CVPR, pages 607–614, 2001. 9