nips nips2010 nips2010-124 nips2010-124-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Prateek Jain, Brian Kulis, Inderjit S. Dhillon
Abstract: In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1
[1] K. Tsuda, G. R¨ tsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line learning and a Bregman projection. JMLR, 6:995–1018, 2005.
[2] J. T. Kwok and I. W. Tsang. Learning with idealized kernels. In ICML, 2003.
[3] N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola. On kernel-target alignment. In NIPS, 2001.
[4] C. S. Ong, A. J. Smola, and R. C. Williamson. Learning the kernel with hyperkernels. JMLR, 6:1043– 1071, 2005.
[5] G. R. G. Lanckriet, N. Cristianini, P. L. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. JMLR, 5:27–72, 2004.
[6] Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, and John Lafferty. Nonparametric transforms of graph kernels for semi-supervised learning. In Lawrence K. Saul, Yair Weiss, and L´ on Bottou, editors, NIPS, e volume 17, pages 1641–1648, 2005.
[7] Peter V. Gehler and Sebastian Nowozin. Let the kernel figure it out; principled learning of pre-processing for kernel classifiers. In CVPR, pages 2836–2843, 2009.
[8] Matthias Seeger. Cross-validation optimization for large scale hierarchical classification kernel methods. In NIPS, pages 1233–1240, 2006.
[9] Yoshua Bengio, Olivier Delalleau, Nicolas Le Roux, Jean-Francois Paiement, Pascal Vincent, and Marie Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197–2219, 2004.
[10] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In NIPS, pages 505–512, 2002.
[11] K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2005.
[12] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In ICML, pages 209–216, 2007.
[13] S. Shalev-Shwartz, Y. Singer, and A. Y. Ng. Online and batch learning of pseudo-metrics. In ICML, 2004.
[14] A. Globerson and S. T. Roweis. Metric learning by collapsing classes. In NIPS, 2005.
[15] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood component analysis. In NIPS, 2004.
[16] B. Kulis, S. Sra, and I. S. Dhillon. Convex perturbations for scalable semidefinite programming. In AISTATS, 2009.
[17] R. Chatpatanasiri, T. Korsrilabutr, P. Tangchanachaianan, and B. Kijsirikul. On kernelization of supervised Mahalanobis distance learners, 2008.
[18] Andreas Argyriou, Charles A. Micchelli, and Massimiliano Pontil. On spectral learning. JMLR, 11:935– 953, 2010.
[19] F. R. Bach and M. I. Jordan. Predictive low-rank decomposition for kernel methods. In ICML, pages 33–40, 2005.
[20] L. Song, A. Smola, K. M. Borgwardt, and A. Gretton. Colored maximum variance unfolding. In NIPS, pages 1385–1392, 2007.
[21] Y. Song, F. Nie, C. Zhang, and S. Xiang. A unified framework for semi-supervised dimensionality reduction. Pattern Recognition, 41(9):2789–2799, 2008.
[22] K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel dimensionality reduction for supervised learning. In NIPS, 2003.
[23] R. Urtasun and T. Darrell. Discriminative gaussian process latent variable model for classification. In ICML, pages 927–934, 2007.
[24] S. Mika, B. Sch¨ lkopf, A. J. Smola, K. M¨ ller, M. Scholz, and G. R¨ tsch. Kernel pca and de-noising in o u a feature spaces. In NIPS, pages 536–542, 1998.
[25] V. Sindhwani, P. Niyogi, and M. Belkin. Beyond the point cloud: from transductive to semi-supervised learning. In ICML, pages 824–831, 2005.
[26] Brian Kulis, M´ ty´ s Sustik, and Inderjit S. Dhillon. Learning low-rank kernel matrices. In ICML, pages a a 505–512, 2006.
[27] Matthew Schultz and Thorsten Joachims. Learning a distance metric from relative comparisons. In NIPS, 2003.
[28] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a mahalanobis metric from equivalence constraints. JMLR, 6:937–965, 2005.
[29] P. Jain, B. Kulis, and K. Grauman. Fast image search for learned metrics. In CVPR, 2008.
[30] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, 2007.
[31] J. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion, 2008.
[32] K. Grauman and T. Darrell. The Pyramid Match Kernel: Efficient learning with sets of features. Journal of Machine Learning Research (JMLR), 8:725–760, April 2007.
[33] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, pages 2169–2178, 2006.
[34] A. C. Berg and J. Malik. Geometric blur for template matching. In CVPR, pages 607–614, 2001. 9