
223 nips-2009-Sparse Metric Learning via Smooth Optimization


Source: pdf

Author: Yiming Ying, Kaizhu Huang, Colin Campbell

Abstract: In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
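
To make the recipe in the abstract concrete, below is a minimal numerical sketch of low-rank Mahalanobis metric learning. It is not the paper's algorithm: it replaces the mixed-norm regularizer with a simple trace penalty (which, on the PSD cone, equals the nuclear norm and likewise promotes low rank), smooths the hinge loss in the spirit of Nesterov's technique [17], and uses plain projected gradient descent rather than the paper's saddle-point solver. The function names (fit_metric, project_psd, smoothed_hinge) and all hyperparameters are illustrative assumptions, not taken from the paper.

# Illustrative sketch only -- NOT the paper's method. Smoothed (Huber-style)
# hinge loss + trace penalty, solved by projected gradient onto the PSD cone.
import numpy as np

def smoothed_hinge(z, mu=0.1):
    """Huber-style smoothing of max(0, z); gradient is (1/mu)-Lipschitz."""
    out = np.where(z <= 0, 0.0, np.where(z >= mu, z - mu / 2, z**2 / (2 * mu)))
    grad = np.clip(z / mu, 0.0, 1.0)
    return out, grad

def project_psd(M):
    """Project a symmetric matrix onto the positive semidefinite cone."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def fit_metric(X, pairs, labels, lam=0.05, step=0.01, iters=300, mu=0.1):
    """X: (n, d) data; pairs: list of (i, j); labels: +1 similar, -1 dissimilar.
    Minimizes the sum of smoothed hinge losses on pairwise squared distances
    plus lam * trace(M), keeping M positive semidefinite."""
    d = X.shape[1]
    M = np.eye(d)
    for _ in range(iters):
        G = lam * np.eye(d)              # gradient of the trace penalty
        for (i, j), y in zip(pairs, labels):
            diff = X[i] - X[j]
            dist2 = diff @ M @ diff
            # Margin violation: similar pairs should satisfy dist2 <= 1,
            # dissimilar pairs dist2 >= 1 (unit threshold, illustrative).
            z = y * (dist2 - 1.0)
            _, g = smoothed_hinge(z, mu)
            G += g * y * np.outer(diff, diff)
        M = project_psd(M - step * G)    # projected gradient step
    return M

# Toy usage: two 3-D clusters separated only along the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(0, 0.3, (40, 3))
X[:20, 0] -= 2.0
X[20:, 0] += 2.0
pairs = [(i, j) for i in range(40) for j in range(i + 1, 40)]
labels = [1 if (i < 20) == (j < 20) else -1 for i, j in pairs]
M = fit_metric(X, pairs, labels)
print("eigenvalues of learned M:", np.round(np.linalg.eigvalsh(M), 3))

Since M is constrained to be PSD, trace(M) equals the sum of its nonnegative eigenvalues, i.e. its nuclear norm, so the penalty shrinks uninformative eigenvalues toward zero; on the toy data the learned M should concentrate its mass on the first coordinate, illustrating the simultaneous dimension reduction and metric learning the abstract describes.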


reference text

[1] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. NIPS, 2007.

[2] A. Argyriou, C. A. Micchelli, M. Pontil, and Y. Ying. A spectral regularization framework for multi-task structure learning. NIPS, 2008.

[3] F. R. Bach. Consistency of trace norm minimization. J. of Machine Learning Research, 9: 1019–1048, 2008.

[4] J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS Books in Mathematics. Springer, 2005.

[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[6] J. F. Bonnans and A. Shapiro. Optimization problems with perturbations: A guided tour. SIAM Review, 40:202–227, 1998.

[7] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a Mahalanobis metric from equivalence constraints. J. of Machine Learning Research, 6: 937–965, 2005.

[8] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively with application to face verification. CVPR, 2005.

[9] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-theoretic metric learning. ICML, 2007.

[10] G. M. Fung, O. L. Mangasarian, and A. J. Smola. Minimal kernel classifiers. J. of Machine Learning Research, 3: 303–321, 2002.

[11] A. Globerson and S. Roweis. Metric learning by collapsing classes. NIPS, 2005.

[12] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. NIPS, 2004.

[13] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2003.

[14] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.

[15] A. Nemirovski. Efficient methods in convex programming. Lecture Notes, 1994.

[16] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2003.

[17] Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103:127–152, 2005.

[18] G. Obozinski, B. Taskar, and M. I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing. In press, 2009.

[19] J. D. M. Rennie and N. Srebro. Fast maximum margin matrix factorization for collaborative prediction. ICML, 2005.

[20] R. Rosales and G. Fung. Learning sparse metrics via linear programming. KDD, 2006.

[21] N. Srebro, J. D. M. Rennie, and T. S. Jaakkola. Maximum-margin matrix factorization. NIPS, 2005.

[22] L. Torresani and K. Lee. Large margin component analysis. NIPS, 2007.

[23] K. Q. Weinberger, J. Blitzer, and L. Saul. Distance metric learning for large margin nearest neighbour classification. NIPS, 2006.

[24] K. Q. Weinberger and L. K. Saul. Fast solvers and efficient implementations for distance metric learning. ICML, 2008.

[25] E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning with application to clustering with side information. NIPS, 2002.

[26] L. Yang and R. Jin. Distance metric learning: A comprehensive survey. Technical Report, Department of Computer Science and Engineering, Michigan State University, 2007.