242 nips-2012-Non-linear Metric Learning

Source: pdf

Author: Dor Kedem, Stephen Tyree, Fei Sha, Gert R. Lanckriet, Kilian Q. Weinberger

Abstract: In this paper, we introduce two novel metric learning algorithms, χ²-LMNN and GB-LMNN, which are explicitly designed to be non-linear and easy-to-use. The two approaches achieve this goal in fundamentally different ways: χ²-LMNN inherits the computational benefits of a linear mapping from linear metric learning, but uses a non-linear χ²-distance to explicitly capture similarities within histogram data sets; GB-LMNN applies gradient-boosting to learn non-linear mappings directly in function space and takes advantage of this approach’s robustness, speed, parallelizability and insensitivity towards the single additional hyperparameter. On various benchmark data sets, we demonstrate these methods not only match the current state-of-the-art in terms of kNN classification error, but in the case of χ²-LMNN, obtain best results in 19 out of 20 learning settings.
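To make the χ²-LMNN idea concrete, the following is a minimal sketch of a χ²-distance between two histograms, evaluated after a linear mapping. It is an illustration under stated assumptions, not the authors' implementation: the function names `chi2_distance` and `chi2_after_linear_map`, the `eps` threshold, and the example matrix `L` are hypothetical, and the distance uses one common form, 0.5 * Σ (x_i − y_i)² / (x_i + y_i). The abstract does not state what constraints the method places on the mapping; the example `L` simply has non-negative entries and columns that sum to one, so mapped histograms remain histograms.

```python
import numpy as np


def chi2_distance(x, y, eps=1e-12):
    """Chi-squared histogram distance: 0.5 * sum((x - y)^2 / (x + y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = x + y
    # Bins that are empty in both histograms contribute nothing,
    # so they are skipped to avoid dividing by zero.
    mask = denom > eps
    return 0.5 * np.sum((x[mask] - y[mask]) ** 2 / denom[mask])


def chi2_after_linear_map(L, x, y):
    """Chi-squared distance between the linearly mapped inputs L @ x and L @ y."""
    L = np.asarray(L, dtype=float)
    return chi2_distance(L @ np.asarray(x, dtype=float),
                         L @ np.asarray(y, dtype=float))


# Two 3-bin histograms (each sums to 1).
x = np.array([0.5, 0.5, 0.0])
y = np.array([0.0, 0.5, 0.5])

# Hypothetical mapping: non-negative entries, every column sums to 1,
# so L @ x and L @ y are again histograms (here with 2 bins).
L = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0]])

d_raw = chi2_distance(x, y)
d_mapped = chi2_after_linear_map(L, x, y)
print(d_raw, d_mapped)
```

In a learning setting the entries of `L` would be optimized rather than fixed by hand; here the matrix only shows how a linear mapping and a non-linear distance compose.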

reference text

[1] B. Babenko, S. Branson, and S. Belongie. Similarity metrics for categorization: from monolithic to category specific. In ICCV ’09, pages 293–300. IEEE, 2009.

[2] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.

[3] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022, 2003.

[4] L. Breiman. Classification and regression trees. Chapman & Hall/CRC, 1984.

[5] R. Chatpatanasiri, T. Korsrilabutr, P. Tangchanachaianan, and B. Kijsirikul. A new kernelization framework for mahalanobis distance learning algorithms. Neurocomputing, 73(10-12):1570–1579, 2010.

[6] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR ’05, pages 539–546. IEEE, 2005.

[7] T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967.

[8] O.G. Cula and K.J. Dana. 3D texture recognition using bidirectional feature histograms. International Journal of Computer Vision, 59(1):33–60, 2004.

[9] M. Cuturi and D. Avis. Ground metric learning. arXiv preprint, arXiv:1110.2306, 2011.

[10] J.V. Davis, B. Kulis, P. Jain, S. Sra, and I.S. Dhillon. Information-theoretic metric learning. In ICML ’07, pages 209–216. ACM, 2007.

[11] J.H. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.

[12] C. Galleguillos, B. McFee, S. Belongie, and G. Lanckriet. Multi-class object localization by combining local contextual interactions. In CVPR ’10, pages 113–120, 2010.

[13] A. Globerson and S. Roweis. Metric learning by collapsing classes. In NIPS ’06, pages 451–458. MIT Press, 2006.

[14] A. Globerson and S. Roweis. Visualizing pairwise similarity via semidefinite programming. In AISTATS ’07, pages 139–146, 2007.

[15] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. In NIPS ’05, pages 513–520. MIT Press, 2005.

[16] J. Hafner, H.S. Sawhney, W. Equitz, M. Flickner, and W. Niblack. Efficient color histogram indexing for quadratic form distance functions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(7):729–736, 1995.

[17] M. Hoffman, D. Blei, and P. Cook. Easy as CBA: A simple probabilistic model for tagging music. In ISMIR ’09, pages 369–374, 2009.

[18] P. Jain, B. Kulis, J.V. Davis, and I.S. Dhillon. Metric and kernel learning using a linear transformation. Journal of Machine Learning Research, 13:519–547, 2012.

[19] A.M. Mood, F.A. Graybill, and D.C. Boes. Introduction to the theory of statistics. McGraw-Hill International Book Company, 1963.

[20] O. Pele and M. Werman. The quadratic-chi histogram distance family. In ECCV ’10, pages 749–762, 2010.

[21] Y. Rubner, C. Tomasi, and L.J. Guibas. The earth mover’s distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99–121, 2000.

[22] K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. Computer Vision–ECCV 2010, pages 213–226, 2010.

[23] G. Shakhnarovich. Learning task-specific similarity. PhD thesis, MIT, 2005.

[24] N. Shental, T. Hertz, D. Weinshall, and M. Pavel. Adjustment learning and relevant component analysis. In ECCV ’02, volume 4, pages 776–792. Springer-Verlag, 2002.

[25] M. Stricker and M. Orengo. Similarity of color images. In Storage and Retrieval for Image and Video Databases, volume 2420, pages 381–392, 1995.

[26] L. Torresani and K. Lee. Large margin component analysis. In NIPS ’07, pages 1385–1392, 2007.

[27] T. Tuytelaars and K. Mikolajczyk. Local invariant feature detectors: a survey. Foundations and Trends® in Computer Graphics and Vision, 3(3):177–280, 2008.

[28] S. Tyree, K.Q. Weinberger, K. Agrawal, and J. Paykin. Parallel boosted regression trees for web search ranking. In WWW ’11, pages 387–396. ACM, 2011.

[29] M. Varma and A. Zisserman. A statistical approach to material classification using image patch exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11):2032–2047, 2009.

[30] K.Q. Weinberger and L.K. Saul. Fast solvers and efficient implementations for distance metric learning. In ICML ’08, pages 1160–1167. ACM, 2008.

[31] K.Q. Weinberger and L.K. Saul. Distance metric learning for large margin nearest neighbor classification. The Journal of Machine Learning Research, 10:207–244, 2009.

[32] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In NIPS ’02, pages 505–512. MIT Press, 2002.

[33] P.N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In ACM-SIAM Symposium on Discrete Algorithms ’93, pages 311–321, 1993.