jmlr jmlr2012 jmlr2012-92 jmlr2012-92-reference knowledge-graph by maker-knowledge-mining

92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms

Source: pdf

Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel

Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identiﬁcation of an appropriate distance metric on the input data. It is often beneﬁcial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed B OOST M ETRIC, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semideﬁnite. Semideﬁnite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. B OOST M ETRIC is instead based on the observation that any positive semideﬁnite matrix can be decomposed into a linear combination of trace-one rank-one matrices. B OOST M ETRIC thus uses rank-one positive semideﬁnite matrices as weak learners within an efﬁcient and scalable boosting-based learning process. The resulting methods are easy to implement, efﬁcient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that its weak learner is a positive semideﬁnite matrix with trace and rank being one rather than a classiﬁer or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to those state-of-the-art methods in terms of classiﬁcation accuracy and running time. Keywords: Mahalanobis distance, semideﬁnite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor

reference text

A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a Mahalanobis metric from equivalence constraints. J. Machine Learning Research, 6:937–965, 2005. O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classiﬁcation. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2008. B. Borchers. CSDP, a C library for semideﬁnite programming. Optimization Methods and Software, 11(1):613–623, 1999. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004. Z. Cao, Q. Yin, X. Tang, and J. Sun. Face recognition with learning-based descriptor. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2010. G. B. Dantzig and P. Wolfe. Decomposition principle for linear programs. Operation Research, 8 (1):101–111, 1960. J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In Int’l Conf. Machine Learning, pages 209–216, Corvalis, Oregon, 2007. ACM Press. 1033 S HEN , K IM , WANG AND VAN DEN H ENGEL A. Demiriz, K.P. Bennett, and J. Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46(1-3):225–254, 2002. P. Drineas, R. Kannan, and M. Mahoney. Fast Monte Carlo algorithms for matrices II: Computing a compressed approximate matrix decomposition. SIAM J. Computing, 36:2006, 2004. L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Trans. Pattern Analysis and Machine Intelligence, 28(4):594–611, April 2006. P. A. Fillmore and J. P. Williams. Some convexity theorems for matrices. Glasgow Math. Journal, 12:110–117, 1971. A. Frome, Y. Singer, F. Sha, and J. Malik. Learning globally-consistent local distance functions for shape-based image retrieval and classiﬁcation. In Proc. IEEE Int’l Conf. Computer Vision, 2007. A. Globerson and S. Roweis. Metric learning by collapsing classes. In Proc. Advances in Neural Information Processing Systems, 2005. J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood component analysis. In Proc. Advances in Neural Information Processing Systems. MIT Press, 2004. M. Guillaumin, J. Verbeek, and C. Schmid. Is that you? metric learning approaches for face identiﬁcation. In Proc. IEEE Int’l Conf. Computer Vision, 2009. T. Hastie and R. Tibshirani. Discriminant adaptive nearest neighbor classiﬁcation. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(6):607–616, 1996. X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang. Face recognition using Laplacianfaces. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(3):328–340, 2005. G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007. G. B. Huang, M. J. Jones, and E. Learned-Miller. LFW results using a combined nowak plus merl recognizer. In Faces in Real-Life Images Workshop in Euro. Conf. Computer Vision, 2008. B. Jian and B. C. Vemuri. Metric learning using Iwasawa decomposition. In Proc. IEEE Int’l Conf. Computer Vision, Rio de Janeiro, Brazil, 2007. IEEE. M. Krein and D. Milman. On extreme points of regular convex sets. Studia Mathematica, 9:133– 138, 1940. N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar. Attribute and simile classiﬁers for face veriﬁcation. In Proc. IEEE Int’l Conf. Computer Vision, 2009. D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int’l J. Computer Vision, 60 (2):91–110, 2004. M. E. L¨ bbecke and J. Desrosiers. Selected topics in column generation. Operation Research, 53 u (6):1007–1023, 2005. 1034 M ETRIC L EARNING U SING B OOSTING - LIKE A LGORITHMS K. Mikolajczyk and C. Schmid. Scale & afﬁne invariant interest point detectors. Int’l J. Computer Vision, 60(1):63–86, 2004. E. Nowak and F. Jurie. Learning visual similarity measures for comparing never seen objects. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2007. M. L. Overton and R. S. Womersley. On the sum of the largest eigenvalues of a symmetric matrix. SIAM J. Matrix Analysis and Application, 13(1):41–45, 1992. N. Pinto, J. J. DiCarlo, and D. D. Cox. How far can you get with a modern face recognition test set using only simple features? In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2009. R. Rosales and G. Fung. Learning sparse metrics via linear programming. In Proc. ACM SIGKDD Int’l Conf. Knowledge discovery and Data Mining, pages 367–373. ACM, 2006. S. Rosset, J. Zhu, and T. Hastie. Boosting as a regularized path to a maximum margin classiﬁer. J. Machine Learning Research, 5:941–973, 2004. R. E. Schapire. Theoretical views of boosting and applications. In Proc. Int’l Conf. Algorithmic Learning Theory, pages 13–25, London, UK, 1999. Springer-Verlag. M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In Proc. Advances in Neural Information Processing Systems. MIT Press, 2003. C. Shen and H. Li. On the dual formulation of boosting algorithms. IEEE Trans. Pattern Analysis and Machine Intelligence, 32(12):2216–2231, 2010. C. Shen, A. Welsh, and L. Wang. PSDBoost: Matrix-generation linear programming for positive semideﬁnite matrices learning. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Proc. Advances in Neural Information Processing Systems, pages 1473–1480, Vancouver, B.C., Canada, December 2008. MIT Press. C. Shen, J. Kim, L. Wang, and A. van den Hengel. Positive semideﬁnite metric learning with boosting. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, editors, Proc. Advances in Neural Information Processing Systems, pages 1651–1659, Vancouver, B.C., Canada, December 2009. MIT Press. Y. Taigman, L. Wolf, and T. Hassner. Multiple one-shots for utilizing class label information. In Proc. British Machine Vision Conf., 2009. K. Tsuda, G. R¨ tsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line learna ing and Bregman projection. J. Machine Learning Research, 6:995–1018, December 2005. M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 1991. H. Wang, H. Huang, and C. Ding. Discriminant Laplacian embedding. In Proc. AAAI Conf. Artiﬁcial Intelligence, pages 618–623, 2010a. 1035 S HEN , K IM , WANG AND VAN DEN H ENGEL S. Wang and R. Jin. An information geometry approach for distance metric learning. In Proc. Int’l Conf. Artiﬁcial Intelligence and Statistics, pages 591–598, 2009. Z. Wang, Y. Hu, and L.-T. Chia. Image-to-class distance metric learning for image classiﬁcation. In Proc. Euro. Conf. Computer Vision, volume Lecture Notes in Computer Science 6311/2010, pages 706–719, 2010b. M. K. Warmuth, J. Liao, and G. R¨ tsch. Totally corrective boosting algorithms that maximize the a margin. In Int’l Conf. Machine Learning, pages 1001–1008, Pittsburgh, Pennsylvania, 2006. K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semideﬁnite programming. Int’l J. Computer Vision, 70(1):77–90, 2006. K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classiﬁcation. J. Machine Learning Research, 10:207–244, 2009. K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classiﬁcation. In Proc. Advances in Neural Information Processing Systems, pages 1473–1480, 2005. J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Proc. IEEE Int’l Conf. Computer Vision, pages 1800–1807, 2005. L. Wolf, T. Hassner, and Y. Taigman. Descriptor based methods in the wild. In Faces in Real-Life Images Workshop in Euro. Conf. Computer Vision, 2008. L. Wolf, T. Hassner, and Y. Taigman. Similarity scores based on background samples. In Proc. Asian Conf. Computer Vision, 2009. E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In Proc. Advances in Neural Information Processing Systems. MIT Press, 2002. J. Yu, J. Amores, N. Sebe, P. Radeva, and Q. Tian. Distance learning for similarity estimation. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(3):451–462, 2008. T. Zhang. Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Information Theory, 49(3):682–691, 2003. C. Zhu, R. H. Byrd, and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Trans. Mathematical Software, 23(4):550– 560, 1997. 1036