
92 jmlr-2012-Positive Semidefinite Metric Learning Using Boosting-like Algorithms



Author: Chunhua Shen, Junae Kim, Lei Wang, Anton van den Hengel

Abstract: The success of many machine learning and pattern recognition methods relies heavily upon the identification of an appropriate distance metric on the input data. It is often beneficial to learn such a metric from the input training data, instead of using a default one such as the Euclidean distance. In this work, we propose a boosting-based technique, termed BoostMetric, for learning a quadratic Mahalanobis distance metric. Learning a valid Mahalanobis distance metric requires enforcing the constraint that the matrix parameter to the metric remains positive semidefinite. Semidefinite programming is often used to enforce this constraint, but does not scale well and is not easy to implement. BoostMetric is instead based on the observation that any positive semidefinite matrix can be decomposed into a linear combination of trace-one rank-one matrices. BoostMetric thus uses rank-one positive semidefinite matrices as weak learners within an efficient and scalable boosting-based learning process. The resulting methods are easy to implement, efficient, and can accommodate various types of constraints. We extend traditional boosting algorithms in that the weak learner is a positive semidefinite matrix with trace and rank equal to one, rather than a classifier or regressor. Experiments on various data sets demonstrate that the proposed algorithms compare favorably to state-of-the-art methods in terms of classification accuracy and running time.

Keywords: Mahalanobis distance, semidefinite programming, column generation, boosting, Lagrange duality, large margin nearest neighbor
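The decomposition the abstract relies on is easy to verify numerically: the eigendecomposition of a positive semidefinite matrix M gives M = Σᵢ λᵢ vᵢvᵢᵀ, i.e. a non-negative combination of trace-one rank-one matrices, which is exactly the form of weak learner the paper boosts over. A minimal NumPy sketch of this observation (variable names are illustrative and not from the paper's implementation):

```python
import numpy as np

# A PSD matrix M parameterizing a Mahalanobis distance
# d_M(x, y)^2 = (x - y)^T M (x - y).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
M = A @ A.T  # symmetric positive semidefinite by construction

# Eigendecomposition M = sum_i lambda_i v_i v_i^T: each Z_i = v_i v_i^T
# is rank one with unit trace, and the weights lambda_i are non-negative
# because M is PSD -- the "trace-one rank-one" decomposition.
eigvals, eigvecs = np.linalg.eigh(M)
weights = eigvals
bases = [np.outer(v, v) for v in eigvecs.T]  # rank-one, trace-one "weak learners"

M_reconstructed = sum(w * Z for w, Z in zip(weights, bases))
assert np.allclose(M, M_reconstructed)
assert all(np.isclose(np.trace(Z), 1.0) for Z in bases)

# Mahalanobis distance between two points under the learned-style metric:
x, y = rng.standard_normal(4), rng.standard_normal(4)
d2 = (x - y) @ M @ (x - y)
assert d2 >= 0  # PSD parameter guarantees a valid (non-negative) metric
```

BoostMetric runs this logic in reverse: rather than decomposing a known M, it greedily selects rank-one trace-one matrices as weak learners and learns their non-negative weights, so positive semidefiniteness holds by construction without a semidefinite-programming solver.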


reference text

A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a Mahalanobis metric from equivalence constraints. J. Machine Learning Research, 6:937–965, 2005.
O. Boiman, E. Shechtman, and M. Irani. In defense of nearest-neighbor based image classification. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2008.
B. Borchers. CSDP, a C library for semidefinite programming. Optimization Methods and Software, 11(1):613–623, 1999.
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Z. Cao, Q. Yin, X. Tang, and J. Sun. Face recognition with learning-based descriptor. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2010.
G. B. Dantzig and P. Wolfe. Decomposition principle for linear programs. Operations Research, 8(1):101–111, 1960.
J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In Int’l Conf. Machine Learning, pages 209–216, Corvallis, Oregon, 2007. ACM Press.
A. Demiriz, K. P. Bennett, and J. Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46(1-3):225–254, 2002.
P. Drineas, R. Kannan, and M. Mahoney. Fast Monte Carlo algorithms for matrices II: Computing a compressed approximate matrix decomposition. SIAM J. Computing, 36:2006, 2004.
L. Fei-Fei, R. Fergus, and P. Perona. One-shot learning of object categories. IEEE Trans. Pattern Analysis and Machine Intelligence, 28(4):594–611, April 2006.
P. A. Fillmore and J. P. Williams. Some convexity theorems for matrices. Glasgow Math. Journal, 12:110–117, 1971.
A. Frome, Y. Singer, F. Sha, and J. Malik. Learning globally-consistent local distance functions for shape-based image retrieval and classification. In Proc. IEEE Int’l Conf. Computer Vision, 2007.
A. Globerson and S. Roweis. Metric learning by collapsing classes. In Proc. Advances in Neural Information Processing Systems, 2005.
J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood component analysis. In Proc. Advances in Neural Information Processing Systems. MIT Press, 2004.
M. Guillaumin, J. Verbeek, and C. Schmid. Is that you? Metric learning approaches for face identification. In Proc. IEEE Int’l Conf. Computer Vision, 2009.
T. Hastie and R. Tibshirani. Discriminant adaptive nearest neighbor classification. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(6):607–616, 1996.
X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang. Face recognition using Laplacianfaces. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(3):328–340, 2005.
G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
G. B. Huang, M. J. Jones, and E. Learned-Miller. LFW results using a combined Nowak plus MERL recognizer. In Faces in Real-Life Images Workshop in Euro. Conf. Computer Vision, 2008.
B. Jian and B. C. Vemuri. Metric learning using Iwasawa decomposition. In Proc. IEEE Int’l Conf. Computer Vision, Rio de Janeiro, Brazil, 2007. IEEE.
M. Krein and D. Milman. On extreme points of regular convex sets. Studia Mathematica, 9:133–138, 1940.
N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar. Attribute and simile classifiers for face verification. In Proc. IEEE Int’l Conf. Computer Vision, 2009.
D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int’l J. Computer Vision, 60(2):91–110, 2004.
M. E. Lübbecke and J. Desrosiers. Selected topics in column generation. Operations Research, 53(6):1007–1023, 2005.
K. Mikolajczyk and C. Schmid. Scale & affine invariant interest point detectors. Int’l J. Computer Vision, 60(1):63–86, 2004.
E. Nowak and F. Jurie. Learning visual similarity measures for comparing never seen objects. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2007.
M. L. Overton and R. S. Womersley. On the sum of the largest eigenvalues of a symmetric matrix. SIAM J. Matrix Analysis and Application, 13(1):41–45, 1992.
N. Pinto, J. J. DiCarlo, and D. D. Cox. How far can you get with a modern face recognition test set using only simple features? In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 2009.
R. Rosales and G. Fung. Learning sparse metrics via linear programming. In Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, pages 367–373. ACM, 2006.
S. Rosset, J. Zhu, and T. Hastie. Boosting as a regularized path to a maximum margin classifier. J. Machine Learning Research, 5:941–973, 2004.
R. E. Schapire. Theoretical views of boosting and applications. In Proc. Int’l Conf. Algorithmic Learning Theory, pages 13–25, London, UK, 1999. Springer-Verlag.
M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. In Proc. Advances in Neural Information Processing Systems. MIT Press, 2003.
C. Shen and H. Li. On the dual formulation of boosting algorithms. IEEE Trans. Pattern Analysis and Machine Intelligence, 32(12):2216–2231, 2010.
C. Shen, A. Welsh, and L. Wang. PSDBoost: Matrix-generation linear programming for positive semidefinite matrices learning. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Proc. Advances in Neural Information Processing Systems, pages 1473–1480, Vancouver, B.C., Canada, December 2008. MIT Press.
C. Shen, J. Kim, L. Wang, and A. van den Hengel. Positive semidefinite metric learning with boosting. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, and A. Culotta, editors, Proc. Advances in Neural Information Processing Systems, pages 1651–1659, Vancouver, B.C., Canada, December 2009. MIT Press.
Y. Taigman, L. Wolf, and T. Hassner. Multiple one-shots for utilizing class label information. In Proc. British Machine Vision Conf., 2009.
K. Tsuda, G. Rätsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line learning and Bregman projection. J. Machine Learning Research, 6:995–1018, December 2005.
M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proc. IEEE Int’l Conf. Computer Vision and Pattern Recognition, 1991.
H. Wang, H. Huang, and C. Ding. Discriminant Laplacian embedding. In Proc. AAAI Conf. Artificial Intelligence, pages 618–623, 2010a.
S. Wang and R. Jin. An information geometry approach for distance metric learning. In Proc. Int’l Conf. Artificial Intelligence and Statistics, pages 591–598, 2009.
Z. Wang, Y. Hu, and L.-T. Chia. Image-to-class distance metric learning for image classification. In Proc. Euro. Conf. Computer Vision, volume 6311 of Lecture Notes in Computer Science, pages 706–719, 2010b.
M. K. Warmuth, J. Liao, and G. Rätsch. Totally corrective boosting algorithms that maximize the margin. In Int’l Conf. Machine Learning, pages 1001–1008, Pittsburgh, Pennsylvania, 2006.
K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semidefinite programming. Int’l J. Computer Vision, 70(1):77–90, 2006.
K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. J. Machine Learning Research, 10:207–244, 2009.
K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In Proc. Advances in Neural Information Processing Systems, pages 1473–1480, 2005.
J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Proc. IEEE Int’l Conf. Computer Vision, pages 1800–1807, 2005.
L. Wolf, T. Hassner, and Y. Taigman. Descriptor based methods in the wild. In Faces in Real-Life Images Workshop in Euro. Conf. Computer Vision, 2008.
L. Wolf, T. Hassner, and Y. Taigman. Similarity scores based on background samples. In Proc. Asian Conf. Computer Vision, 2009.
E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In Proc. Advances in Neural Information Processing Systems. MIT Press, 2002.
J. Yu, J. Amores, N. Sebe, P. Radeva, and Q. Tian. Distance learning for similarity estimation. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(3):451–462, 2008.
T. Zhang. Sequential greedy approximation for certain convex optimization problems. IEEE Trans. Information Theory, 49(3):682–691, 2003.
C. Zhu, R. H. Byrd, and J. Nocedal. Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Trans. Mathematical Software, 23(4):550–560, 1997.