jmlr jmlr2005 jmlr2005-46 jmlr2005-46-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aharon Bar-Hillel, Tomer Hertz, Noam Shental, Daphna Weinshall
Abstract: Many learning algorithms use a metric defined over the input space as a principal tool, and their performance critically depends on the quality of this metric. We address the problem of learning metrics using side-information in the form of equivalence constraints. Unlike labels, we demonstrate that this type of side-information can sometimes be automatically obtained without the need of human intervention. We show how such side-information can be used to modify the representation of the data, leading to improved clustering and classification. Specifically, we present the Relevant Component Analysis (RCA) algorithm, which is a simple and efficient algorithm for learning a Mahalanobis metric. We show that RCA is the solution of an interesting optimization problem, founded on an information theoretic basis. If dimensionality reduction is allowed within RCA, we show that it is optimally accomplished by a version of Fisher’s linear discriminant that uses constraints. Moreover, under certain Gaussian assumptions, RCA can be viewed as a Maximum Likelihood estimation of the within class covariance matrix. We conclude with extensive empirical evaluations of RCA, showing its advantage over alternative methods. Keywords: clustering, metric learning, dimensionality reduction, equivalence constraints, side information.
Y. Adini, Y. Moses, and S. Ullman. Face recognition: The problem of compensating for changes in illumination direction. In proc. of IEEE PAMI, volume 19(7), pages 721–732, 1997. A. Bar-Hillel, T. Hertz, N. Shental, and D. weinshall. Learning distance functions using equivalence relations. In T. Fawcett and Nina Mishra, editors, 20th International Conference on Machine Learning, Wahington DC, 2003. AAAI press. 963 BAR H ILLEL , H ERTZ , S HENTAL AND W EINSHALL S. Becker and G. E. Hinton. A self-organising neural network that discovers surfaces in random-dot stereograms. Nature, 355:161–163, 1992. P. N. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE PAMI 8, 19(7):711–720, 1997. A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159, 1995. M. Bilenko, S. Basu, and R.J Mooney. Integrating constraints and metric learning in semisupervised clustering. In Proc. 21st International Conf. on Machine Learning, Banff Canada, 2004. AAAI press. URL citeseer.ist.psu.edu/705723.html. C. L. Blake and C. J. Merz. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/∼mlearn/MLRepository.html. URL J. S. Boreczky and L. A. Rowe. Comparison of video shot boundary detection techniques. SPIE Storage and Retrieval for Still Images and Video Databases IV, 2664:170–179, 1996. G. Chechik and N. Tishby. Extracting relevant structures with side information. In S Becker, S. Thrune, and K. Obermayer, editors, Advances in Neural Information Processing Systems, volume 15. The MIT Press, 2003. K. Fukunaga. Statistical Pattern Recognition. Academic Press, San Diego, 2nd edition, 1990. P. Geladi and B. Kowalski. Partial least squares regression: A tutorial. Analytica Chimica Acta, 185:1–17, 1986. T. Hastie and R. Tibshirani. Discriminant adaptive nearest neighbor classification and regression. In David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 409–415. The MIT Press, 1996. T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers, 1998. D. Klein, S. Kamvar, and C. Manning. From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. In Proc. 19th International Conf. on Machine Learning, 2002. URL citeseer.nj.nec.com/klein02from.html. R. Linsker. An application of the principle of maximum information preservation to linear systems. In David S. Touretzky, editor, Advances in Neural Information Processing Systems, pages 186– 194. Morgan Kaufmann, 1989. K. V. Mardia. Measures of multivariate skewness and kurtosis with applications. Biometrika, 36: 519–530, 1970. W. M. Rand. Objective criteria for the evaluation of clustering method. Journal of the American Statistical Association, 66(366):846–850, 1971. N. Shental, A. Bar-Hillel, T. Hertz, and D. Weinshall. Computing gaussian mixture models with em using equivalence constraints. In Sebastian Thrun, Lawrence Saul, and Bernhard Sch¨ lkopf, o editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004. 964 M AHALANOBIS M ETRIC FROM E QUIVALENCE C ONSTRAINTS N. Shental, T. Hertz, D. Weinshall, and M. Pavel. Adjustment learning and relevant component analysis. In A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, editors, 7th European Conference on Computer Vision, volume 4, 2002. J. B. Tenenbaum and W. T. Freeman. Separating style and content with bilinear models. Neural Computation, 12(6):1247–1283, 2000. B. Thompson. Canonical correlation analysis: Uses and interpretation. Newbury Park CA: SAGE, 1984. S. Thrun. Is learning the n-th thing any easier than learning the first? In David S. Touretzky, Michael C. Mozer, and Michael E. Hasselmo, editors, Advances in Neural Information Processing Systems, volume 8, pages 640–646. The MIT Press, 1996. N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. In Proc. of the 37-th Annual Allerton Conference on Communication, Control and Computing, pages 368–377, 1999. M. A. Turk and A.P Pentland. Face recognition using eigenfaces. In Proc. of IEEE CVPR, pages 586–591, 1991. K. Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. Constrained K-means clustering with background knowledge. In Proc. 18th International Conf. on Machine Learning, pages 577–584. Morgan Kaufmann, San Francisco, CA, 2001. E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. In Advances in Neural Information Processing Systems, volume 15. The MIT Press, 2002. 965