nips2010-287-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yu Zhang, Dit-Yan Yeung
Abstract: Dimensionality reduction is often needed in many applications due to the high dimensionality of the data involved. In this paper, we first analyze the scatter measures used in the conventional linear discriminant analysis (LDA) model and note that the formulation is based on the average-case view. Based on this analysis, we then propose a new dimensionality reduction method called worst-case linear discriminant analysis (WLDA) by defining new between-class and within-class scatter measures. This new model adopts the worst-case view, which is arguably more suitable for applications such as classification. When the number of training data points or the number of features is not very large, we relax the optimization problem involved and formulate it as a metric learning problem. Otherwise, we take a greedy approach by finding one direction of the transformation at a time. Moreover, we also analyze a special case of WLDA to show its relationship with conventional LDA. Experiments conducted on several benchmark datasets demonstrate the effectiveness of WLDA when compared with some related dimensionality reduction methods.
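To make the worst-case view described in the abstract concrete, the sketch below evaluates such a criterion for a fixed projection matrix W: the worst-case between-class scatter is taken as the smallest projected distance between any pair of class means, and the worst-case within-class scatter as the largest projected per-class scatter. These definitions and all function and variable names are illustrative assumptions that follow the spirit of the abstract, not the paper's exact formulation or algorithm.

import numpy as np

def worst_case_lda_objective(X, y, W):
    """Evaluate a worst-case LDA-style criterion for a projection W.

    A minimal sketch, assuming the worst-case between-class scatter is the
    smallest projected squared distance between any pair of class means and
    the worst-case within-class scatter is the largest projected per-class
    scatter; the paper's actual definitions may differ.
    """
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}

    # Worst-case between-class scatter: minimum over all class pairs.
    between = []
    for i, ci in enumerate(classes):
        for cj in classes[i + 1:]:
            d = (means[ci] - means[cj]) @ W        # projected mean difference
            between.append(d @ d)
    worst_between = min(between)

    # Worst-case within-class scatter: maximum over all classes.
    within = []
    for c in classes:
        Xc = X[y == c] - means[c]
        Sc = Xc.T @ Xc / max(len(Xc) - 1, 1)       # per-class covariance
        within.append(np.trace(W.T @ Sc @ W))
    worst_within = max(within)

    return worst_between / worst_within

if __name__ == "__main__":
    # Toy data: three well-separated Gaussian classes in 5 dimensions,
    # scored under a random 2-dimensional projection.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 1.0, size=(30, 5)) for m in (0.0, 2.0, 4.0)])
    y = np.repeat([0, 1, 2], 30)
    W = rng.normal(size=(5, 2))
    print(worst_case_lda_objective(X, y, W))

Maximizing this ratio over W (rather than the average-case ratio used by conventional LDA) is the kind of objective the abstract refers to; the paper handles the optimization either by a relaxation solved as a metric learning problem or by a greedy, one-direction-at-a-time procedure.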
[1] A. Asuncion and D. J. Newman. UCI machine learning repository, 2007.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, 2004.
[4] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In Proceedings of the Twenty-Fourth International Conference on Machine Learning, pages 209–216, Corvallis, Oregon, USA, 2007.
[5] J. Duchene and S. Leclercq. An optimal transformation for discriminant and principal component analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):978–983, 1988.
[6] D. H. Foley and J. W. Sammon. An optimal set of discriminant vectors. IEEE Transactions on Computers, 24(3):281–289, 1975.
[7] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, New York, 1991.
[8] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 2nd edition, 2002.
[9] S.-J. Kim, A. Magnani, and S. Boyd. Robust Fisher discriminant analysis. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 659–666, Vancouver, British Columbia, Canada, 2006.
[10] A. Kocsor, K. Kovács, and C. Szepesvári. Margin maximizing discriminant analysis. In Proceedings of the 15th European Conference on Machine Learning, pages 227–238, Pisa, Italy, 2004.
[11] H. Li, T. Jiang, and K. Zhang. Efficient and robust feature extraction by maximum margin criterion. In S. Thrun, L. K. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16, Vancouver, British Columbia, Canada, 2003.
[12] M. S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284:193–228, 1998.
[13] A. M. Martínez and R. Benavente. The AR face database. Technical Report 24, CVC, 1998.
[14] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. J. Smola, and K.-R. Müller. Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):623–633, 2003.
[15] S. A. Nene, S. K. Nayar, and H. Murase. Columbia object image library (COIL-20). Technical Report 005, CUCS, 1996.
[16] M. L. Overton and R. S. Womersley. Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Mathematical Programming, 62(2):321–357, 1993.
[17] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1615–1618, 2003.
[18] A. J. Smola, S. V. N. Vishwanathan, and T. Hofmann. Kernel methods for missing variables. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, Barbados, 2005.
[19] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, 1996.
[20] H. Wang, S. Yan, D. Xu, X. Tang, and T. Huang. Trace ratio vs. ratio trace for dimensionality reduction. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1–8, Minneapolis, Minnesota, USA, 2007.
[21] K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 1473–1480, Vancouver, British Columbia, Canada, 2005.
[22] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. J. Russell. Distance metric learning with application to clustering with side-information. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 505–512, Vancouver, British Columbia, Canada, 2002.
[23] J.-P. Ye and T. Xiong. Computational and theoretical analysis of null space and orthogonal linear discriminant analysis. Journal of Machine Learning Research, 7:1183–1204, 2006.
[24] A. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 15(4):915–936, 2003.