
98 jmlr-2010-Regularized Discriminant Analysis, Ridge Regression and Beyond


Source: pdf

Author: Zhihua Zhang, Guang Dai, Congfu Xu, Michael I. Jordan

Abstract: Fisher linear discriminant analysis (FDA) and its kernel extension—kernel discriminant analysis (KDA)—are well-known methods that consider dimensionality reduction and classification jointly. While widely deployed in practical problems, there are still unresolved issues surrounding their efficient implementation and their relationship with least mean squares procedures. In this paper we address these issues within the framework of regularized estimation. Our approach leads to a flexible and efficient implementation of FDA as well as KDA. We also uncover a general relationship between regularized discriminant analysis and ridge regression. This relationship yields variations on conventional FDA based on the pseudoinverse and a direct equivalence to an ordinary least squares estimator.

Keywords: Fisher discriminant analysis, reproducing kernel, generalized eigenproblems, ridge regression, singular value decomposition, eigenvalue decomposition
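A quick way to see the relationship the abstract refers to is to build both objects side by side: the regularized FDA directions (leading eigenvectors of a regularized generalized eigenproblem) and the ridge-regression coefficients fit to class-indicator targets. The NumPy/SciPy sketch below is illustrative only, not code from the paper: the function names, the toy data, the indicator coding of the labels, and the value of lam are all assumptions, and the paper's exact equivalence involves scalings this sketch does not reproduce.

import numpy as np
from scipy.linalg import eigh

def rda_directions(X, y, lam=1e-3):
    # Regularized FDA: leading eigenvectors of the generalized
    # symmetric eigenproblem  S_b w = eta (S_t + lam I) w,
    # with S_t the total scatter and S_b the between-class scatter.
    n, d = X.shape
    classes = np.unique(y)
    mu = X.mean(axis=0)
    Xc = X - mu
    St = Xc.T @ Xc
    Sb = np.zeros((d, d))
    for c in classes:
        nc = (y == c).sum()
        diff = X[y == c].mean(axis=0) - mu
        Sb += nc * np.outer(diff, diff)
    # eigh returns eigenvalues in ascending order; keep the top c-1.
    _, evecs = eigh(Sb, St + lam * np.eye(d))
    return evecs[:, ::-1][:, :len(classes) - 1]

def ridge_coefficients(X, y, lam=1e-3):
    # Ridge regression of a centered class-indicator matrix on X:
    #   B = (Xc' Xc + lam I)^{-1} Xc' Yc
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ Yc)

# Toy comparison: three well-separated Gaussian classes in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5)) + np.repeat(np.eye(3, 5) * 4.0, 20, axis=0)
y = np.repeat(np.arange(3), 20)
W = rda_directions(X, y)        # 5 x 2 discriminant directions
B = ridge_coefficients(X, y)    # 5 x 3 ridge coefficients (rank <= 2)
# Cosines of the principal angles between the two column spaces;
# values near 1 indicate the spans coincide.
Qw, _ = np.linalg.qr(W)
Qb, _ = np.linalg.qr(B)
print(np.linalg.svd(Qw.T @ Qb, compute_uv=False))

If the equivalence holds for this choice of regularization, the printed cosines should come out near 1, which is the numerical signature of the two subspaces agreeing; the paper establishes the precise algebraic correspondence.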


References

S. Akaho. A kernel method for canonical correlation analysis. In International Meeting of the Psychometric Society, 2001.
F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.
G. Baudat and F. Anouar. Generalized discriminant analysis using a kernel approach. Neural Computation, 12:2385–2404, 2000.
Y.-Q. Cheng, Y.-M. Zhuang, and J.-Y. Yang. Optimal Fisher discriminant analysis using the rank decomposition. Pattern Recognition, 25(1):101–111, 1992.
M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. Learning to extract symbolic knowledge from the World Wide Web. In The Fifteenth National Conference on Artificial Intelligence, 1998.
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley and Sons, New York, second edition, 2001.
J. H. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165–175, 1989.
A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643–660, 2001.
G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 1996.
T. Hastie, R. Tibshirani, and A. Buja. Flexible discriminant analysis by optimal scoring. Journal of the American Statistical Association, 89(428):1255–1270, 1994.
T. Hastie, A. Buja, and R. Tibshirani. Penalized discriminant analysis. Annals of Statistics, 23(1):73–102, 1995.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001.
A. E. Hoerl and R. W. Kennard. Ridge regression. Technometrics, 12:56–82, 1970.
R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, UK, 1985.
P. Howland and H. Park. Generalizing discriminant analysis using the generalized singular value decomposition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):995–1006, 2004.
P. Howland, M. Jeon, and H. Park. Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 25(1):165–179, 2003.
J. Kittler and P. C. Young. A new approach to feature selection based on the Karhunen-Loève expansion. Pattern Recognition, 5:335–352, 1973.
K. C. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):684–698, 2005.
H. Lütkepohl. Handbook of Matrices. John Wiley & Sons, New York, 1996.
K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, New York, 1979.
S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K. R. Müller. Invariant feature extraction and classification in kernel space. In Advances in Neural Information Processing Systems 12, pages 526–532, 2000.
C. C. Paige and M. A. Saunders. Towards a generalized singular value decomposition. SIAM Journal on Numerical Analysis, 18(3):398–405, 1981.
C. H. Park and H. Park. Nonlinear discriminant analysis using kernel functions and the generalized singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 27(1):87–102, 2005a.
C. H. Park and H. Park. A relationship between linear discriminant analysis and the generalized minimum squared error solution. SIAM Journal on Matrix Analysis and Applications, 27(2):474–492, 2005b.
K. Pelckmans, J. De Brabanter, J. A. K. Suykens, and B. De Moor. The differogram: Nonparametric noise variance estimation and its use for model selection. Neurocomputing, 69:100–122, 2005.
V. Roth and V. Steinhage. Nonlinear discriminant analysis using kernel functions. In Advances in Neural Information Processing Systems 12, pages 568–574, 2000.
J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK, 2004.
J. A. K. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9:293–300, 1999.
J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle. Least Squares Support Vector Machines. World Scientific, Singapore, 2002.
C. F. Van Loan. Generalizing the singular value decomposition. SIAM Journal on Numerical Analysis, 13(3):76–83, 1976.
T. Van Gestel, J. A. K. Suykens, J. De Brabanter, B. De Moor, and J. Vandewalle. Kernel canonical correlation analysis and least squares support vector machines. In The International Conference on Artificial Neural Networks (ICANN), pages 381–386, 2001.
T. Van Gestel, J. A. K. Suykens, G. Lanckriet, A. Lambrechts, B. De Moor, and J. Vandewalle. Bayesian framework for least-squares support vector machine classifiers, Gaussian processes, and kernel Fisher discriminant analysis. Neural Computation, 14:1115–1147, 2002.
A. R. Webb. Statistical Pattern Recognition. John Wiley & Sons, Hoboken, NJ, 2002.
J. Ye. Least squares linear discriminant analysis. In The Twenty-Fourth International Conference on Machine Learning (ICML), 2007.
J. Ye, Q. Li, H. Xiong, H. Park, R. Janardan, and V. Kumar. An incremental dimension reduction algorithm via QR decomposition. In ACM SIGKDD, pages 364–373, 2004.
Z. Zhang and G. Dai. Optimal scoring for unsupervised learning. In Advances in Neural Information Processing Systems 23, pages 2241–2249, 2009.
Z. Zhang and M. I. Jordan. Multiway spectral clustering: A margin-based perspective. Statistical Science, 23(3):383–403, 2008.