
70 nips-2007: Discriminative K-means for Clustering


Source: pdf

Author: Jieping Ye, Zheng Zhao, Mingrui Wu

Abstract: We present a theoretical study of the discriminative clustering framework, recently proposed for simultaneous subspace selection via linear discriminant analysis (LDA) and clustering. Empirical results have shown its favorable performance in comparison with several other popular clustering algorithms. However, the inherent relationship between subspace selection and clustering in this framework is not well understood, due to the iterative nature of the algorithm. We show in this paper that this iterative subspace selection and clustering is equivalent to kernel K-means with a specific kernel Gram matrix. This provides significant new insight into the nature of this subspace selection procedure. Based on this equivalence, we propose the Discriminative K-means (DisKmeans) algorithm for simultaneous LDA subspace selection and clustering, as well as an automatic parameter estimation procedure. We also present the nonlinear extension of DisKmeans using kernels. We show that the learning of the kernel matrix over a convex set of pre-specified kernel matrices can be incorporated into the clustering formulation. The connection between DisKmeans and several other clustering algorithms is also analyzed. The presented theories and algorithms are evaluated through experiments on a collection of benchmark data sets.
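The abstract's central technical claim is that iterating LDA subspace selection with K-means amounts to running kernel K-means with a particular Gram matrix. For concreteness, here is a minimal kernel K-means sketch; this is not the authors' code, and the names (kernel_kmeans, G, k) are illustrative. DisKmeans would correspond to running such a routine with the specific regularized Gram matrix derived in the paper, which depends on the data matrix and the LDA regularization parameter λ (the exact form is given in the paper and not reproduced here).

```python
import numpy as np

def kernel_kmeans(G, k, n_iter=100, seed=0):
    """Lloyd-style kernel K-means on an n x n Gram matrix G.

    All centroid distances are computed from G alone, so no explicit
    feature vectors are ever needed.
    """
    n = G.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)        # random initial assignment
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:                  # re-seed an empty cluster
                idx = rng.integers(0, n, size=1)
            # ||phi(x_i) - m_c||^2 = G_ii - 2*mean_j G_ij + mean_{j,l} G_jl
            dist[:, c] = (np.diag(G)
                          - 2.0 * G[:, idx].mean(axis=1)
                          + G[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels): # converged
            break
        labels = new_labels
    return labels
```

With a plain linear kernel, G = X @ X.T, this reduces to standard K-means (up to initialization); the paper's equivalence result identifies the regularized Gram matrix for which it instead performs joint LDA subspace selection and clustering.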


Reference text

[1] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

[2] C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.

[3] O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. The MIT Press, 2006.

[4] I. S. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral clustering and graph partitioning. Technical report, Department of Computer Sciences, University of Texas at Austin, 2005.

[5] C. Ding and T. Li. Adaptive dimension reduction using discriminant analysis and k-means clustering. In ICML, 2007.

[6] J. H. Friedman. Regularized discriminant analysis. JASA, 84(405):165–175, 1989.

[7] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 2nd edition, 1990.

[8] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins Univ. Press, 1996.

[9] I.T. Jolliffe. Principal Component Analysis. Springer, 2nd edition, 2002.

[10] F. De la Torre Frade and T. Kanade. Discriminative cluster analysis. In ICML, pages 241–248, 2006.

[11] G.R.G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. JMLR, 5:27–72, 2004.

[12] C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2006.

[13] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.

[14] B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, 2002.

[15] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38:49–95, 1996.

[16] J. Ye, Z. Zhao, and H. Liu. Adaptive distance metric learning for clustering. In CVPR, 2007.