nips nips2001 nips2001-171 nips2001-171-reference knowledge-graph by maker-knowledge-mining

171 nips-2001-Spectral Relaxation for K-means Clustering

Source: pdf

Author: Hongyuan Zha, Xiaofeng He, Chris Ding, Ming Gu, Horst D. Simon

Abstract: The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descend method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vectors can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function. 1

reference text

[1] P. S. Bradley and Usama M. Fayyad. (1998). R efining Initial Points for K-Means Clustering. Proc. 15th International Conf. on Machine Learning, 91- 99.

[2] P. S. Bradley, K. Bennett and A. Demiritz. Constrained K-means Clustering. Microsoft Research, MSR-TR-2000-65, 2000.

[3] M. Girolani. (2001). Mercer Kernel Based Clustering in Feature Space. To appear in IEEE Transactions on Neural Networks.

[4] G. Golub and C. Van Loan . (1996) . Matrix Computations. Johns Hopkins University Press, 3rd Edition.

[5] Ming Gu, Hongyuan Zha, Chris Ding, Xiaofeng He and Horst Simon. (2001) . Spectral Embedding for K- Way Graph Clustering. Technical Report, Department of Computer Science and Engineering, CSE-OI-007, Pennsylvania State University.

[6] J.A. Hartigan and M.A. Wong. (1979). A K-means Clustering Algorithm. Applied Statistics, 28:100- 108.

[7] L. Lovasz and M.D. Plummer. (1986) Matching Theory. Amsterdam: North Holland.

[8] A. McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http : //www . CS. cmu. edu/ mccallum/bow.

[9] B. Schi:ilkopf, A. Smola and K.R. Miiller. (1998). Nonlinear Component Analysis as a Kernel Eigenvalue Problem. N eural Computation, 10: 1299- 1219.

[10] N . Slonim and N. Tishby. (2000). Document clustering using word clusters via the information bottleneck method. Proceedings of SIGIR-2000.

[11] G.W. Stewart and J.G. Sun. (1990). Matrix Perturbation Theory. Academic Press, San Diego , CA.