nips nips2001 nips2001-171 nips2001-171-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hongyuan Zha, Xiaofeng He, Chris Ding, Ming Gu, Horst D. Simon
Abstract: The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descend method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vectors can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function. 1
[1] P. S. Bradley and Usama M. Fayyad. (1998). R efining Initial Points for K-Means Clustering. Proc. 15th International Conf. on Machine Learning, 91- 99.
[2] P. S. Bradley, K. Bennett and A. Demiritz. Constrained K-means Clustering. Microsoft Research, MSR-TR-2000-65, 2000.
[3] M. Girolani. (2001). Mercer Kernel Based Clustering in Feature Space. To appear in IEEE Transactions on Neural Networks.
[4] G. Golub and C. Van Loan . (1996) . Matrix Computations. Johns Hopkins University Press, 3rd Edition.
[5] Ming Gu, Hongyuan Zha, Chris Ding, Xiaofeng He and Horst Simon. (2001) . Spectral Embedding for K- Way Graph Clustering. Technical Report, Department of Computer Science and Engineering, CSE-OI-007, Pennsylvania State University.
[6] J.A. Hartigan and M.A. Wong. (1979). A K-means Clustering Algorithm. Applied Statistics, 28:100- 108.
[7] L. Lovasz and M.D. Plummer. (1986) Matching Theory. Amsterdam: North Holland.
[8] A. McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http : //www . CS. cmu. edu/ mccallum/bow.
[9] B. Schi:ilkopf, A. Smola and K.R. Miiller. (1998). Nonlinear Component Analysis as a Kernel Eigenvalue Problem. N eural Computation, 10: 1299- 1219.
[10] N . Slonim and N. Tishby. (2000). Document clustering using word clusters via the information bottleneck method. Proceedings of SIGIR-2000.
[11] G.W. Stewart and J.G. Sun. (1990). Matrix Perturbation Theory. Academic Press, San Diego , CA.