
Size Regularized Cut for Data Clustering



Author: Yixin Chen, Ya Zhang, Xiang Ji

Abstract: We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of the two clusters. Finding a partition of the data set that minimizes SRcut is proved to be NP-complete. An approximation algorithm is proposed that solves a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut.
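The recipe in the abstract (a cut objective plus a size regularizer, relaxed into an eigenvalue problem) can be illustrated with a minimal Python sketch under simplifying assumptions. Assume a two-way partition encoded by x in {-1,+1}^n: the inter-cluster similarity is then (e'We - x'Wx)/4 and the size product |A||B| equals (n^2 - (e'x)^2)/4, so minimizing a cut term penalized toward balanced sizes reduces, up to additive constants, to maximizing x'(W - alpha*ee')x; relaxing x to a unit-norm real vector yields the leading eigenvector of W - alpha*ee'. The paper's exact regularizer, its choice of the trade-off coefficient, and its rounding step may differ; the alpha = W.mean() setting and zero-thresholding below are illustrative assumptions, not the authors' method.

    import numpy as np

    def srcut_partition(W, alpha):
        # Spectral relaxation sketch: leading eigenvector of M = W - alpha * ee'.
        n = W.shape[0]
        M = W - alpha * np.ones((n, n))
        eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
        x = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
        # Round the relaxed solution by thresholding at zero; a sweep over
        # thresholds, as in other spectral methods, may recover better cuts.
        return x >= 0

    # Toy usage: two Gaussian blobs of unequal size, Gaussian similarity.
    rng = np.random.default_rng(0)
    pts = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
                     rng.normal(2.0, 0.3, (30, 2))])
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / 0.5)
    labels = srcut_partition(W, alpha=W.mean())  # alpha choice is illustrative

Because the size term enters the relaxed matrix only through the rank-one update alpha*ee', the trade-off between cutting weakly connected outliers and keeping cluster sizes in a desired proportion is controlled entirely by alpha, which is what lets prior size knowledge enter the spectral problem.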


References

[1] S. Arora, D. Karger, and M. Karpinski, “Polynomial Time Approximation Schemes for Dense Instances of NP-hard Problems,” Proc. ACM Symp. on Theory of Computing, pp. 284-293, 1995.

[2] A. Banerjee and J. Ghosh, “On Scaling up Balanced Clustering Algorithms,” Proc. SIAM Int’l Conf. on Data Mining, pp. 333-349, 2002.

[3] P. K. Chan, M. D. F. Schlag, and J. Y. Zien, “Spectral k-Way Ratio-Cut Partitioning and Clustering,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 13:1088-1096, 1994.

[4] M. Charikar, S. Khuller, D. M. Mount, and G. Narasimhan, “Algorithms for Facility Location Problems with Outliers,” Proc. ACM-SIAM Symp. on Discrete Algorithms, pp. 642-651, 2001.

[5] I. S. Dhillon, “Co-clustering Documents and Words using Bipartite Spectral Graph Partitioning,” Proc. ACM SIGKDD Conf. Knowledge Discovery and Data Mining, pp. 269-274, 2001.

[6] C. Ding, “Data Clustering: Principal Components, Hopfield and Self-Aggregation Networks,” Proc. Int’l Joint Conf. on Artificial Intelligence, pp. 479-484, 2003.

[7] C. Ding, X. He, H. Zha, M. Gu, and H. Simon, “Spectral Min-Max Cut for Graph Partitioning and Data Clustering,” Proc. IEEE Int’l Conf. Data Mining, pp. 107-114, 2001.

[8] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1999.

[9] R. Kannan, S. Vempala, and A. Vetta, “On Clusterings - Good, Bad and Spectral,” Proc. IEEE Symp. on Foundations of Computer Science, pp. 367-377, 2000.

[10] D. R. Karger and M. Minkoff, “Building Steiner Trees with Incomplete Global Knowledge,” Proc. IEEE Symp. on Foundations of Computer Science, pp. 613-623, 2000.

[11] B. Kernighan and S. Lin, “An Efficient Heuristic Procedure for Partitioning Graphs,” The Bell System Technical Journal, 49:291-307, 1970.

[12] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm,” Advances in Neural Information Processing Systems 14, pp. 849-856, 2001.

[13] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, second edition, Cambridge University Press, 1992.

[14] A. Rahimi and B. Recht, “Clustering with Normalized Cuts is Clustering with a Hyperplane,” Statistical Learning in Computer Vision, 2004.

[15] J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 22:888-905, 2000.

[16] K. Wagstaff, C. Cardie, S. Rogers, and S. Schrödl, “Constrained K-means Clustering with Background Knowledge,” Proc. Int’l Conf. on Machine Learning, pp. 577-584, 2001.

[17] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell, “Distance Metric Learning, with Applications to Clustering with Side Information,” Advances in Neural Information Processing Systems 15, pp. 505-512, 2003.

[18] X. Yu and J. Shi, “Segmentation Given Partial Grouping Constraints,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 26:173-183, 2004.

[19] H. Zha, X. He, C. Ding, H. Simon, and M. Gu, “Spectral Relaxation for K-means Clustering,” Advances in Neural Information Processing Systems 14, pp. 1057-1064, 2001.