nips nips2012 nips2012-69 nips2012-69-reference knowledge-graph by maker-knowledge-mining

69 nips-2012-Clustering Sparse Graphs

Source: pdf

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difﬁcult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

reference text

[1] P. Holland andK.B. Laskey and S. Leinhardt. Stochastic blockmodels: Some ﬁrst steps. Social Networks, 5:109–137, 1983.

[2] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56(1):89–113, 2004.

[3] H. Becker. A survey of correlation clustering. http://www1.cs.columbia.edu/ hila/clustering.pdf, 2005. Available online at

[4] B. Bollob´ s and AD Scott. Max cut for random graphs with a planted partition. Combinatorics, a Probability and Computing, 13(4-5):451–474, 2004.

[5] R.B. Boppana. Eigenvalues and graph bisection: An average-case analysis. In Foundations of Computer Science, 1987., 28th Annual Symposium on, pages 280–285. IEEE, 1987.

[6] E.J. Candes, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? Arxiv preprint arXiv:0912.3599, 2009.

[7] T. Carson and R. Impagliazzo. Hill-climbing ﬁnds random planted bisections. In Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 903–909. Society for Industrial and Applied Mathematics, 2001.

[8] V. Chandrasekaran, S. Sanghavi, S. Parrilo, and A. Willsky. Rank-sparsity incoherence for matrix decomposition. SIAM Journal on Optimization, 21(2):572–596, 2011.

[9] M. Charikar, V. Guruswami, and A. Wirth. Clustering with qualitative information. In Foundations of Computer Science, 2003. Proceedings. 44th Annual IEEE Symposium on, pages 524–533. IEEE, 2003.

[10] A. Condon and R.M. Karp. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms, 18(2):116–140, 2001.

[11] E. Demaine and N. Immorlica. Correlation clustering with partial information. Approximation, Randomization, and Combinatorial Optimization.. Algorithms and Techniques, pages 71–80, 2003.

[12] E.D. Demaine, D. Emanuel, A. Fiat, and N. Immorlica. Correlation clustering in general weighted graphs. Theoretical Computer Science, 361(2):172–187, 2006.

[13] D. Emanuel and A. Fiat. Correlation clustering–minimizing disagreements on arbitrary weighted graphs. Algorithms-ESA 2003, pages 208–220, 2003.

[14] U. Feige and J. Kilian. Heuristics for semirandom graph problems. Journal of Computer and System Sciences, 63(4):639–671, 2001. 8

[15] J. Giesen and D. Mitsche. Bounding the misclassiﬁcation error in spectral partitioning in the planted partition model. In Graph-Theoretic Concepts in Computer Science, pages 409–420. Springer, 2005.

[16] J. Giesen and D. Mitsche. Reconstructing many partitions using spectral techniques. In Fundamentals of Computation Theory, pages 433–444. Springer, 2005.

[17] A. Jalali, Y. Chen, S. Sanghavi, and H. Xu. Clustering partially observed graphs via convex optimization. Arxiv preprint arXiv:1104.4803, 2011.

[18] M. Jerrum and G.B. Sorkin. The metropolis algorithm for graph bisection. Discrete Applied Mathematics, 82(1-3):155–175, 1998.

[19] C. Mathieu and W. Schudy. Correlation clustering with noisy input. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pages 712–728. Society for Industrial and Applied Mathematics, 2010.

[20] F. McSherry. Spectral partitioning of random graphs. In Foundations of Computer Science, 2001. Proceedings. 42nd IEEE Symposium on, pages 529–537. IEEE, 2001.

[21] S. Oymak and B. Hassibi. Finding dense clusters via ”low rank+ sparse” decomposition. Arxiv preprint arXiv:1104.5186, 2011.

[22] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic block model. Technical report, Technical Report 791, Statistics Department, UC Berkeley, 2010.

[23] R. Shamir and D. Tsur. Improved algorithms for the random cluster graph model. Random Structures & Algorithms, 31(4):418–449, 2007.

[24] R. Sibson. Slink: an optimally efﬁcient algorithm for the single-link cluster method. The Computer Journal, 16(1):30–34, 1973.

[25] C. Swamy. Correlation clustering: maximizing agreements via semideﬁnite programming. In Proceedings of the ﬁfteenth annual ACM-SIAM symposium on Discrete algorithms, pages 526–527. Society for Industrial and Applied Mathematics, 2004.

[26] U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007. 9