nips2011-186 reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh
Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results.
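As a generic illustration of the kind of algorithm the abstract refers to (a minimal sketch, not the paper's exact method or noise model), 2-way spectral partitioning of a symmetric similarity matrix splits the items by the sign pattern of the graph Laplacian's second-smallest eigenvector:

```python
# Minimal sketch of 2-way spectral partitioning from a similarity matrix.
# Generic textbook construction; the matrix W below is illustrative data,
# not taken from the paper.
import numpy as np

def spectral_split(W):
    """Split items into two clusters using the second-smallest
    eigenvector (Fiedler vector) of the unnormalized Laplacian."""
    D = np.diag(W.sum(axis=1))      # degree matrix
    L = D - W                       # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(L)  # eigh: W is assumed symmetric
    fiedler = vecs[:, 1]            # eigenvector of 2nd-smallest eigenvalue
    return fiedler >= 0             # boolean cluster assignment

# Two noisy blocks on the diagonal: items 0-2 vs. items 3-5.
W = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
    [0.9, 1.0, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 1.0, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.1, 1.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 1.0, 0.9],
    [0.1, 0.0, 0.1, 0.8, 0.9, 1.0],
])
labels = spectral_split(W)
```

For hierarchical clustering, a split of this kind can be applied recursively to each recovered block; the paper's analysis concerns how much off-block noise such a procedure can tolerate.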
[1] Dimitris Achlioptas and Frank McSherry. On spectral learning of mixtures of distributions. In Computational Learning Theory, pages 458–469, 2005.
[2] S. Charles Brubaker and Santosh Vempala. Isotropic PCA and affine-invariant clustering. In FOCS, pages 551–560, 2008.
[3] Wei-Chen Chen. Phylogenetic Clustering with R package phyclust, 2010.
[4] Joseph L. DeRisi, Vishwanath R. Iyer, and Patrick O. Brown. Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale. Science, 278(5338):680–686, 1997.
[5] Robert D. Finn, Jaina Mistry, John Tate, Penny Coggill, Andreas Heger, Joanne E. Pollington, O. Luke Gavin, Prasad Gunesekaran, Goran Ceric, Kristoffer Forslund, Liisa Holm, Erik L. Sonnhammer, Sean R. Eddy, and Alex Bateman. The Pfam Protein Families Database. Nucleic Acids Research, 2010.
[6] Dorit S. Hochbaum and David B. Shmoys. A Best Possible Heuristic for the K-Center Problem. Mathematics of Operations Research, 10:180–184, 1985.
[7] Ling Huang, Donghui Yan, Michael I. Jordan, and Nina Taft. Spectral Clustering with Perturbed Data. In Advances in Neural Information Processing Systems, 2009.
[8] Ravindran Kannan, Hadi Salmasian, and Santosh Vempala. The spectral method for general mixture models. In 18th Annual Conference on Learning Theory (COLT), pages 444–457, 2005.
[9] Frank McSherry. Spectral partitioning of random graphs. In IEEE Symposium on Foundations of Computer Science, page 529, 2001.
[10] Andrew Y. Ng, Michael I. Jordan, and Yair Weiss. On Spectral Clustering: Analysis and an Algorithm. In Advances in Neural Information Processing Systems, pages 849–856. MIT Press, 2001.
[11] Karl Rohe, Sourav Chatterjee, and Bin Yu. Spectral Clustering and the High-Dimensional Stochastic Block Model. Technical Report 791, Statistics Department, UC Berkeley, 2010.
[12] Sajama and Alon Orlitsky. Estimating and Computing Density Based Distance Metrics. In 22nd International Conference on Machine Learning (ICML), 2005.
[13] Dan Spielman. Lecture Notes on Spectral Graph Theory, 2009.
[14] Terence Tao. Course Notes on Random Matrix Theory, 2010.
[15] Alexandre B. Tsybakov. Introduction à l'Estimation Non-paramétrique. Springer, 2004.
[16] Ulrike von Luxburg. A Tutorial on Spectral Clustering. Technical Report 149, Max Planck Institute for Biological Cybernetics, August 2006.
[17] Ulrike von Luxburg, Mikhail Belkin, and Olivier Bousquet. Consistency of Spectral Clustering. The Annals of Statistics, pages 857–864, 2004.