nips2004-103: reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ulrike von Luxburg, Olivier Bousquet, Mikhail Belkin
Abstract: An important aspect of clustering algorithms is whether the partitions constructed on finite samples converge to a useful clustering of the whole data space as the sample size increases. This paper investigates this question for normalized and unnormalized versions of the popular spectral clustering algorithm. Surprisingly, the convergence of unnormalized spectral clustering is more difficult to handle than the normalized case. Although some first results on the convergence of normalized spectral clustering have recently been obtained, the unnormalized case requires a completely new approach combining tools from numerical integration, spectral and perturbation theory, and probability. It turns out that while normalized spectral clustering usually converges to a nice partition of the data space, in the unnormalized case the same holds only under strong additional assumptions which are not always satisfied. We conclude that our analysis gives strong evidence for the superiority of normalized spectral clustering. It also provides a basis for future exploration of other Laplacian-based methods.
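The abstract contrasts the two variants of the algorithm without defining them. As a quick orientation, the following is a minimal sketch of the embedding step of spectral clustering with both the unnormalized and the symmetrically normalized graph Laplacian. The Gaussian similarity function, the parameter names sigma and k, and the concluding k-means step are common illustrative choices, not details taken from the paper.

    import numpy as np

    def spectral_embedding(X, sigma=1.0, k=2, normalized=True):
        """Embedding step of spectral clustering (illustrative sketch).

        Builds a Gaussian similarity matrix on the sample X, forms the
        unnormalized or symmetrically normalized graph Laplacian, and
        returns the eigenvectors belonging to the k smallest eigenvalues.
        """
        # Pairwise similarities k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)).
        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        K = np.exp(-sq_dists / (2.0 * sigma ** 2))
        d = K.sum(axis=1)                     # vertex degrees
        L = np.diag(d) - K                    # unnormalized Laplacian L = D - K
        if normalized:
            # Symmetric normalization: L_sym = D^{-1/2} (D - K) D^{-1/2}.
            D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
            L = D_inv_sqrt @ L @ D_inv_sqrt
        eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
        return eigvecs[:, :k]                 # rows are then clustered, e.g. by k-means

The paper's question is whether these eigenvectors, computed on a finite sample, converge to eigenfunctions of a limit operator as the sample size grows; the choice between the two Laplacians above is exactly what separates the normalized from the unnormalized case.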
References:
P. Anselone. Collectively Compact Operator Approximation Theory. Prentice-Hall, 1971.
M. Belkin and P. Niyogi. Using manifold structure for partially labeled classification. In Advances in Neural Information Processing Systems 15, 2003.
O. Chapelle, J. Weston, and B. Schölkopf. Cluster kernels for semi-supervised learning. In Advances in Neural Information Processing Systems 15, 2003.
F. Chatelin. Spectral Approximation of Linear Operators. Academic Press, 1983.
S. Guattery and G. L. Miller. On the quality of spectral separators. SIAM Journal on Matrix Analysis and Applications, 19(3), 1998.
J. Hartigan. Statistical theory in clustering. Journal of Classification, 2:63-76, 1985.
R. Kannan, S. Vempala, and A. Vetta. On clusterings: good, bad and spectral. Technical report, Computer Science Department, Yale University, 2000.
T. Kato. Perturbation Theory for Linear Operators. Springer, Berlin, 1966.
M. Meila and J. Shi. A random walks view of spectral segmentation. In 8th International Workshop on Artificial Intelligence and Statistics, 2001.
A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: analysis and an algorithm. In Advances in Neural Information Processing Systems 14, 2001.
D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9(1):135-140, 1981.
D. Spielman and S. Teng. Spectral partitioning works: planar graphs and finite element meshes. In 37th Annual Symposium on Foundations of Computer Science, 1996.
A. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes. Springer, 1996.
R. Van Driessche and D. Roose. An improved spectral bisection algorithm and its application to dynamic load balancing. Parallel Computing, 21(1), 1995.
U. von Luxburg. Statistical Learning with Similarity and Dissimilarity Functions. PhD thesis, draft, available at http://www.kyb.tuebingen.mpg.de/~ule, 2004.
U. von Luxburg, O. Bousquet, and M. Belkin. On the convergence of spectral clustering on random samples: the normalized case. In COLT, 2004.
Y. Weiss. Segmentation using eigenvectors: a unifying view. In Proceedings of the International Conference on Computer Vision, pages 975-982, 1999.
D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, 2004.
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, 2003.