NIPS 2006, Paper 181
Source: PDF
Authors: Alexander Rakhlin, Andrea Caponnetto
Abstract: We phrase K-means clustering as an empirical risk minimization procedure over a class H_K and explicitly calculate the covering number for this class. Next, we show that stability of K-means clustering is characterized by the geometry of H_K with respect to the underlying distribution. We prove that in the case of a unique global minimizer, the clustering solution is stable with respect to complete changes of the data, while in the case of multiple minimizers, a change of Ω(n^{1/2}) samples defines the transition between stability and instability. While for a finite number of minimizers this result follows from multinomial distribution estimates, the case of infinitely many minimizers requires more refined tools. We conclude by proving that stability of the functions in H_K implies stability of the actual centers of the clusters. Since stability is often used for selecting the number of clusters in practice, we hope that our analysis serves as a starting point for finding theoretically grounded recipes for the choice of K.
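As a hedged illustration of the setting the abstract describes (not the paper's construction), the following minimal pure-Python sketch runs Lloyd's K-means twice with different random initializations and compares the two clusterings with a pair-counting agreement score, in the spirit of the stability-based methods of [5, 6]. The function names `kmeans` and `pair_agreement` are my own; a high score suggests a stable clustering for that K.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[j].append(p)
        # Recompute each center as the mean of its cluster.
        new = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centers[j]
               for j, cl in enumerate(clusters)]
        if new == centers:  # assignments stabilized
            break
        centers = new
    labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
    return centers, labels

def pair_agreement(l1, l2):
    """Fraction of point pairs on which two clusterings agree
    (both put the pair together, or both split it); permutation-invariant."""
    n = len(l1)
    agree = sum((l1[i] == l1[j]) == (l2[i] == l2[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)

if __name__ == "__main__":
    # Two well-separated blobs: K = 2 should be stable across initializations.
    rng = random.Random(42)
    pts = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
           + [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)])
    _, l0 = kmeans(pts, 2, seed=0)
    _, l1 = kmeans(pts, 2, seed=1)
    print(pair_agreement(l0, l1))
```

On well-separated data the two runs recover the same partition and the score is near 1; near the stability/instability transition the paper analyzes (multiple risk minimizers), different initializations or subsamples land in different minimizers and the score drops.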
[1] Shai Ben-David. A framework for statistical clustering with constant time approximation algorithms for k-median clustering. In COLT, pages 415–426, 2004.
[2] Ulrike von Luxburg and Shai Ben-David. Towards a statistical theory of clustering. PASCAL Workshop on Statistics and Optimization of Clustering, 2005.
[3] Shai Ben-David, Ulrike von Luxburg, and David Pal. A sober look at clustering stability. In COLT, 2006.
[4] A. Rakhlin. Stability of clustering methods. NIPS Workshop "Theoretical Foundations of Clustering", December 2005.
[5] A. Ben-Hur, A. Elisseeff, and I. Guyon. A stability based method for discovering structure in clustered data. In Pacific Symposium on Biocomputing, volume 7, pages 6–17, 2002.
[6] T. Lange, M. Braun, V. Roth, and J. Buhmann. Stability-based model selection. In NIPS, 2003.
[7] Joachim M. Buhmann. Empirical risk approximation: An induction principle for unsupervised learning. Technical Report IAI-TR-98-3, 1998.
[8] A. Caponnetto and A. Rakhlin. Some properties of empirical risk minimization over Donsker classes. AI Memo 2005-018, Massachusetts Institute of Technology, May 2005.
[9] A. Caponnetto and A. Rakhlin. Stability properties of empirical risk minimization over Donsker classes. Journal of Machine Learning Research. Accepted. Available at http://cbcl.mit.edu/people/rakhlin/erm.pdf, 2006.
[10] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning - Data Mining, Inference, and Prediction. Springer, 2002.
[11] Marina Meilă. Comparing clusterings: an axiomatic view. In ICML '05: Proceedings of the 22nd International Conference on Machine Learning, pages 577–584, New York, NY, USA, 2005. ACM Press.
[12] S.A. van de Geer. Empirical Processes in M-Estimation. Cambridge University Press, 2000.