nips nips2011 nips2011-198 nips2011-198-reference knowledge-graph by maker-knowledge-mining

198 nips-2011-On U-processes and clustering performance

Source: pdf

Author: Stéphan J. Clémençcon

Abstract: Many clustering techniques aim at optimizing empirical criteria that are of the form of a U -statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to deﬁne a general statistical framework, relying on the theory of U -processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the partition candidates, the √ excess of clustering risk is proved to be of the order OP (1/ n). Based on recent results related to the tail behavior of degenerate U -processes, it is also shown how to establish tighter rate bounds. Model selection issues, related to the number of clusters forming the data partition in particular, are also considered. 1

reference text

[BB06] G. Biau and L. Bleakley. Statistical Inference on Graphs. Statistics & Decisions, 24:209–232, 2006. [BBL05] S. Boucheron, O. Bousquet, and G. Lugosi. Theory of Classiﬁcation: A Survey of Some Recent Advances. ESAIM: Probability and Statistics, 9:323–375, 2005. [BD04] S. Ben-David. A framework for statistical clustering with a constant time approximation algorithms for k-median clustering. In Proceedings of COLT’04, Lecture Notes in Computer Science, Volume 3120/2004, 415-426, 2004. [BDL08] G. Biau, L. Devroye, and G. Lugosi. On the Performance of Clustering in Hilbert Space. IEEE Trans. Inform. Theory, 54(2):781–790, 2008. [BvL09] S. Bubeck and U. von Luxburg. Nearest neighbor clustering: A baseline method for consistent clustering with arbitrary objective functions. Journal of Machine Learning Research, 10:657–698, 2009. [CFZ09] B. Clarke, E. Fokou´ , and H.. Zhang. Principles and Theory for Data-Mining and Machinee Learning. Springer, 2009. [CLV08] S. Cl´ mencon, G. Lugosi, and N. Vayatis. Ranking and empirical risk minimization of U-statistics. e ¸ The Annals of Statistics, 36(2):844–874, 2008. [DGL96] L. Devroye, L. Gy¨ rﬁ, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, o 1996. [dlPG99] V. de la Pena and E. Gin´ . Decoupling: from Dependence to Independence. Springer, 1999. e [Dud99] R.M. Dudley. Uniform Central Limit Theorems. Cambridge University Press, 1999. [Har78] J.A. Hartigan. Asymptotic distributions for clustering criteria. The Annals of Statistics, 6:117–131, 1978. [Hoe48] W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Stat., 19:293–325, 1948. [HTF09] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning (2nd ed.), pages 520–528. Springer, 2009. [KN02] S. Kutin and P. Niyogi. Almost-everywhere algorithmic stability and generalization error. In Proceedings of the of the 18th Conference in Uncertainty in Artiﬁcial Intelligence, 2002. [Kol06] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization (with discussion). The Annals of Statistics, 34:2593–2706, 2006. [PFvN89] R. Peck, L. Fisher, and J. van Ness. Bootstrap conﬁdence intervals for the number of clusters in cluster analysis. J. Am. Stat. Assoc., 84:184–191, 1989. [Pol81] D. Pollard. Strong consistency of k-means clustering. The Annals of Statistics, 9:135–140, 1981. [Pol82] D. Pollard. A central limit theorem for k-means clustering. The Annals of Probability, 10:919–926, 1982. [Ser80] R.J. Serﬂing. Approximation theorems of mathematical statistics. Wiley, 1980. [ST08] O. Shamir and N. Tishby. Model selection and stability in k-means clustering. In in Proceedings of the 21rst Annual Conference on Learning Theory, 2008. [ST09] O. Shamir and N. Tishby. On the reliability of clustering stability in the large sample regime. In Advances in Neural Information Processing Systems 21, 2009. [TWH01] R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. J. Royal Stat. Soc., 63(2):411–423, 2001. [vdV98] A. van der Vaart. Asymptotic Statistics. Cambridge University Press, 1998. [vL09] U. von Luxburg. Clustering stability: An overview. Foundations and Trends in Machine Learning, 2(3):235–274, 2009. [vLBD05] U. von Luxburg and S. Ben-David. Towards a statistical theory of clustering. In Pascal workshop on Statistics and Optimization of Clustering, 2005. [vLBD06] U. von Luxburg and S. Ben-David. A sober look at clustering stability. In Proceedings of the 19th Conference on Learning Theory, 2006. [vLBD08] U. von Luxburg and S. Ben-David. Relating clustering stability to properties of cluster boundaries. In Proceedings of the 21th Conference on Learning Theory, 2008. [WT10] D. M. Witten and R. Tibshirani. A framework for feature selection in clustering. J. Amer. Stat. Assoc., 105(490):713–726, 2010. 9