nips nips2010 nips2010-223 nips2010-223-reference knowledge-graph by maker-knowledge-mining

223 nips-2010-Rates of convergence for the cluster tree


Source: pdf

Author: Kamalika Chaudhuri, Sanjoy Dasgupta

Abstract: For a density f on Rd , a high-density cluster is any connected component of {x : f (x) ≥ λ}, for some λ > 0. The set of all high-density clusters form a hierarchy called the cluster tree of f . We present a procedure for estimating the cluster tree given samples from f . We give finite-sample convergence rates for our algorithm, as well as lower bounds on the sample complexity of this estimation problem. 1


reference text

[1] O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to statistical learning theory. Lecture Notes in Artificial Intelligence, 3176:169–207, 2004.

[2] T. Cover and J. Thomas. Elements of Information Theory. Wiley, 2005.

[3] S. Dasgupta and Y. Freund. Random projection trees for vector quantization. IEEE Transactions on Information Theory, 55(7):3229–3242, 2009.

[4] S. Dasgupta, A. Kalai, and C. Monteleoni. Analysis of perceptron-based active learning. Journal of Machine Learning Research, 10:281–299, 2009.

[5] J.A. Hartigan. Consistency of single linkage for high-density clusters. Journal of the American Statistical Association, 76(374):388–394, 1981.

[6] M. Maier, M. Hein, and U. von Luxburg. Optimal construction of k-nearest neighbor graphs for identifying noisy clusters. Theoretical Computer Science, 410:1749–1764, 2009.

[7] M. Penrose. Single linkage clustering and continuum percolation. Journal of Multivariate Analysis, 53:94–109, 1995.

[8] D. Pollard. Strong consistency of k-means clustering. Annals of Statistics, 9(1):135–140, 1981.

[9] P. Rigollet and R. Vert. Fast rates for plug-in estimators of density level sets. Bernoulli, 15(4):1154–1178, 2009.

[10] A. Rinaldo and L. Wasserman. 38(5):2678–2722, 2010. Generalized density clustering. Annals of Statistics,

[11] A. Singh, C. Scott, and R. Nowak. Adaptive hausdorff estimation of density level sets. Annals of Statistics, 37(5B):2760–2782, 2009.

[12] W. Stuetzle and R. Nugent. A generalized single linkage method for estimating the cluster tree of a density. Journal of Computational and Graphical Statistics, 19(2):397–418, 2010.

[13] D. Wishart. Mode analysis: a generalization of nearest neighbor which reduces chaining effects. In Proceedings of the Colloquium on Numerical Taxonomy held in the University of St. Andrews, pages 282–308, 1969.

[14] M.A. Wong and T. Lane. A kth nearest neighbour clustering procedure. Journal of the Royal Statistical Society Series B, 45(3):362–368, 1983.

[15] B. Yu. Assouad, Fano and Le Cam. Festschrift for Lucien Le Cam, pages 423–435, 1997. 9