jmlr jmlr2012 jmlr2012-109 jmlr2012-109-reference knowledge-graph by maker-knowledge-mining

109 jmlr-2012-Stability of Density-Based Clustering


Source: pdf

Author: Alessandro Rinaldo, Aarti Singh, Rebecca Nugent, Larry Wasserman

Abstract: High density clusters can be characterized by the connected components of a level set L(λ) = {x : p(x) > λ} of the underlying probability density function p generating the data, at some appropriate level λ ≥ 0. The complete hierarchical clustering can be characterized by a cluster tree T = λ L(λ). In this paper, we study the behavior of a density level set estimate L(λ) and cluster tree estimate T based on a kernel density estimator with kernel bandwidth h. We define two notions of instability to measure the variability of L(λ) and T as a function of h, and investigate the theoretical properties of these instability measures. Keywords: clustering, density estimation, level sets, stability, model selection


reference text

S. Ben-David, U. von Luxburg, and D. P´ l. A sober look at clustering stability. In COLT, pages a 5–19, 2006. A. Ben-Hur, A. Elisseef, and I. Guyon. A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing, pages 6–17, 2002. 946 S TABILITY OF D ENSITY-BASED C LUSTERING B. Cadre, B. Pelletier, sity level sets at a and P. Pudlo. fixed probability, Clustering by estimation of den2009. Manuscript available at http://w3.bretagne.ens-cachan.fr/math/people/benoit.cadre/fichiers/tlevel.pdf. G. Carlsson and F. Memoli. Characterization, stability and convergence of hierarchical clustering methods. Journal of Machine Learning Research, 11:1425–1470, 2010. K. Chaudhuri and S. Dasgupta. Rates of convergence for the cluster tree. In Neural Information Processing Systems (NIPS), December 2010. P. Chaudhuri and S.J. Marron. Scale space view of curve estimation. Annals of Statistics, 28(2): 408–428, 2000. A. Cuevas and A. Rodr´guez-Casal. On boundary estimation. Advances in Applied Probability, 36: ı 340–354, 2004. A. Cuevas, W. Gonz´ lez-Manteiga, and A. Rodr´guez-Casal. Plug-in estimation of general level a ı sets. Australian and New Zealand Journal of Statistics, 48(1):7–19, 2006. P. Deheuvels, J. Einmahl, D. Mason, and F. F. Ruymgaart. The almost sure behavior of maximal and minimal multivariate kn-spacings. Journal of Multivariate Analysis, 24:155–176, 1988. L. Devroye, , L. Gy¨ rfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, o 1996. B. Fischer and J. M. Buhmann. Bagging for path-based clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25:1411–1415, 2003. E. Gin´ and A. Guillou. Rates of strong uniform consistency for multivariate kernel density estimae tors. Annales de l’institut Henri Poincar´ (B), Probabilit´ s et Statistiques, 38:907–921, 2002. e e J. Hartigan. Clustering Algorithms. Wiley, 1975. C. Jackson. Displaying uncertainty with shading. The American Statistician, 62(4):340–347, 2008. S. Kpotufe and U. von Luxburg. Pruning nearest neighbor cluster trees. In Proceedings of the 28th International Conference on Machine Learning (ICML-11). Omnipress, 2011. T. Lange, V. Roth, M. Braun, and J.Buhmann. Stability-based validation of clustering solutions. Neural Computation, 16:1299–1323, 2004. E. Mammen and A. B. Tsybakov. Smooth discrimination analysis. Annals of Statistics, 27(6): 1808–1829, 1999. P. Massart. Concentration Inequalities and Model Selection. Number 1896 in Springer Lecture Notes in Mathematics. Springer, 2006. N. Meinshausen and P. B¨ hlmann. Stability selection. Journal of the Royal Statistical Society: u Series B (Statistical Methodology), 72:417–473, 2010. M. Penrose. Random Geometric Graphs. Oxford University Press, 2003. 947 R INALDO , S INGH , N UGENT AND WASSERMAN W. Polonik. Measuring mass concentration and estimating density contour clusters–an excess mass approach. Annals of Statistics, 32(3):855–881, 1995. P. Rigollet and R. Vert. Optimal rates for plug-in estimators of density level sets. Bernoulli, 15: 1154–1178, 2009. A. Rinaldo and L. Wasserman. Generalized density clustering. The Annals of Statistics, 38(5): 2678–2722, 2010. D. W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, 1992. A. Singh, C. Scott, and R. Nowak. Adaptive hausdorff estimation of level sets. The Annals of Statistics, 37(5B):2760–2782, 2009. I. Steinwart. Adaptive density level set clustering. In Proceedings of the Twenty-Fourth Annual Conference on Learning Theory (COLT’11). Omnipress, 2011. W. Stuetzle and R. Nugent. A generalized single linkage method for estimating the cluster tree of a density. Journal of Computational and Graphical Statistics, 19:1–22, 2009. A. B. Tsybakov. On nonparametric estimation of density level sets. Annals of Statistics, 25(3): 948–969, 1997. A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Annals of Statistics, 32 (1):135–166, 2004. U. von Luxburg. Clustering stability: An overview. Foundations and Trends in Machine Learning, 2:235–274, 2009. M. P. Wand. Fast computation of multivariate kernel estimators. Journal of Computational and Graphical Statistics, 3(4):433–445, 1994. L. Wasserman. All of Statistics. Springer, New York, N.Y., 2004. 948