jmlr jmlr2010 jmlr2010-19 jmlr2010-19-reference knowledge-graph by maker-knowledge-mining

19 jmlr-2010-Characterization, Stability and Convergence of Hierarchical Clustering Methods


Source: pdf

Author: Gunnar Carlsson, Facundo Mémoli

Abstract: We study hierarchical clustering schemes under an axiomatic view. We show that within this framework, one can prove a theorem analogous to one of Kleinberg (2002), in which one obtains an existence and uniqueness theorem instead of a non-existence result. We explore further properties of this unique scheme: stability and convergence are established. We represent dendrograms as ultrametric spaces and use tools from metric geometry, namely the Gromov-Hausdorff distance, to quantify the degree to which perturbations in the input metric space affect the result of hierarchical methods. Keywords: clustering, hierarchical clustering, stability of clustering, Gromov-Hausdorff distance


reference text

S. Ben-David, U. von Luxburg, and D. P´ l. A sober look at clustering stability. In G´ bor Lugosi and a a Hans-Ulrich Simon, editors, COLT, volume 4005 of Lecture Notes in Computer Science, pages 5–19. Springer, 2006. ISBN 3-540-35294-5. 1468 C HARACTERIZATION , S TABILITY AND C ONVERGENCE OF H IERARCHICAL C LUSTERING M ETHODS F. L. Bookstein, B. Chernoff, R. L. Elder, J. M. Humphries Jr., G. R. Smith, and R. E. Strauss. Morphometrics in Evolutionary Biology: the Geometry of Size and Shape Change, with Examples from Fishes. Academy of Natural Sciences of Philadelphia., 1985. M. R. Bridson and A. Haefliger. Metric Spaces of Non-positive Curvature, volume 319 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1999. ISBN 3-540-64324-9. D. Burago, Y. Burago, and S. Ivanov. A Course in Metric Geometry, volume 33 of AMS Graduate Studies in Math. American Mathematical Society, 2001. G. Carlsson and F. M´ moli. Persistent clustering and a theorem of J. Kleinberg. ArXiv e-prints, e August 2008. G. Carlsson and F. M´ moli. Classifying clustering schemes. Technical report, Stanford, 2009. e G. Carlsson and F. M´ moli. Multiparameter clustering methods. In Claus Weihs Hermann Locareke Junge, editor, ’Classification as a Tool for Research’. Proceedings of the 11th IFCS Biennial Conference and 33rd Annual Conference of the Gesellschaft fr Klassifikation e.V., Berlin-HeidelbergNew York, to appear 2009. Springer. M. Gromov. Metric Structures for Riemannian and non-Riemannian spaces. Modern Birkh¨ user a Classics. Birkh¨ user Boston Inc., Boston, MA, english edition, 2007. ISBN 978-0-8176-4582-3; a 0-8176-4582-9. Based on the 1981 French original, With appendices by M. Katz, P. Pansu and S. Semmes, Translated from the French by Sean Michael Bates. M. Gromov. Hyperbolic groups. In Essays in Group Theory, volume 8 of Math. Sci. Res. Inst. Publ., pages 75–263. Springer, New York, 1987. J. A. Hartigan. Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc., 76 (374):388–394, 1981. ISSN 0162-1459. J. A. Hartigan. Statistical theory in clustering. J. Classification, 2(1):63–76, 1985. ISSN 0176-4268. A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall Advanced Reference Series. Prentice Hall Inc., Englewood Cliffs, NJ, 1988. ISBN 0-13-022278-X. N. Jardine and R. Sibson. Mathematical Taxonomy. John Wiley & Sons Ltd., London, 1971. Wiley Series in Probability and Mathematical Statistics. D. G. Kendall, D. Barden, T. K. Carne, and H. Le. Shape and Shape Theory. Wiley Series in Probability and Statistics. John Wiley & Sons Ltd., Chichester, 1999. ISBN 0-471-96823-4. J. M. Kleinberg. An impossibility theorem for clustering. In Suzanna Becker, Sebastian Thrun, and Klaus Obermayer, editors, NIPS, pages 446–453. MIT Press, 2002. ISBN 0-262-02550-7. G. N. Lance and W. T. Williams. A general theory of classificatory sorting strategies 1. Hierarchical systems. Computer Journal, 9(4):373–380, February 1967. ISSN 0010-4620. W. Stuetzle. Estimating the cluster type of a density by analyzing the minimal spanning tree of a sample. J. Classification, 20(1):25–47, 2003. ISSN 0176-4268. 1469 ´ C ARLSSON AND M E MOLI U. von Luxburg and S. Ben-David. Towards a statistical theory of clustering. Technical report, PASCAL workshop on clustering, London, 2005. D. Wishart. Mode analysis: a generalization of nearest neighbor which reduces chaining effects. In Numerical Taxonomy, pages 282–311. Academic Press, 1969. R. Zadeh and S. Ben-David. A uniqueness theorem for clustering. In Proceedings of the Proceedings of the Twenty-Fifth Conference Annual Conference on Uncertainty in Artificial Intelligence (UAI09), page 8, Corvallis, Oregon, 2009. AUAI Press. 1470