
117 nips-2008-Learning Taxonomies by Dependence Maximization


Source: pdf

Author: Matthew Blaschko, Arthur Gretton

Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data.
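To make the abstract's central idea concrete, the sketch below illustrates dependence maximization with the Hilbert-Schmidt Independence Criterion (HSIC, [14]), in the spirit of the clustering view of [22]: points are assigned to clusters so as to maximize the dependence between a kernel on the data and a kernel induced by the assignment. This is only a minimal illustration under stated assumptions, not the paper's numerical taxonomy clustering algorithm; the function names, the Gaussian kernel, the greedy coordinate-ascent search, and the cluster-structure matrix M (which would encode a candidate taxonomy) are all choices made here, and the step that actually fits a tree over the clusters is omitted.

import numpy as np

def gaussian_kernel(X, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel matrix over the rows of X.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(K, L):
    # Biased empirical HSIC between two n x n kernel matrices (cf. [14]).
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def cluster_by_dependence(K, M, n_sweeps=10, seed=0):
    # Greedy coordinate ascent over cluster labels: each point is moved to the
    # cluster that maximizes HSIC(K, Pi M Pi^T), where Pi is the one-hot
    # assignment matrix and M (k x k) encodes similarity between clusters.
    # M = I gives plain dependence-maximization clustering; a taxonomy would be
    # reflected by an M derived from a tree metric over the clusters.
    rng = np.random.default_rng(seed)
    n, k = K.shape[0], M.shape[0]
    labels = rng.integers(k, size=n)
    for _ in range(n_sweeps):
        changed = False
        for i in range(n):
            old = labels[i]
            scores = []
            for c in range(k):
                labels[i] = c
                Pi = np.eye(k)[labels]          # n x k one-hot assignments
                scores.append(hsic(K, Pi @ M @ Pi.T))
            labels[i] = int(np.argmax(scores))
            changed = changed or labels[i] != old
        if not changed:
            break
    return labels

# Toy usage: two well-separated blobs, two clusters, M = identity.
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 4.0])
labels = cluster_by_dependence(gaussian_kernel(X, sigma=1.5), M=np.eye(2))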


reference text

[1] R. Agarwala, V. Bafna, M. Farach, B. Narayanan, M. Paterson, and M. Thorup. On the approximability of numerical taxonomy (fitting distances by tree metrics). In SODA, pages 365–372, 1996.

[2] N. Ailon and M. Charikar. Fitting tree metrics: Hierarchical clustering and phylogeny. In Foundations of Computer Science, pages 73–82, 2005.

[3] F. R. Bach and M. I. Jordan. Learning spectral clustering, with application to speech separation. JMLR, 7:1963–2001, 2006.

[4] R. Baire. Leçons sur les Fonctions Discontinues. Gauthier-Villars, 1905.

[5] C. Baker. Joint measures and cross-covariance operators. Transactions of the American Mathematical Society, 186:273–289, 1973.

[6] M. B. Blaschko and A. Gretton. Taxonomy inference using kernel dependence measures. Technical report, Max Planck Institute for Biological Cybernetics, 2008.

[7] D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In NIPS 16, 2004.

[8] P. Buneman. The Recovery of Trees from Measures of Dissimilarity. In D. Kendall and P. Tautu, editors, Mathematics in the Archaeological and Historical Sciences, pages 387–395. Edinburgh University Press, 1971.

[9] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.

[10] N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola. On kernel-target alignment. In NIPS 14, 2002.

[11] M. Farach, S. Kannan, and T. Warnow. A robust model for finding optimal evolutionary trees. In STOC, pages 137–145, 1993.

[12] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. JMLR, 5:73–99, 2004.

[13] K. Fukumizu, A. Gretton, X. Sun, and B. Schölkopf. Kernel measures of conditional dependence. In NIPS 20, 2008.

[14] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In Algorithmic Learning Theory, pages 63–78, 2005.

[15] B. Harb, S. Kannan, and A. McGregor. Approximating the best-fit tree under ℓp norms. In APPROX-RANDOM, pages 123–133, 2005.

[16] D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, University of California at Santa Cruz, 1999.

[17] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.

[18] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.

[19] P. Macnaughton-Smith, W. Williams, M. Dale, and L. Mockett. Dissimilarity analysis: a new technique of hierarchical subdivision. Nature, 202:1034–1035, 1965.

[20] C. D. Meyer, Jr. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3):315–323, 1973.

[21] A. Y. Ng, M. I. Jordan, and Y. Weiss. On Spectral Clustering: Analysis and an Algorithm. In NIPS, pages 849–856, 2001.

[22] L. Song, A. Smola, A. Gretton, and K. M. Borgwardt. A Dependence Maximization View of Clustering. In ICML, pages 815–822, 2007.

[23] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. JASA, 101(476):1566–1581, 2006.

[24] U. von Luxburg. A Tutorial on Spectral Clustering. Statistics and Computing, 17(4):395–416, 2007.

[25] M. S. Waterman, T. F. Smith, M. Singh, and W. A. Beyer. Additive Evolutionary Trees. Journal of Theoretical Biology, 64:199–213, 1977.