nips nips2005 nips2005-159 nips2005-159-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mukund Narasimhan, Nebojsa Jojic, Jeff A. Bilmes
Abstract: We show that Queyranne’s algorithm for minimizing symmetric submodular functions can be used for clustering with a variety of different objective functions. Two specific criteria that we consider in this paper are the single linkage and the minimum description length criteria. The first criterion tries to maximize the minimum distance between elements of different clusters, and is inherently “discriminative”. It is known that optimal clusterings into k clusters for any given k in polynomial time for this criterion can be computed. The second criterion seeks to minimize the description length of the clusters given a probabilistic generative model. We show that the optimal partitioning into 2 clusters, and approximate partitioning (guaranteed to be within a factor of 2 of the the optimal) for more clusters can be computed. To the best of our knowledge, this is the first time that a tractable algorithm for finding the optimal clustering with respect to the MDL criterion for 2 clusters has been given. Besides the optimality result for the MDL criterion, the chief contribution of this paper is to show that the same algorithm can be used to optimize a broad class of criteria, and hence can be used for many application specific criterion for which efficient algorithm are not known.
[1] M. Queyranne. “Minimizing symmetric submodular functions”, Math. Programming, 82, pages 3–12. 1998.
[2] R. Rizzi, “On Minimizing symmetric set functions”, Combinatorica 20(3), pages 445–450, 2000.
[3] L. Xu, J. Neufeld, B. Larson and D. Schuurmans. “Maximum Margin Clustering”, in Advances in Neural Information Processing Systems 17, pages 1537-1544, 2005.
[4] Z. Lin and R. B. Altman. “Finding Haplotype Tagging SNPs by Use of Principal Components Analysis”, Am. J. Hum. Genet. 75, pages 850-861, 2004.
[5] Jain, A.K. and R.C. Dubes, “Algorithms for Clustering Data.” Englewood Cliffs, N.J.: Prentice Hall, 1988.
[6] P. Brucker, “On the complexity of clustering problems,” in R. Henn, B. Korte, and W. Oletti (eds.), Optimization and Operations Research, Lecture Notes in Economics and Mathematical Systems, Springer, Berlin 157.
[7] P. Kontkanen, P. Myllym¨ ki, W. Buntine, J. Rissanen and H. Tirri. “An MDL framework for a data clustering”, HIIT Technical Report 2004.
[8] M. Delattre and P. Hansen. “Bicriterion Cluster Analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol-2, No. 4, 1980
[9] M. Narasimhan, N. Jojic and J. Bilmes. “Q-Clustering”, Technical Report, Dept. of Electrical Engg., University of Washington, UWEETR-2006-0001, 2005