Source: pdf
Author: Danial Lashkari, Polina Golland
Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm, widely used to solve the resulting optimization problem, is inherently a gradient-descent method and is sensitive to initialization: the resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and comparing it with the conventional approach to mixture model clustering.
[1] J. Puzicha, T. Hofmann, and J. M. Buhmann, “Theory of proximity based clustering: Structure detection by optimization,” Pattern Recognition, Vol. 33, No. 4, pp. 617–634, 2000.
[2] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm,” Advances in Neural Information Processing Systems, Vol. 14, pp. 849–856, 2001.
[3] I. Csiszár and P. Shields, “Information Theory and Statistics: A Tutorial,” Foundations and Trends in Communications and Information Theory, Vol. 1, No. 4, pp. 417–528, 2004.
[4] M. Meilă and D. Heckerman, “An Experimental Comparison of Model-Based Clustering Methods,” Machine Learning, Vol. 42, No. 1–2, pp. 9–29, 2001.
[5] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
[6] B. J. Frey and D. Dueck, “Clustering by Passing Messages Between Data Points,” Science, Vol. 315, No. 5814, pp. 972–976, 2007.
[7] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, “Clustering with Bregman Divergences,” Journal of Machine Learning Research, Vol. 6, No. 6, pp. 1705–1749, 2005.
[8] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[9] K. Rose, “Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems,” Proceedings of the IEEE, Vol. 86, No. 11, pp. 2210–2239, 1998.
[10] R. E. Blahut, “Computation of Channel Capacity and Rate-Distortion Functions,” IEEE Transactions on Information Theory, Vol. IT-18, No. 4, pp. 460–473, 1972.
[11] P. M. Pardalos and J. B. Rosen, “Methods for Global Concave Minimization: A Bibliographic Survey,” SIAM Review, Vol. 28, No. 3, pp. 367–379, 1986.