nips nips2004 nips2004-167 nips2004-167-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
[1] K. Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. Constrained K-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 577–584, 2001.
[2] S. Basu, A. Bannerjee, and R. Mooney. Semi-supervised clustering by seeding. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 19–26, 2002.
[3] D. Klein, S. Kamvar, and C. Manning. From instance Level to space-level constraints: making the most of prior knowledge in data clustering. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 307–313, 2002.
[4] N. Shental, A. Bar-Hillel, T. Hertz, and D. Weinshall. Computing Gaussian mixture models with EM using equivalence constraints. In Advances in Neural Information Processing System, volume 15, 2003.
[5] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.
[6] R. Neal. Probabilistic inference using Markov Chain Monte Carlo methods. Technical Report CRG-TR-93-1, Computer Science Department, Toronto University, 1993.
[7] C. Bouman and M. Shapiro. A multiscale random field model for Bayesian image segmentation. IEEE Trans. Image Processing, 3:162–177, March 1994.