
67 nips-2006-Differential Entropic Clustering of Multivariate Gaussians


Source: pdf

Author: Jason V. Davis, Inderjit S. Dhillon

Abstract: Gaussian data is pervasive, and many learning algorithms (e.g., k-means) model each input as a single sample drawn from a multivariate Gaussian. In many real-life settings, however, each input object is best described by multiple samples drawn from a multivariate Gaussian. Such data can arise, for example, in a movie review database where each movie is rated by several users, or in time-series domains such as sensor networks. Here, each input is naturally described by a mean vector and a covariance matrix, which together parameterize a Gaussian distribution. In this paper, we consider the problem of clustering such input objects, each represented as a multivariate Gaussian. We formulate the problem using an information-theoretic approach and draw several interesting theoretical connections to Bregman divergences as well as Bregman matrix divergences. We evaluate our method across several domains, including synthetic data, sensor network data, and a statistical debugging application.
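The clustering setup the abstract describes can be sketched as a k-means-style loop over Gaussians, where the distortion between an input and its cluster representative is the differential relative entropy (KL divergence) between the two Gaussians. The sketch below is an illustrative reimplementation under those assumptions, not the authors' code; the function names, the farthest-first seeding, and the fixed iteration count are my own choices.

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) for d-dimensional Gaussians."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0)
                  + diff @ inv1 @ diff
                  - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

def cluster_gaussians(mus, covs, k, n_iter=20):
    """Hard clustering of Gaussians; each representative is itself a Gaussian."""
    n = len(mus)
    # Farthest-first seeding (my choice, for determinism): start from input 0,
    # then repeatedly add the input farthest in KL from its nearest seed.
    idx = [0]
    while len(idx) < k:
        idx.append(max(range(n), key=lambda i: min(
            kl_gaussian(mus[i], covs[i], mus[j], covs[j]) for j in idx)))
    rep_mu = [mus[i].copy() for i in idx]
    rep_cov = [covs[i].copy() for i in idx]
    assign = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest representative in KL divergence.
        for i in range(n):
            assign[i] = min(range(k),
                            key=lambda j: kl_gaussian(mus[i], covs[i],
                                                      rep_mu[j], rep_cov[j]))
        # Update step: average the members' means, and average their
        # covariances inflated by the scatter of the means about that average.
        for j in range(k):
            members = np.where(assign == j)[0]
            if len(members) == 0:
                continue
            m = np.mean([mus[i] for i in members], axis=0)
            c = np.mean([covs[i] + np.outer(mus[i] - m, mus[i] - m)
                         for i in members], axis=0)
            rep_mu[j], rep_cov[j] = m, c
    return assign, rep_mu, rep_cov
```

The update step reflects the natural choice of representative under this distortion: the averaged mean, with a covariance that absorbs both the members' covariances and the spread of their means.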


References

[1] A. Banerjee, S. Merugu, I. Dhillon, and S. Ghosh. Clustering with Bregman divergences. In SIAM International Conference on Data Mining, pages 234–245, 2004.

[2] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217, 1967.

[3] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley Series in Telecommunications, 1991.

[4] A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong. Model-based approximate querying in sensor networks. International Journal of Very Large Data Bases, 2005.

[5] I. Dhillon, S. Mallela, and R. Kumar. A divisive information-theoretic feature clustering algorithm for text classification. Journal of Machine Learning Research, 3:1265–1287, 2003.

[6] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley and Sons, Inc., 2001.

[7] J. Ha, H. Ramadan, J. Davis, C. Rossbach, I. Roy, and E. Witchel. Navel: Automating software support by classifying program behavior. Technical Report TR-06-11, University of Texas at Austin, 2006.

[8] S. Madden. Intel lab data. http://berkeley.intel-research.net/labdata, 2004.

[9] T. Myrvoll and F. Soong. On divergence based clustering of normal distributions and its application to HMM adaptation. In Eurospeech, pages 1517–1520, 2003.

[10] Y. Singer and M. Warmuth. Batch and on-line parameter estimation of Gaussian mixtures based on the joint entropy. In Neural Information Processing Systems, 1998.

[11] T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, and T. Kitamura. Speaker interpolation in HMM-based speech synthesis. In European Conference on Speech Communication and Technology, 1997.

[12] A. Zheng, M. Jordan, B. Liblit, and A. Aiken. Statistical debugging of sampled programs. In Neural Information Processing Systems, 2004.