nips nips2005 nips2005-102 nips2005-102-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Barber, Felix V. Agakov
Abstract: We propose a simple information-theoretic approach to soft clustering based on maximizing the mutual information I(x, y) between the unknown cluster labels y and the training patterns x with respect to parameters of specifically constrained encoding distributions. The constraints are chosen such that patterns are likely to be clustered similarly if they lie close to specific unknown vectors in the feature space. The method may be conveniently applied to learning the optimal affinity matrix, which corresponds to learning parameters of the kernelized encoder. The procedure does not require computations of eigenvalues of the Gram matrices, which makes it potentially attractive for clustering large data sets.
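The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder form (a softmax over negative squared distances to cluster vectors, with a single shared scale `beta`), the uniform empirical distribution over training patterns, and the names `soft_assignments`, `mutual_information`, `W`, and `beta` are all assumptions made here for illustration. The sketch only evaluates I(x, y) = H(y) - H(y|x) for a given encoder; maximizing it over the encoder parameters is left out.

```python
import numpy as np

def soft_assignments(X, W, beta=1.0):
    """Hypothetical encoder p(y=j | x_i): softmax over negative squared
    distances between patterns X (N, D) and cluster vectors W (K, D)."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # shape (N, K)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def mutual_information(P, eps=1e-12):
    """I(x, y) = H(y) - H(y|x) in nats, assuming p(x) is uniform over
    the N training patterns (rows of P are p(y | x_i))."""
    p_y = P.mean(axis=0)                                   # marginal p(y)
    h_y = -np.sum(p_y * np.log(p_y + eps))                 # H(y)
    h_y_given_x = -np.mean(np.sum(P * np.log(P + eps), axis=1))  # H(y|x)
    return h_y - h_y_given_x

# Toy data: two well-separated groups of two points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
W_good = np.array([[0.0, 0.0], [5.0, 5.0]])  # one vector per group
W_bad = np.array([[2.5, 2.5], [2.5, 2.5]])   # identical vectors

mi_good = mutual_information(soft_assignments(X, W_good))
mi_bad = mutual_information(soft_assignments(X, W_bad))
```

Under these assumptions, cluster vectors that separate the two groups give a mutual information close to its maximum of log 2 nats for two balanced clusters, while identical cluster vectors give uninformative labels and a value near zero.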
References:
Agakov, F. V. and Barber, D. (2006). Auxiliary Variational Information Maximization for Dimensionality Reduction. In Proceedings of the PASCAL Workshop on Subspace, Latent Structure and Feature Selection Techniques. Springer. To appear.
Bach, F. R. and Jordan, M. I. (2003). Learning spectral clustering. In NIPS. MIT Press.
Barber, D. and Agakov, F. V. (2003). The IM Algorithm: A Variational Approach to Information Maximization. In NIPS. MIT Press.
Brunel, N. and Nadal, J.-P. (1998). Mutual Information, Fisher Information and Population Coding. Neural Computation, 10:1731–1757.
Chechik, G. and Tishby, N. (2002). Extracting relevant structures with side information. In NIPS, volume 15. MIT Press.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, NY.
Dhillon, I. S. and Guan, Y. (2003). Information Theoretic Clustering of Sparse Co-Occurrence Data. In Proceedings of the 3rd IEEE International Conf. on Data Mining.
Dhillon, I. S., Guan, Y., and Kulis, B. (2004). Kernel k-means, Spectral Clustering and Normalized Cuts. In KDD. ACM.
Fisher, J. W. and Principe, J. C. (1998). A methodology for information theoretic feature extraction. In Proc. of the IEEE International Joint Conference on Neural Networks.
Linsker, R. (1988). Towards an Organizing Principle for a Layered Perceptual Network. In Advances in Neural Information Processing Systems. American Institute of Physics.
Ng, A. Y., Jordan, M., and Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In NIPS, volume 14. MIT Press.
Schölkopf, B. and Smola, A. (2002). Learning with Kernels. MIT Press.
Shi, J. and Malik, J. (2000). Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905.
Tishby, N., Pereira, F. C., and Bialek, W. (1999). The information bottleneck method. In Proceedings of the 37-th Annual Allerton Conference on Communication, Control and Computing. Kluwer Academic Publishers.
Torkkola, K. and Campbell, W. M. (2000). Mutual Information in Learning Feature Transformations. In ICML. Morgan Kaufmann.