nips nips2003 nips2003-92 nips2003-92-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Authors: Gal Chechik, Amir Globerson, Naftali Tishby, Yair Weiss
Abstract: The problem of extracting the relevant aspects of data was addressed through the information bottleneck (IB) method, by (soft) clustering one variable while preserving information about another - relevance - variable. An interesting question addressed in the current work is the extension of these ideas to obtain continuous representations that preserve relevant information, rather than discrete clusters. We give a formal definition of the general continuous IB problem and obtain an analytic solution for the optimal representation for the important case of multivariate Gaussian variables. The obtained optimal representation is a noisy linear projection to eigenvectors of the normalized correlation matrix Σ_{x|y} Σ_x^{-1}, which is also the basis obtained in Canonical Correlation Analysis. However, in Gaussian IB, the compression tradeoff parameter uniquely determines the dimension, as well as the scale of each eigenvector. This introduces a novel interpretation where solutions of different ranks lie on a continuum parametrized by the compression level. Our analysis also provides an analytic expression for the optimal tradeoff - the information curve - in terms of the eigenvalue spectrum.
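The basis described in the abstract can be estimated from joint samples: form the sample covariances, compute the conditional covariance Σ_{x|y} = Σ_x - Σ_xy Σ_y^{-1} Σ_yx, and take eigenvectors of Σ_{x|y} Σ_x^{-1}. A minimal numpy sketch follows; the function name `gaussian_ib_basis` is illustrative, not from the paper. Its eigenvalues equal 1 - ρ_i², where ρ_i are the canonical correlations, so small eigenvalues mark the most informative directions.

```python
import numpy as np

def gaussian_ib_basis(x, y):
    """Estimate eigenvalues/eigenvectors of Sigma_{x|y} Sigma_x^{-1}.

    x: (n, dx) array, y: (n, dy) array; rows are joint samples.
    Returns (eigvals, eigvecs) sorted by ascending eigenvalue;
    small eigenvalues correspond to the most informative directions.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx = x.T @ x / n                       # Sigma_x
    syy = y.T @ y / n                       # Sigma_y
    sxy = x.T @ y / n                       # Sigma_xy
    # Conditional covariance: Sigma_{x|y} = Sigma_x - Sigma_xy Sigma_y^{-1} Sigma_yx
    sx_given_y = sxx - sxy @ np.linalg.solve(syy, sxy.T)
    m = sx_given_y @ np.linalg.inv(sxx)
    # The projection rows in the paper are left eigenvectors of m,
    # i.e. (right) eigenvectors of m.T.
    eigvals, eigvecs = np.linalg.eig(m.T)
    order = np.argsort(eigvals.real)
    return eigvals.real[order], eigvecs.real[:, order]
```

Because Σ_{x|y} Σ_x^{-1} = I - Σ_xy Σ_y^{-1} Σ_yx Σ_x^{-1}, all eigenvalues lie in [0, 1]; eigenvalues near 0 correspond to directions of x almost fully explained by y.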
[1] N. Tishby, F.C. Pereira, and W. Bialek. The information bottleneck method. In Proc. of the 37th Allerton Conference on Communication, Control, and Computing, 1999.
[2] N. Slonim. Information Bottleneck theory and applications. PhD thesis, Hebrew University of Jerusalem, 2003.
[3] H. Hotelling. The most predictable criterion. Journal of Educational Psychology, 26:139–142, 1935.
[4] S. Becker and G.E. Hinton. A self-organizing neural network that discovers surfaces in random-dot stereograms. Nature, 355(6356):161–163, 1992.
[5] S. Becker. Mutual information maximization: Models of cortical self-organization. Network: Computation in Neural Systems, pages 7–31, 1996.
[6] T. Berger and R. Zamir. A semi-continuous version of the Berger-Yeung problem. IEEE Transactions on Information Theory, pages 1520–1526, 1999.
[7] G. Chechik and A. Globerson. Information bottleneck and linear projections of gaussian processes. Technical Report 4, Hebrew University, May 2003.
[8] A.D. Wyner. On source coding with side information at the decoder. IEEE Transactions on Information Theory, IT-21:294–300, 1975.
[9] R. Gilad-Bachrach, A. Navot, and N. Tishby. An information theoretic tradeoff between complexity and accuracy. In Proceedings of COLT, Washington, 2003.
[10] G. Chechik and N. Tishby. Extracting relevant structures with side information. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, 2002.
[11] N. Slonim and Y. Weiss. Maximum likelihood and the information bottleneck. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, 2002.
[12] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K.-R. Müller. Invariant feature extraction and classification in kernel spaces. In S.A. Solla, T.K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, 2000.
[13] A.J. Bell and T.J. Sejnowski. An information maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159, 1995.