nips2011-123 (reference list)
Source: pdf
Author: Jakob H. Macke, Iain Murray, Peter E. Latham
Abstract: Maximum entropy models have become popular statistical models in neuroscience and other areas in biology, and can be useful tools for obtaining estimates of mutual information in biological systems. However, maximum entropy models fit to small data sets can be subject to sampling bias; i.e., the true entropy of the data can be severely underestimated. Here we study the sampling properties of estimates of the entropy obtained from maximum entropy models. We show that if the data is generated by a distribution that lies in the model class, the bias is equal to the number of parameters divided by twice the number of observations. However, in practice, the true distribution is usually outside the model class, and we show here that this misspecification can lead to much larger bias. We provide a perturbative approximation of the maximal expected bias when the true model is out of model class, and we illustrate our results using numerical simulations of an Ising model; i.e., the second-order maximum entropy distribution on binary data.
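The in-model bias result quoted in the abstract (bias magnitude ≈ number of parameters divided by twice the number of observations) can be checked numerically in the simplest case. The sketch below is not taken from the paper: it uses a first-order (independent) maximum entropy model on binary units, where the fit reduces to matching the empirical means, and all variable names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 5                                  # binary units
p_true = rng.uniform(0.2, 0.8, n_neurons)      # true (independent) firing probabilities


def bernoulli_entropy(p):
    """Entropy (in nats) of independent Bernoulli variables with means p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


H_true = bernoulli_entropy(p_true).sum()       # entropy of the true model

n_obs = 200                                    # observations per simulated experiment
n_repeats = 2000                               # Monte Carlo repetitions

H_hat = np.empty(n_repeats)
for r in range(n_repeats):
    x = rng.random((n_obs, n_neurons)) < p_true    # simulated binary data
    p_hat = x.mean(axis=0)                         # first-order max-ent fit = matching the means
    H_hat[r] = bernoulli_entropy(p_hat).sum()      # plug-in entropy of the fitted model

print("empirical bias :", H_hat.mean() - H_true)       # should be negative (underestimation)
print("predicted bias :", -n_neurons / (2 * n_obs))    # -d/(2N), d = number of parameters
```

With these settings the empirical bias of the plug-in entropy should come out close to the predicted -5/400 ≈ -0.0125 nats. According to the abstract, the second-order (Ising) case behaves the same way when the data lie in the model class, with d = n(n+1)/2 parameters for n units; outside the model class the bias can be considerably larger.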
[1] C.E. Shannon and W. Weaver. The mathematical theory of communication. University of Illinois Press, 1949.
[2] T.M. Cover and J.A. Thomas. Elements of information theory. Wiley, 1991.
[3] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek. Spikes: exploring the neural code (computational neuroscience). The MIT Press, 1999.
[4] A. Borst and F. E. Theunissen. Information theory and neural coding. Nat Neurosci, 2(11):947–957, 1999 Nov.
[5] L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253, 2003.
[6] B. B. Averbeck, P. E. Latham, and A. Pouget. Neural correlations, population coding and computation. Nature Reviews Neuroscience, 7(5):358–66, 2006.
[7] R. Quian Quiroga and S. Panzeri. Extracting information from neuronal populations: information theory and decoding approaches. Nat Rev Neurosci, 10(3):173–185, 2009.
[8] G. Miller. Note on the bias of information estimates. In Information Theory in Psychology II-B, pages 95–100. Free Press, Glencoe, IL, 1955.
[9] A. Treves and S. Panzeri. The upward bias in measures of information derived from limited data samples. Neural Computation, 7(2):399–407, 1995.
[10] S. Panzeri, R. Senatore, M. A. Montemurro, and R. S. Petersen. Correcting for the sampling bias problem in spike train information measures. J Neurophysiol, 98(3):1064–1072, 2007.
[11] R. A. A. Ince, A. Mazzoni, R. S. Petersen, and S. Panzeri. Open source tools for the information theoretic analysis of neural data. Front Neurosci, 4, 2010.
[12] E. Ising. Beitrag zur Theorie des Ferromagnetismus. Z. Phys, 31:253, 1925.
[13] E. Schneidman, M. J. Berry II, R. Segev, and W. Bialek. Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087):1007–12, 2006.
[14] J. Shlens, G. D. Field, J. L. Gauthier, M. I. Grivich, D. Petrusca, A. Sher, A. M. Litke, and E. J. Chichilnisky. The structure of multi-neuron firing patterns in primate retina. J Neurosci, 26(32):8254–66, 2006.
[15] I. E. Ohiorhenuan, F. Mechler, K. P. Purpura, A. M. Schmid, Q. Hu, and J. D. Victor. Sparse coding and high-order correlations in fine-scale cortical networks. Nature, 466(7306):617–621, 2010.
[16] G. Tkacik, E. Schneidman, M. J. Berry, II, and W. Bialek. Spin glass models for a network of real neurons. arXiv:q-bio/0611072v2, 2009.
[17] Y. Roudi, J. Tyrcha, and J. Hertz. Ising model for neural data: model quality and approximate methods for extracting functional connectivity. Phys Rev E Stat Nonlin Soft Matter Phys, 79(5 Pt 1):051915, May 2009.
[18] Y. Roudi, E. Aurell, and J. A. Hertz. Statistical physics of pairwise probability models. Front Comput Neurosci, 3:22, 2009.
[19] T. Mora, A. M. Walczak, W. Bialek, and C. G. Callan Jr. Maximum entropy models for antibody diversity. Proc Natl Acad Sci U S A, 107(12):5405–5410, 2010.
[20] A.W. van der Vaart. Asymptotic statistics. Cambridge University Press, 2000.
[21] J.H. Macke, P. Berens, A.S. Ecker, A.S. Tolias, and M. Bethge. Generating spike trains with specified correlation coefficients. Neural Computation, 21(2):397–423, 2009.
[22] J.H. Macke, M. Opper, and M. Bethge. Common input explains higher-order correlations and entropy in a simple model of neural population activity. Physical Review Letters, 106(20):208102, 2011.
[23] I. Nemenman, W. Bialek, and R. de Ruyter van Steveninck. Entropy and information in neural spike trains: Progress on the sampling problem. Physical Review E, 69(5):056111, 2004.
[24] N.A. Ahmed and D. V. Gokhale. Entropy expressions and their estimators for multivariate distributions. Information Theory, IEEE Transactions on, 35(3):688–692, 1989.
[25] O. Oyman, R. U. Nabar, H. Bolcskei, and A. J. Paulraj. Characterizing the statistical properties of mutual information in MIMO channels: insights into diversity-multiplexing tradeoff. In Signals, Systems and Computers, 2002. Conference Record of the Thirty-Sixth Asilomar Conference on, volume 1, pages 521–525. IEEE, 2002.
[26] N. Misra, H. Singh, and E. Demchuk. Estimation of the entropy of a multivariate normal distribution. Journal of multivariate analysis, 92(2):324–342, 2005.
[27] G. Marrelec and H. Benali. Large-sample asymptotic approximations for the sampling and posterior distributions of differential entropy for multivariate normal distributions. Entropy, 13(4):805–819, 2011.
[28] S. Srivastava and M.R. Gupta. Bayesian estimation of the entropy of the multivariate Gaussian. In Information Theory, 2008. ISIT 2008. IEEE International Symposium on, pages 1103–1107. IEEE, 2008.
[29] N.R. Goodman. The distribution of the determinant of a complex Wishart distributed matrix. The Annals of mathematical statistics, 34(1):178–180, 1963.
[30] M. Gupta and S. Srivastava. Parametric Bayesian estimation of differential entropy and relative entropy. Entropy, 12(4):818–843, 2010.