
173 nips-2013-Least Informative Dimensions


Source: pdf

Author: Fabian Sinz, Anna Stöckl, Jan Grewe, Jan Benda

Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem, such as the spike-triggered average, spike-triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest.
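The last sentence of the abstract rests on a chain-rule expansion of the mutual information. The following is a brief sketch of how that argument can be read, in our own notation rather than the paper's: split the stimulus features as X = (U, V), with V the candidate informative subspace and U the remaining, uninformative part.

```latex
% Hedged sketch of the independence argument (our notation, not a verbatim
% derivation from the paper). With the features split as X = (U, V):
\begin{align*}
  I(X;Y) &= I(U,V;Y) = I(V;Y) + I(U;Y \mid V).\\
\intertext{If $U$ is independent of the pair $(V,Y)$, then
$p(u \mid v) = p(u)$ and $p(u,y \mid v) = p(u)\,p(y \mid v)$, hence}
  I(U;Y \mid V) &= 0
  \quad\Longrightarrow\quad
  I(X;Y) = I(V;Y),
\end{align*}
% i.e. the informative features V carry all information about the response Y.
```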

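The objective being minimized can be pictured with a small numerical sketch. This is not the authors' code; it only illustrates one kernel dependence measure from the cited literature (a biased empirical HSIC estimate, Gretton et al. [9]) evaluated between the uninformative features U = XW⊥ and the pair (V, Y), which is the kind of quantity the abstract describes driving to zero over the choice of subspace. All function and variable names here are illustrative assumptions.

```python
# Minimal sketch (assumed names, not the paper's implementation): a biased
# HSIC estimate between projected-out features U and the pair (V, y).
import numpy as np

def rbf_gram(Z, sigma=1.0):
    """Gaussian (RBF) kernel Gram matrix for the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(A, B, sigma=1.0):
    """Biased empirical HSIC between paired samples A and B (rows = trials)."""
    n = A.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K, L = rbf_gram(A, sigma), rbf_gram(B, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy usage: X holds stimuli (rows = trials), y the responses; W spans a
# candidate informative subspace and W_perp its orthogonal complement.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.tanh(X[:, :2] @ np.ones(2))[:, None]      # response driven by first two dims
W, W_perp = np.eye(5)[:, :2], np.eye(5)[:, 2:]
U, V = X @ W_perp, X @ W                         # uninformative / informative features
objective = hsic(U, np.hstack([V, y]))           # dependence to be minimized over W
print(objective)
```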

reference text

[1] F. R. Bach and M. I. Jordan. Predictive low-rank decomposition for kernel methods. In Proceedings of the 22nd International Conference on Machine Learning (ICML ’05), pages 33–40, New York, NY, USA, 2005. ACM Press.

[2] E. de Boer and P. Kuyper. Triggered correlation, 1968.

[3] N. Brenner, W. Bialek, and R. De Ruyter Van Steveninck. Adaptive rescaling maximizes information transmission. Neuron, 26(3):695–702, 2000.

[4] E. J. Chichilnisky. A simple white noise analysis of neuronal light responses. Network: Computation in Neural Systems, 12:199–213, 2001.

[5] K. Fukumizu, F. R. Bach, and M. I. Jordan. Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research, 5(1):73–99, 2004.

[6] K. Fukumizu, F. R. Bach, and M. I. Jordan. Kernel dimension reduction in regression. Annals of Statistics, 37(4):1871–1905, 2009.

[7] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola. A kernel method for the two-sample problem. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 513–520, Cambridge, MA, 2007. MIT Press.

[8] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13:723–773, 2012.

[9] A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring Statistical Dependence with Hilbert-Schmidt Norms. In S. Jain, H. U. Simon, and E. Tomita, editors, Algorithmic Learning Theory, pages 63–77. Springer Berlin/Heidelberg, 2005.

[10] A. Gretton, K. Fukumizu, Z. Harchaoui, and B. K. Sriperumbudur. A Fast, Consistent Kernel Two-Sample Test. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems, pages 673–681. Curran, Red Hook, NY, USA, 2009.

[11] J. D. Hunter. Matplotlib: A 2D graphics environment. Computing In Science & Engineering, 9(3):90–95, 2007.

[12] J. Macke, G. Zeck, and M. Bethge. Receptive Fields without Spike-Triggering. Advances in Neural Information Processing Systems 20, pages 1–8, 2007.

[13] J. H. Manton. Optimization algorithms exploiting unitary constraints. IEEE Transactions on Signal Processing, 50(3):635–650, 2002.

[14] P. Z. Marmarelis and K. Naka. White-noise analysis of a neuron chain: an application of the Wiener theory. Science, 175(27):1276–1278, 1972.

[15] P. McCullagh and J. A. Nelder. Generalized Linear Models, Second Edition. Chapman and Hall, 1989.

[16] T. P. Minka. Old and New Matrix Algebra Useful for Statistics. MIT Media Lab Note, pages 1–19, 2000.

[17] A. Müller. Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability, 29(2):429–443, 1997.

[18] L. Paninski. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4):243–262, 2004.

[19] J. W. Pillow and E. P. Simoncelli. Dimensionality reduction in neural models: an information-theoretic generalization of spike-triggered average and covariance analysis. Journal of Vision, 6(4):414–428, 2006.

[20] H. Scheich, T. H. Bullock, and R. H. Hamstra. Coding properties of two classes of afferent nerve fibers: high-frequency electroreceptors in the electric fish, Eigenmannia. Journal of Neurophysiology, 36(1):39–60, 1973.

[21] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive Computation and Machine Learning. MIT Press, 2001.

[22] T. Sharpee, N. C. Rust, and W. Bialek. Analyzing neural responses to natural signals: maximally informative dimensions. Neural Computation, 16(2):223–250, 2004.

[23] A. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert Space Embedding for Distributions. In Algorithmic Learning Theory: 18th International Conference, pages 13–31. Springer-Verlag, Berlin/Heidelberg, 2007.

[24] L. Song, A. Smola, A. Gretton, J. Bedo, and K. Borgwardt. Feature selection via dependence maximization. Journal of Machine Learning Research, 13:1393–1434, 2012.

[25] B. K. Sriperumbudur, K. Fukumizu, A. Gretton, and G. R. G. Lanckriet. On Integral Probability Metrics, φ-divergences and binary classification. Technical report, arXiv, 2009.

[26] B. K. Sriperumbudur, A. Gretton, K. Fukumizu, G. Lanckriet, and B. Schölkopf. Injective Hilbert Space Embeddings of Probability Measures. In Proceedings of the 21st Annual Conference on Learning Theory, pages 111–122. Omnipress, 2008.

[27] B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. G. Lanckriet. Hilbert Space Embeddings and Metrics on Probability Measures. Journal of Machine Learning Research, 11:1517–1561, 2010.

[28] R. S. Williamson, M. Sahani, and J. W. Pillow. Equating information-theoretic and likelihood-based methods for neural dimensionality reduction. Technical report, arXiv, 2013.