
In All Likelihood, Deep Belief Is Not Enough


Source: pdf

Author: Lucas Theis, Sebastian Gerwinn, Fabian Sinz, Matthias Bethge

Abstract: Statistical models of natural images provide an important tool for researchers in the fields of machine learning and computational neuroscience. The canonical measure for quantitatively assessing and comparing the performance of statistical models is the likelihood. One class of statistical models which has recently gained increasing popularity and has been applied to a variety of complex data is formed by deep belief networks. Evaluations of these models, however, have often been limited to qualitative analyses based on samples, because their likelihood is computationally intractable. Motivated by these circumstances, the present article introduces a consistent estimator for the likelihood of deep belief networks which is computationally tractable and simple to apply in practice. Using this estimator, we quantitatively investigate a deep belief network for natural image patches and compare its performance to that of other models of natural image patches. We find that the deep belief network is outperformed with respect to the likelihood even by very simple mixture models.

Keywords: deep belief network, restricted Boltzmann machine, likelihood estimation, natural image statistics, potential log-likelihood
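The abstract names the paper's key contribution, a tractable and consistent likelihood estimator, without spelling it out. Purely as a hedged illustration of the family such estimators belong to, below is a minimal Python sketch of importance sampling for the marginal likelihood of a latent-variable model; the callables log_joint, sample_proposal, and log_proposal are hypothetical placeholders (for a deep belief network, the proposal q(h | x) might for instance be the feed-forward recognition distribution), not the authors' published interface.

    import numpy as np

    def estimate_log_likelihood(x, log_joint, sample_proposal, log_proposal,
                                n_samples=1000):
        """Sketch: estimate log p(x) = log E_q[ p(x, h) / q(h | x) ] by Monte Carlo."""
        # Draw h_1, ..., h_N from the tractable proposal q(h | x).
        hs = sample_proposal(x, n_samples)
        # Log importance weights: log p(x, h_n) - log q(h_n | x).
        log_w = np.array([log_joint(x, h) - log_proposal(h, x) for h in hs])
        # Numerically stable log-mean-exp of the weights.
        m = log_w.max()
        return m + np.log(np.mean(np.exp(log_w - m)))

By the law of large numbers, the averaged importance weights converge to p(x), which is what makes estimators of this kind consistent; the logarithm of the average, however, is only asymptotically unbiased.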


Reference text

M. Bethge and R. Hosseini. Method and device for image compression. Patent WO/2009/146933, 2007.
S. S. Chen and R. A. Gopinath. Gaussianization. Advances in Neural Information Processing Systems 13, 2001.
P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.
J. Eichhorn, F. Sinz, and M. Bethge. Natural image coding in V1: How much use is orientation selectivity? PLoS Computational Biology, 5(4), 2009.
D. J. Felleman and D. C. van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1991.
K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 1980.
J. A. Guerrero-Colon, E. P. Simoncelli, and J. Portilla. Image denoising using mixtures of Gaussian scale mixtures. Proceedings of the 15th IEEE International Conference on Image Processing, 2008.
M. Gutmann and A. Hyvärinen. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010.
G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
G. E. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504–507, 2006.
G. E. Hinton, P. Dayan, B. Frey, and R. Neal. The "wake-sleep" algorithm for unsupervised neural networks. Science, 1995.
G. E. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
R. Hosseini and M. Bethge. Hierarchical models of natural images. Frontiers in Computational Neuroscience, 2009.
Y. Karklin and M. S. Lewicki. Emergence of complex cell properties by learning to generalize in natural scenes. Nature, 2009.
A. Kong, J. S. Liu, and W. H. Wong. Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association, 89(425):278–288, 1994.
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989.
H. Lee and A. Ng. Sparse deep belief net model for visual area V2. Advances in Neural Information Processing Systems 19, 2007.
P. Long and R. Servedio. Restricted Boltzmann machines are hard to approximately evaluate or simulate. Proceedings of the 27th International Conference on Machine Learning, 2010.
D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
T. Minka. Divergence measures and message passing. Microsoft Research Technical Report (MSR-TR-2005-173), 2005.
A. Mohamed, G. Dahl, and G. E. Hinton. Deep belief networks for phone recognition. NIPS 22 Workshop on Deep Learning for Speech Recognition, 2009.
I. Murray and R. Salakhutdinov. Evaluating probabilities under high-dimensional latent variable models. Advances in Neural Information Processing Systems 21, 2009.
R. M. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.
J. Ngiam, Z. Chen, P. Koh, and A. Y. Ng. Learning deep energy models. Proceedings of the 28th International Conference on Machine Learning, 2011.
S. Osindero and G. E. Hinton. Modeling image patches with a directed hierarchy of Markov random fields. Advances in Neural Information Processing Systems 20, 2008.
M. Ranzato and G. E. Hinton. Modeling pixel means and covariances using factorized third-order Boltzmann machines. IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2010.
M. Ranzato, A. Krizhevsky, and G. E. Hinton. Factored 3-way restricted Boltzmann machines for modeling natural images. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010a.
M. A. Ranzato, V. Mnih, and G. E. Hinton. Generating more realistic images using gated MRFs. Advances in Neural Information Processing Systems 23, 2010b.
M. A. Ranzato, J. Susskind, V. Mnih, and G. E. Hinton. On deep generative models with applications to recognition. IEEE Conference on Computer Vision and Pattern Recognition, 2011.
N. Le Roux and Y. Bengio. Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6):1631–1649, 2008.
N. Le Roux, N. Heess, J. Shotton, and J. Winn. Learning a generative model of images by factoring appearance and shape. Microsoft Research Technical Report (MSR-TR-2010-7), 2010.
R. Salakhutdinov. Learning Deep Generative Models. PhD thesis, Dept. of Computer Science, University of Toronto, 2009.
R. Salakhutdinov and I. Murray. On the quantitative analysis of deep belief networks. Proceedings of the 25th International Conference on Machine Learning, 2008.
O. G. Selfridge. Pandemonium: A paradigm for learning. Mechanisation of Thought Processes: Proceedings of a Symposium Held at the National Physical Laboratory, pages 115–122, 1958.
P. Smolensky. Information processing in dynamical systems: Foundations of harmony theory. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1:194–281, 1986.
J. Sohl-Dickstein, P. Battaglino, and M. R. DeWeese. Minimum probability flow learning. Preprint, arXiv:0906.4779, 2009.
J. M. Susskind, G. E. Hinton, J. R. Movellan, and A. K. Anderson. Generating facial expressions with deep belief nets. Affective Computing, Emotion Modelling, Synthesis and Recognition, pages 421–440, 2008.
I. Sutskever and G. E. Hinton. Deep, narrow sigmoid belief networks are universal approximators. Neural Computation, 20(11):2629–2636, 2008.
G. W. Taylor, G. E. Hinton, and S. Roweis. Modeling human motion using binary latent variables. Advances in Neural Information Processing Systems 19, 2007.
T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. Proceedings of the 25th International Conference on Machine Learning, 2008.
J. H. van Hateren and A. van der Schaaf. Independent component filters of natural images compared with simple cells in primary visual cortex. Proceedings of the Royal Society B: Biological Sciences, 265(1394), 1998.
M. Welling and G. E. Hinton. A new learning algorithm for mean field Boltzmann machines. International Joint Conference on Neural Networks, 2002.
M. Welling, M. Rosen-Zvi, and G. E. Hinton. Exponential family harmoniums with an application to information retrieval. Advances in Neural Information Processing Systems 17, 2005.
L. Younes. Parametric inference for imperfectly observed Gibbsian fields. Probability Theory and Related Fields, 1989.