NIPS 2012, paper 65
Author: Kevin Swersky, Ilya Sutskever, Daniel Tarlow, Richard S. Zemel, Ruslan Salakhutdinov, Ryan P. Adams
Abstract: The Restricted Boltzmann Machine (RBM) is a popular density model that is also good for extracting features. A main source of tractability in RBM models is that, given an input, the posterior distribution over hidden variables is factorizable and can be easily computed and sampled from. Sparsity and competition in the hidden representation are beneficial: an RBM with competition among its hidden units would acquire some of the attractive properties of sparse coding. Yet such constraints are typically not added, as the resulting posterior over the hidden units seemingly becomes intractable. In this paper we show that a dynamic programming algorithm can be used to implement exact sparsity in the RBM’s hidden units. We also show how to pass derivatives through the resulting posterior marginals, which makes it possible to fine-tune a pre-trained neural network with sparse hidden layers.
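The abstract's central claim, that exact sparsity in the hidden units can be imposed with dynamic programming, rests on the classic forward-backward recursion over the running count of active units (the k-out-of-n calculations cited in [13, 17, 18, 19]). The sketch below illustrates this idea under stated assumptions: per-unit pre-activations gated by an "at most k active" cardinality potential, probabilities kept in the linear domain, and the function name `cardinality_marginals` and its arguments chosen for this example only. It is not the authors' exact formulation (in particular, it omits the derivative pass through the marginals that the paper also develops).

```python
import numpy as np

def cardinality_marginals(a, k):
    """Exact marginals p(h_j = 1 | v, sum_j h_j <= k) for hidden units with
    independent pre-activations a_j (e.g., a_j = W_j . v + b_j), gated by a
    cardinality potential that allows at most k units to be active.

    O(n*k) forward-backward recursion over the running count of active units.
    Illustrative sketch only: for large n, the recursion should be run in
    log space to avoid underflow.
    """
    a = np.asarray(a, dtype=float)
    n = len(a)
    p = 1.0 / (1.0 + np.exp(-a))  # per-unit Bernoulli potentials sigma(a_j)

    # Forward pass: A[j, c] = prob. that the first j units have exactly c on.
    A = np.zeros((n + 1, k + 1))
    A[0, 0] = 1.0
    for j in range(1, n + 1):
        A[j, :] = A[j - 1, :] * (1.0 - p[j - 1])      # unit j-1 off
        A[j, 1:] += A[j - 1, :-1] * p[j - 1]          # unit j-1 on
        # counts above k are dropped: the cardinality potential forbids them

    # Backward pass: B[j, c] = prob. that units j..n-1 stay within budget k - c.
    B = np.zeros((n + 1, k + 1))
    B[n, :] = 1.0
    for j in range(n - 1, -1, -1):
        B[j, :] = B[j + 1, :] * (1.0 - p[j])          # unit j off
        B[j, :-1] += B[j + 1, 1:] * p[j]              # unit j on (uses one unit of budget)

    Z = float(A[n, :].sum())                          # normalizer; equals B[0, 0]

    # Marginal that unit j is on: sum over counts c of
    # (first j units have c on) * (unit j on) * (rest fit in the remaining budget).
    return np.array([
        (A[j, :-1] * p[j] * B[j + 1, 1:]).sum() / Z
        for j in range(n)
    ])

# Example (hypothetical numbers): 5 hidden pre-activations, at most 2 units on.
# marg = cardinality_marginals([0.5, -1.0, 2.0, 0.0, 1.5], k=2)
```

Because every quantity above is a smooth function of the pre-activations, the same recursion can be differentiated (by backpropagation through the two passes), which is what makes the fine-tuning of sparse hidden layers described in the abstract possible.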
[1] P. Smolensky. Information processing in dynamical systems: foundations of harmony theory. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pages 194–281. MIT Press, 1986.
[2] G.E. Hinton, S. Osindero, and Y.W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[3] H. Lee, R. Grosse, R. Ranganath, and A.Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In International Conference on Machine Learning, 2009.
[4] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems, 2007.
[5] J. Snoek, R. P. Adams, and H. Larochelle. Nonparametric guidance of autoencoder representations using label information. Journal of Machine Learning Research, 13:2567–2588, 2012.
[6] J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Computer Vision and Pattern Recognition, 2009.
[7] I. Goodfellow, Q. Le, A. Saxe, H. Lee, and A.Y. Ng. Measuring invariances in deep networks. Advances in Neural Information Processing Systems, 2009.
[8] B.A. Olshausen and D.J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.
[9] I. Goodfellow, A. Courville, and Y. Bengio. Large-scale feature learning with spike-and-slab sparse coding. International Conference on Machine Learning, 2012.
[10] H. Lee, C. Ekanadham, and A. Ng. Sparse deep belief net model for visual area V2. Advances in Neural Information Processing Systems, 2007.
[11] R. Gupta, A. Diwan, and S. Sarawagi. Efficient inference with cardinality-based clique potentials. In International Conference on Machine Learning, 2007.
[12] D. Tarlow, I. Givoni, and R. Zemel. HOP-MAP: Efficient message passing for high order potentials. In Artificial Intelligence and Statistics, 2010.
[13] M. H. Gail, J. H. Lubin, and L. V. Rubinstein. Likelihood calculations for matched case-control studies and survival studies with tied death times. Biometrika, 68:703–707, 1981.
[14] J. Domke. Implicit differentiation by perturbation. Advances in Neural Information Processing Systems, 2010.
[15] G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.
[16] V. Nair and G.E. Hinton. 3D object recognition with deep belief nets. Advances in Neural Information Processing Systems, 2009.
[17] R. E. Barlow and K. D. Heidtmann. Computing k-out-of-n system reliability. IEEE Transactions on Reliability, 33:322–323, 1984.
[18] D. Tarlow, K. Swersky, R. Zemel, R.P. Adams, and B. Frey. Fast exact inference for recursive cardinality models. In Uncertainty in Artificial Intelligence, 2012.
[19] L. Belfore. An O(n(log2(n))^2) algorithm for computing the reliability of k-out-of-n:G and k-to-l-out-of-n:G systems. IEEE Transactions on Reliability, 44(1), 1995.
[20] T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In International Conference on Machine Learning, 2008.
[21] H.J. Kappen. Deterministic learning rules for Boltzmann machines. Neural Networks, 8(4):537–548, 1995.
[22] H. Larochelle, Y. Bengio, and J. Turian. Tractable multivariate binary density estimation and the restricted Boltzmann forest. Neural Computation, 22(9):2285–2307, 2010.
[23] G.E. Hinton and R.R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
[24] K. Swersky, M. Ranzato, D. Buchman, B.M. Marlin, and N. de Freitas. On autoencoders and score matching for energy based models. In International Conference on Machine Learning, 2011.
[25] P. Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674, 2011.
[26] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio. An empirical evaluation of deep architectures on problems with many factors of variation. In International Conference on Machine Learning, 2007.
[27] G.E. Hinton. A practical guide to training restricted Boltzmann machines. Technical Report UTML TR 2010-003, Department of Computer Science, University of Toronto, 2010.
[28] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13:281–305, 2012.
[29] A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, 2009.