Author: Hanlin Goh, Nicolas Thome, Matthieu Cord, Joo-Hwee Lim
Abstract: Designing a principled and effective algorithm for learning deep architectures is a challenging problem. The current approach involves two training phases: a fully unsupervised learning phase followed by a strongly discriminative optimization. We suggest a deep learning strategy that bridges the gap between the two phases, resulting in a three-phase learning procedure. We propose to implement the scheme using a method to regularize deep belief networks with top-down information. The network is constructed from building blocks of restricted Boltzmann machines learned by combining bottom-up and top-down sampled signals. A global optimization procedure that merges samples from a forward bottom-up pass and a top-down pass is used. Experiments on the MNIST dataset show improvements over existing algorithms for deep belief networks. Object recognition experiments on the Caltech-101 dataset also yield competitive results.
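To make the abstract's description of RBM building blocks "learned by combining bottom-up and top-down sampled signals" more concrete, here is a minimal, illustrative sketch of a binary RBM trained with one-step contrastive divergence (CD-1), where the hidden activation can optionally blend a top-down signal from the layer above. This is not the authors' code: the class name, the blending weight alpha, and the way the top-down term is injected are assumptions for illustration only.

# Minimal sketch (not the paper's implementation): CD-1 for a binary RBM,
# with an optional top-down term mixed into the hidden drive.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.01):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v, top_down=None, alpha=0.5):
        # Bottom-up drive; optionally blended with a top-down signal
        # (e.g., a reconstruction propagated from the layer above).
        # The blending weight alpha is an illustrative assumption.
        drive = v @ self.W + self.b_h
        if top_down is not None:
            drive = (1 - alpha) * drive + alpha * top_down
        return sigmoid(drive)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_update(self, v0, top_down=None):
        # Positive phase: hidden probabilities and a binary sample.
        h0 = self.hidden_probs(v0, top_down)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visible units.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1, top_down)
        # Contrastive-divergence parameter updates.
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / v0.shape[0]
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Usage: train one building block on random binary data.
rbm = RBM(n_visible=784, n_hidden=256)
batch = (rng.random((32, 784)) < 0.5).astype(float)
rbm.cd1_update(batch)                                       # bottom-up only
rbm.cd1_update(batch, top_down=rng.standard_normal(256))    # with a top-down signal

In a layer-wise scheme along the lines sketched in the abstract, the top-down argument would come from the layer above rather than from random noise; the signature above is only meant to show where such a signal could enter the update.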
[1] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief networks,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[2] Y. LeCun, “Une procédure d’apprentissage pour réseau à seuil asymétrique (a learning scheme for asymmetric threshold networks),” in Cognitiva 85, 1985.
[3] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, October 1986.
[4] Y. Bengio, “Learning deep architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[5] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, “Greedy layer-wise training of deep networks,” in NIPS, 2006.
[6] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, “Efficient learning of sparse representations with an energy-based model,” in NIPS, 2006.
[7] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in ICML, 2008.
[8] P. Smolensky, “Information processing in dynamical systems: Foundations of harmony theory,” in Parallel Distributed Processing: Volume 1: Foundations, ch. 6, pp. 194–281, MIT Press, 1986.
[9] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.
[10] H. Lee, C. Ekanadham, and A. Ng, “Sparse deep belief net model for visual area V2,” in NIPS, 2008.
[11] H. Goh, N. Thome, and M. Cord, “Biasing restricted Boltzmann machines to manipulate latent selectivity and sparsity,” in NIPS Workshop, 2010.
[12] H. Larochelle and Y. Bengio, “Classification using discriminative restricted Boltzmann machines,” in ICML, 2008.
[13] N. Le Roux and Y. Bengio, “Representational power of restricted Boltzmann machines and deep belief networks,” Neural Computation, vol. 20, pp. 1631–1649, June 2008.
[14] I. Sutskever and G. E. Hinton, “Learning multilevel distributed representations for high-dimensional sequences,” in AISTATS, 2007.
[15] G. E. Hinton and R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 28, pp. 504–507, 2006.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, pp. 2278–2324, November 1998.
[17] L. Fei-Fei, R. Fergus, and P. Perona, “Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories,” CVPR Workshop, 2004.
[18] R. Salakhutdinov and G. E. Hinton, “Learning a nonlinear embedding by preserving class neighbourhood structure,” in AISTATS, 2007.
[19] L. Deng and D. Yu, “Deep convex net: A scalable architecture for speech pattern classification,” in Interspeech, 2011.
[20] D. C. Cireşan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” in CVPR, 2012.
[21] D. Lowe, “Object recognition from local scale-invariant features,” in ICCV, 1999.
[22] Y. Boureau, F. Bach, Y. LeCun, and J. Ponce, “Learning mid-level features for recognition,” in CVPR, 2010.
[23] H. Goh, N. Thome, M. Cord, and J.-H. Lim, “Unsupervised and supervised visual codes with restricted Boltzmann machines,” in ECCV, 2012.
[24] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in CVPR, 2006.
[25] S. Avila, N. Thome, M. Cord, E. Valle, and A. Araújo, “Pooling in image representation: the visual codeword point of view,” Computer Vision and Image Understanding, pp. 453–465, May 2013.
[26] C. Theriault, N. Thome, and M. Cord, “Extended coding and pooling in the HMAX model,” IEEE Transactions on Image Processing, 2013.
[27] K. Sohn, D. Y. Jung, H. Lee, and A. Hero III, “Efficient learning of sparse, distributed, convolutional feature representations for object recognition,” in ICCV, 2011.
[28] K. Sohn, G. Zhou, C. Lee, and H. Lee, “Learning and selecting features jointly with point-wise gated Boltzmann machines,” in ICML, 2013.