
160 nips-2013-Learning Stochastic Feedforward Neural Networks


Source: pdf

Author: Yichuan Tang, Ruslan Salakhutdinov

Abstract: Multilayer perceptrons (MLPs), or neural networks, are popular models used for nonlinear regression and classification tasks. As regressors, MLPs model the conditional distribution of the output variables Y given the input variables X. However, this predictive distribution is assumed to be unimodal (e.g., Gaussian). For tasks involving structured prediction, the conditional distribution should be multimodal, resulting in one-to-many mappings. By using stochastic hidden variables rather than deterministic ones, Sigmoid Belief Nets (SBNs) can induce a rich multimodal distribution in the output space. However, previously proposed learning algorithms for SBNs are inefficient and unsuitable for modeling real-valued data. In this paper, we propose a stochastic feedforward network with hidden layers composed of both deterministic and stochastic variables. A new Generalized EM training procedure using importance sampling allows us to efficiently learn complicated conditional distributions. Our model achieves superior performance on synthetic and facial expression datasets compared to conditional Restricted Boltzmann Machines and Mixture Density Networks. In addition, the latent features of our model improve classification and can learn to generate colorful textures of objects.
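The abstract names two mechanisms: hidden layers that mix deterministic sigmoid units with stochastic binary units, and a Generalized EM procedure that reweights sampled hidden states via importance sampling. The following is only a minimal NumPy sketch of those two pieces, not the authors' implementation: it assumes a Gaussian output model y ~ N(Vh + c, σ²I) and uses the conditional prior p(h | x) as the proposal distribution, so the importance weights reduce to output likelihoods. All names, dimensions, and hyperparameters (W, b, V, c, n_stoch, sigma) are illustrative.

```python
# Sketch (assumed, not from the paper's code): one mixed hidden layer and
# importance weighting of hidden samples for a generalized-EM E-step.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hidden_layer(x, W, b, n_stoch):
    """Forward pass: leading units stay deterministic sigmoids; the last
    n_stoch units are sampled as Bernoulli (stochastic binary) variables."""
    p = sigmoid(W @ x + b)
    h = p.copy()
    h[-n_stoch:] = rng.binomial(1, p[-n_stoch:])  # stochastic binary units
    return h

def importance_weights(y, samples, V, c, sigma):
    """Normalized weights w_m proportional to p(y | h_m) under the assumed
    Gaussian output model; prior terms cancel because the proposal is p(h|x)."""
    log_w = np.array([
        -0.5 * np.sum((y - (V @ h + c)) ** 2) / sigma**2 for h in samples
    ])
    log_w -= log_w.max()  # subtract max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

# Toy dimensions: 5 inputs, 8 hidden units (3 stochastic), 2 outputs.
x, y = rng.normal(size=5), rng.normal(size=2)
W, b = 0.1 * rng.normal(size=(8, 5)), np.zeros(8)
V, c = 0.1 * rng.normal(size=(2, 8)), np.zeros(2)

samples = [hidden_layer(x, W, b, n_stoch=3) for _ in range(20)]
w = importance_weights(y, samples, V, c, sigma=0.5)
print(w.round(3))  # weighted samples approximate the posterior p(h | x, y)
```

In a full Generalized EM loop, the M-step would then take gradient steps on the parameters using these normalized weights to average over the sampled hidden configurations.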


reference text

[1] C. M. Bishop. Mixture density networks. Technical Report NCRG/94/004, Aston University, 1994.

[2] R. M. Neal. Connectionist learning of belief networks. Artificial Intelligence, 56:71–113, July 1992.

[3] R. M. Neal. Learning stochastic feedforward networks. Technical report, University of Toronto, 1990.

[4] Lawrence K. Saul, Tommi Jaakkola, and Michael I. Jordan. Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4:61–76, 1996.

[5] David Barber and Peter Sollich. Gaussian fields for approximate inference in layered sigmoid belief networks. In Sara A. Solla, Todd K. Leen, and Klaus-Robert Müller, editors, NIPS, pages 393–399. The MIT Press, 1999.

[6] G. Taylor, G. E. Hinton, and S. Roweis. Modeling human motion using binary latent variables. In NIPS, 2006.

[7] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[8] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications, volume 104 of Monographs on Statistics and Applied Probability. Chapman & Hall, London, 2005.

[9] John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning, pages 282–289. Morgan Kaufmann, 2001.

[10] Volodymyr Mnih, Hugo Larochelle, and Geoffrey Hinton. Conditional restricted Boltzmann machines for structured output prediction. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 2011.

[11] Yujia Li, Daniel Tarlow, and Richard Zemel. Exploring compositional high order pattern potentials for structured output learning. In Proceedings of International Conference on Computer Vision and Pattern Recognition, 2013.

[12] R. M. Neal and G. E. Hinton. A new view of the EM algorithm that justifies incremental, sparse and other variants. In M. I. Jordan, editor, Learning in Graphical Models, pages 355–368. 1998.

[13] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.

[14] R. M. Neal. Annealed importance sampling. Statistics and Computing, 11:125–139, 2001.

[15] R. Salakhutdinov and I. Murray. On the quantitative analysis of deep belief networks. In Proceedings of the Intl. Conf. on Machine Learning, volume 25, 2008.

[16] J. M. Susskind. The Toronto Face Database. Technical report, University of Toronto, 2011. http://aclab.ca/users/josh/TFD.html.

[17] Zoubin Ghahramani and G. E. Hinton. The EM algorithm for mixtures of factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, 1996.

[18] Ian Nabney. NETLAB: Algorithms for Pattern Recognition. Advances in Pattern Recognition. Springer-Verlag, 2002.

[19] V. Nair and G. E. Hinton. 3-D object recognition with deep belief nets. In NIPS 22, 2009.

[20] J. M. Geusebroek, G. J. Burghouts, and A. W. M. Smeulders. The Amsterdam Library of Object Images. International Journal of Computer Vision, 61(1), January 2005.

[21] Eran Borenstein and Shimon Ullman. Class-specific, top-down segmentation. In ECCV, pages 109–124, 2002.

[22] G. E. Hinton, P. Dayan, B. J. Frey, and R. M. Neal. The wake-sleep algorithm for unsupervised neural networks. Science, 268(5214):1158–1161, 1995.

[23] R. Salakhutdinov and H. Larochelle. Efficient learning of deep Boltzmann machines. In AISTATS, 2010.