
200 nips-2013-Multi-Prediction Deep Boltzmann Machines


Source: pdf

Author: Ian Goodfellow, Mehdi Mirza, Aaron Courville, Yoshua Bengio

Abstract: We introduce the multi-prediction deep Boltzmann machine (MP-DBM). The MP-DBM can be seen as a single probabilistic model trained to maximize a variational approximation to the generalized pseudolikelihood, or as a family of recurrent nets that share parameters and approximately solve different inference problems. Prior methods of training DBMs either do not perform well on classification tasks or require an initial learning pass that trains the DBM greedily, one layer at a time. The MP-DBM does not require greedy layerwise pretraining, and outperforms the standard DBM at classification, classification with missing inputs, and mean field prediction tasks.
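For orientation, here is a minimal sketch of the criterion the abstract names, written in notation assumed for illustration rather than quoted from the paper. With visible units v, hidden units h, parameters \theta, and a randomly drawn subset S of visible indices, the generalized pseudolikelihood objective averages conditional log-probabilities over subsets, and each conditional term is handled through a variational (mean-field) lower bound:

% Generalized pseudolikelihood, averaged over random index subsets S
% (v = visible units, h = hidden units, \theta = parameters;
%  notation assumed for this sketch, not copied from the paper)
J(\theta) = \mathbb{E}_{S}\big[ \log P(v_S \mid v_{-S}; \theta) \big]

% Each conditional is replaced by its variational lower bound, with Q
% a mean-field approximation to P(v_S, h \mid v_{-S}) and H(Q) its entropy:
\log P(v_S \mid v_{-S}; \theta) \;\ge\;
  \mathbb{E}_{Q}\big[ \log P(v_S, h \mid v_{-S}; \theta) \big] + H(Q)

Unrolling the fixed-point updates that compute Q for a fixed number of steps gives the second view in the abstract: each choice of S poses a different inference problem, but every unrolled solver reuses the same DBM weights, so the family of recurrent nets shares parameters.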


reference text

[1] Arnold, L. and Ollivier, Y. (2012). Layer-wise learning of deep generative models. Technical report, arXiv:1212.1524.

[2] Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I. J., Bergeron, A., Bouchard, N., and Bengio, Y. (2012). Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.

[3] Bengio, Y., Thibodeau-Laufer, E., and Yosinski, J. (2013). Deep generative stochastic networks trainable by backprop. Technical Report arXiv:1306.1091, Université de Montréal.

[4] Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). Oral Presentation.

[5] Blackwell, D. (1947). Conditional expectation and unbiased sequential estimation. Ann. Math. Statist., 18, 105–110.

[6] Brakel, P., Stroobandt, D., and Schrauwen, B. (2013). Training energy-based models for time-series imputation. Journal of Machine Learning Research, 14, 2771–2797.

[7] Goodfellow, I. J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F., and Bengio, Y. (2013a). Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214.

[8] Goodfellow, I. J., Courville, A., and Bengio, Y. (2013b). Scaling up spike-and-slab models for unsupervised feature learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1902–1914.

[9] Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite, R., and Kadie, C. (2000). Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1, 49–75.

[10] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. Technical report, arXiv:1207.0580.

[11] Kolmogorov, A. (1953). Unbiased estimates. American Mathematical Society Translations. American Mathematical Society.

[12] Le Roux, N. and Bengio, Y. (2008). Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation, 20(6), 1631–1649.

[13] LeCun, Y., Huang, F.-J., and Bottou, L. (2004). Learning methods for generic object recognition with invariance to pose and lighting. In CVPR'2004, pages 97–104.

[14] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

[15] Montavon, G. and Müller, K.-R. (2012). Learning feature hierarchies with centered deep Boltzmann machines. CoRR, abs/1203.4416.

[16] Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11(2), 125–139.

[17] Rao, C. R. (1973). Linear Statistical Inference and its Applications. J. Wiley and Sons, New York, 2nd edition.

[18] Salakhutdinov, R. and Hinton, G. (2009). Deep Boltzmann machines. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS 2009), volume 8.

[19] Srebro, N. and Shraibman, A. (2005). Rank, trace-norm and max-norm. In Proceedings of the 18th Annual Conference on Learning Theory, pages 545–560. Springer-Verlag.

[20] Stoyanov, V., Ropson, A., and Eisner, J. (2011). Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure. In AISTATS’2011.

[21] Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In ICML’2008, pages 1064–1071.

[22] Younes, L. (1999). On the convergence of Markovian stochastic algorithms with rapidly decreasing ergodicity rates. Stochastics and Stochastic Reports, 65(3), 177–228.