NIPS 2013, Paper 127: Generalized Denoising Auto-Encoders as Generative Models
Authors: Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent
Abstract: Recent work has shown how denoising and contractive auto-encoders implicitly capture the structure of the data-generating density, in the case where the corruption noise is Gaussian, the reconstruction error is the squared error, and the data are continuous-valued. This has led to various proposals for sampling from this implicitly learned density function, using Langevin and Metropolis-Hastings MCMC. However, it remained unclear how to connect the training procedure of regularized auto-encoders to the implicit estimation of the underlying data-generating distribution when the data are discrete, or when using other forms of corruption process and reconstruction errors. Another issue is that the mathematical justification is only valid in the limit of small corruption noise. We propose here a different attack on the problem, which deals with all these issues: arbitrary (but noisy enough) corruption, arbitrary reconstruction loss (seen as a log-likelihood), handling both discrete and continuous-valued variables, and removing the bias due to non-infinitesimal corruption noise (or non-infinitesimal contractive penalty).
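To make the abstract's setup concrete, below is a minimal sketch (not the paper's code) of the two ingredients it describes, in the binary-data case: a denoising auto-encoder trained with salt-and-pepper corruption and a cross-entropy reconstruction loss read as the log-likelihood log P(X | X~), and a generative Markov chain that alternates corruption with sampling from the learned reconstruction distribution. All names, hyperparameters, and the toy data (TinyDAE, sample_chain, the noise level, etc.) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's code) of a denoising auto-encoder on binary
# data: salt-and-pepper corruption, cross-entropy reconstruction loss read as
# log P(X | X~), and the corrupt-then-reconstruct sampling Markov chain.
# All names and hyperparameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def corrupt(x, noise=0.3):
    """Salt-and-pepper corruption C(X~ | X): each bit is replaced by a
    uniformly random 0/1 value with probability `noise`."""
    flip = rng.random(x.shape) < noise
    return np.where(flip, rng.integers(0, 2, size=x.shape).astype(float), x)

class TinyDAE:
    """Denoising auto-encoder with tied weights on binary input vectors."""
    def __init__(self, n_vis, n_hid, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_vis, n_hid))
        self.b = np.zeros(n_hid)   # hidden biases
        self.c = np.zeros(n_vis)   # visible biases
        self.lr = lr

    def reconstruct(self, x_tilde):
        h = sigmoid(x_tilde @ self.W + self.b)   # encoder
        p = sigmoid(h @ self.W.T + self.c)       # decoder: P(X_i = 1 | X~)
        return h, p

    def train_step(self, x):
        # One SGD step on the cross-entropy between the clean x and the
        # reconstruction of its corrupted version, i.e. -log P(x | x~).
        x_tilde = corrupt(x)
        h, p = self.reconstruct(x_tilde)
        d_out = p - x                             # grad at decoder pre-activation
        d_h = (d_out @ self.W) * h * (1.0 - h)    # backprop into encoder
        self.W -= self.lr * (np.outer(d_out, h) + np.outer(x_tilde, d_h))
        self.c -= self.lr * d_out
        self.b -= self.lr * d_h

def sample_chain(dae, x0, n_steps=200):
    """Generative Markov chain: alternately corrupt the current state and
    sample the next state from the learned reconstruction distribution."""
    x = x0.copy()
    for _ in range(n_steps):
        _, p = dae.reconstruct(corrupt(x))
        x = (rng.random(p.shape) < p).astype(float)
    return x

if __name__ == "__main__":
    # Toy binary data concentrated around two prototype vectors.
    protos = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float)
    data = [corrupt(protos[i % 2], noise=0.1) for i in range(2000)]
    dae = TinyDAE(n_vis=6, n_hid=8, lr=0.1)
    for _ in range(20):
        for x in data:
            dae.train_step(x)
    print(sample_chain(dae, rng.integers(0, 2, size=6).astype(float)))
```

Nothing in this sketch depends on Gaussian noise or squared error: `corrupt` and the decoder's output distribution can be swapped for any sufficiently noisy corruption process and any reconstruction distribution whose log defines the loss, which is the generality the abstract claims.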
References:
Alain, G. and Bengio, Y. (2013). What regularized auto-encoders learn from the data generating distribution. In International Conference on Learning Representations (ICLR’2013).
Bengio, Y. and Yao, L. (2013). Bounding the test log-likelihood of generative models. Technical report, U. Montreal, arXiv.
Bengio, Y., Larochelle, H., and Vincent, P. (2006a). Non-local manifold Parzen windows. In NIPS’05, pages 115–122. MIT Press.
Bengio, Y., Monperrus, M., and Larochelle, H. (2006b). Non-local estimation of manifold structure. Neural Computation, 18(10).
Bengio, Y., Thibodeau-Laufer, E., and Yosinski, J. (2013a). Deep generative stochastic networks trainable by backprop. Technical Report arXiv:1306.1091, Université de Montréal.
Bengio, Y., Courville, A., and Vincent, P. (2013b). Unsupervised feature learning and deep learning: A review and new perspectives. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI).
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y. (2010). Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy).
Cho, K., Raiko, T., and Ilin, A. (2013). Enhanced gradient for training restricted Boltzmann machines. Neural Computation, 25(3), 805–831.
Heckerman, D., Chickering, D. M., Meek, C., Rounthwaite, R., and Kadie, C. (2000). Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1, 49–75.
Hinton, G. E. (1999). Products of experts. In ICANN’1999.
Hinton, G. E., Osindero, S., and Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
Hyvärinen, A. (2005). Estimation of non-normalized statistical models using score matching. Journal of Machine Learning Research, 6, 695–709.
Kingma, D. and LeCun, Y. (2010). Regularized estimation of image statistics by score matching. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1126–1134.
Ranzato, M., Boureau, Y.-L., and LeCun, Y. (2008). Sparse feature learning for deep belief networks. In NIPS’07, pages 1185–1192, Cambridge, MA. MIT Press.
Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio, Y. (2011). Contractive auto-encoders: Explicit invariance during feature extraction. In ICML’2011.
Rifai, S., Bengio, Y., Dauphin, Y., and Vincent, P. (2012). A generative process for sampling contractive auto-encoders. In ICML’2012.
Swersky, K., Ranzato, M., Buchman, D., Marlin, B., and de Freitas, N. (2011). On autoencoders and score matching for energy based models. In ICML’2011. ACM.
Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Computation, 23(7).