nips nips2003 nips2003-196 nips2003-196-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Max Welling, Andriy Mnih, Geoffrey E. Hinton
Abstract: In models that define probabilities via energies, maximum likelihood learning typically involves using Markov Chain Monte Carlo to sample from the model’s distribution. If the Markov chain is started at the data distribution, learning often works well even if the chain is only run for a few time steps [3]. But if the data distribution contains modes separated by regions of very low density, brief MCMC will not ensure that different modes have the correct relative energies because it cannot move particles from one mode to another. We show how to improve brief MCMC by allowing long-range moves that are suggested by the data distribution. If the model is approximately correct, these long-range moves have a reasonable acceptance rate.
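The abstract describes the idea only at a high level. Below is a minimal, self-contained sketch of that idea on a 1-D toy problem; it is not the authors' implementation. The bimodal energy function, the way mode centres are estimated from the data, and all names (energy, brief_chain, p_jump, etc.) are illustrative assumptions. The long-range jump set is chosen to be symmetric, so the Metropolis acceptance test reduces to an energy difference.

# Hedged sketch (assumptions noted above): brief MCMC started at data
# points, augmented with long-range jumps whose directions are suggested
# by the data distribution.
import numpy as np

rng = np.random.default_rng(0)

# Toy energy-based model: unnormalised bimodal density, E(x) = -log p*(x).
def energy(x, mu=(-4.0, 4.0), sigma=1.0):
    return -np.log(sum(np.exp(-0.5 * ((x - m) / sigma) ** 2) for m in mu)
                   + 1e-300)

# Data drawn from a distribution with the same two well-separated modes,
# so "the model is approximately correct" in the sense of the abstract.
data = np.concatenate([rng.normal(-4.0, 1.0, 500), rng.normal(4.0, 1.0, 500)])

# Long-range jump vectors suggested by the data: offsets between the two
# empirical mode centres (a crude stand-in for locating modes from data).
centres = np.array([data[data < 0].mean(), data[data >= 0].mean()])
jumps = np.array([centres[1] - centres[0], centres[0] - centres[1]])

def brief_chain(x0, n_steps=20, p_jump=0.2, local_step=0.5):
    """Short chain started at a data point (brief, CD-style MCMC).
    With probability p_jump, propose a long-range move x -> x + d with d
    drawn uniformly from the symmetric set of jump vectors; otherwise make
    an ordinary local random-walk move. Both proposals are symmetric, so
    the Metropolis test uses only the energy difference."""
    x = x0
    for _ in range(n_steps):
        if rng.random() < p_jump:
            x_new = x + jumps[rng.integers(len(jumps))]  # long-range move
        else:
            x_new = x + rng.normal(0.0, local_step)      # local move
        if rng.random() < np.exp(energy(x) - energy(x_new)):
            x = x_new                                    # accept
    return x

# Start each chain at a data point, as in brief MCMC.
finals = np.array([brief_chain(x0) for x0 in data[:200]])
print("fraction of chains ending in the right-hand mode:", (finals > 0).mean())

Because the toy model matches the data distribution fairly closely, the long-range proposals land at points of comparable energy and are accepted at a reasonable rate, so particles can hop between modes even within a brief chain, which is the effect the abstract describes.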
[1] S. Becker and Y. LeCun. Improving the convergence of back-propagation learning with second-order methods. In D. Touretzky, G. Hinton, and T. Sejnowski, editors, Proc. of the 1988 Connectionist Models Summer School, pages 29–37, San Mateo, 1989. Morgan Kaufmann.
[2] Y. Bengio, R. Ducharme, and P. Vincent. A neural probabilistic language model. In Advances in Neural Information Processing Systems, 2001.
[3] G.E. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14:1771–1800, 2002.
[4] C. Jarzynski. Targeted free energy perturbation. Technical Report LAUR-01-2157, Los Alamos National Laboratory, 2001.
[5] R.M. Neal. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993.
[6] C. Sminchisescu, M. Welling, and G. Hinton. Generalized darting Monte Carlo. Technical Report CSRG-478, University of Toronto, 2003.
[7] H. Tjelmeland and B.K. Hegstad. Mode jumping proposals in MCMC. Statistics Report No. 1/1999, Norwegian University of Science and Technology, Trondheim, Norway, 1999.
[8] A. Voter. A Monte Carlo method for determining free-energy differences and transition state theory rate constants. 82(4), 1985.
Footnote 3: However, note that in cases where the modes are well separated, even Markov chains that run for an extraordinarily long time will not mix properly between those modes, and the results of this paper become relevant.