
The Convergence of Contrastive Divergences



Author: Alan L. Yuille

Abstract: This paper analyses the Contrastive Divergence algorithm for learning statistical parameters. We relate the algorithm to the stochastic approximation literature. This enables us to specify conditions under which the algorithm is guaranteed to converge to the optimal solution (with probability 1). This includes necessary and sufficient conditions for the solution to be unbiased.
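Two standard formulas lie behind the abstract's claims; the sketch below uses the common energy-based-model notation of [1] and the step-size conditions of [5] (the symbols E(x; \theta), p_0, p_k and a_t follow convention and are not quoted from this paper's text). For a model p(x; \theta) = e^{-E(x; \theta)} / Z(\theta), the CD-k update replaces the intractable model expectation in the log-likelihood gradient with an expectation under p_k, the distribution reached after k steps of a Markov chain with invariant distribution p(\cdot; \theta), initialized at the data distribution p_0:

\theta_{t+1} = \theta_t + a_t \left( \left\langle \partial E / \partial \theta \right\rangle_{p_k} - \left\langle \partial E / \partial \theta \right\rangle_{p_0} \right).

Treated as a stochastic approximation scheme, such an update converges with probability 1 under the classical Robbins-Monro step-size conditions

a_t > 0, \qquad \sum_t a_t = \infty, \qquad \sum_t a_t^2 < \infty,

together with regularity conditions on the update term; see [6] and [7] for the general theory.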


References

[1]. G. Hinton. “Training Products of Experts by Minimizing Contrastive Divergence”. Neural Computation. 14, pp 1771-1800. 2002.

[2]. Y.W. Teh, M. Welling, S. Osindero and G.E. Hinton. “Energy-Based Models for Sparse Overcomplete Representations”. Journal of Machine Learning Research. To appear. 2003.

[3]. D. MacKay. “Failures of the one-step learning algorithm”. Available electronically at http://www.inference.phy.cam.ac.uk/mackay/abstracts/gbm.html. 2001.

[4]. C.K.I. Williams and F.V. Agakov. “An Analysis of Contrastive Divergence Learning in Gaussian Boltzmann Machines”. Technical Report EDI-INF-RR-0120. Institute for Adaptive and Neural Computation. University of Edinburgh. 2002.

[5]. H. Robbins and S. Monro. “A Stochastic Approximation Method”. Annals of Mathematical Statistics. Vol. 22, pp 400-407. 1951.

[6]. H.J. Kushner and D.S. Clark. Stochastic Approximation for Constrained and Unconstrained Systems. New York. Springer-Verlag. 1978.

[7]. L. Younes. “On the Convergence of Markovian Stochastic Algorithms with Rapidly Decreasing Ergodicity Rates”. Stochastics and Stochastic Reports, 65, 177-228. 1999.

[8]. S.C. Zhu and X. Liu. “Learning in Gibbsian Fields: How Accurate and How Fast Can It Be?”. IEEE Trans. Pattern Analysis and Machine Intelligence. Vol. 24, No. 7, July 2002.

[9]. H.J. Kushner. “Asymptotic Global Behaviour for Stochastic Approximation and Diffusions with Slowly Decreasing Noise Effects: Global Minimization via Monte Carlo”. SIAM J. Appl. Math. 47:169-185. 1987.

[10]. G.B. Orr and T.K. Leen. “Weight Space Probability Densities in Stochastic Learning: II. Transients and Basin Hopping Times”. Advances in Neural Information Processing Systems, 5. Eds. Giles, Hanson, and Cowan. Morgan Kaufmann, San Mateo, CA. 1993.

[11]. G.R. Grimmett and D. Stirzaker. Probability and Random Processes. Oxford University Press. 2001.

[12]. B. Van Roy. Course notes, MS&E 339, Stanford University. Available electronically at www.stanford.edu/class/msande339/notes/lecture6.ps.

[13]. P. Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer. New York. 1999.