
Learning in Markov Random Fields using Tempered Transitions


Source: pdf

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely connected MRF’s. Our results on the MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.
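The abstract's central recipe, a Robbins-Monro stochastic approximation whose persistent negative-phase samples are refreshed with Neal's tempered-transitions operator [9, 12], can be sketched for a small binary RBM. The code below is a minimal illustration under stated assumptions, not the paper's implementation: the toy dimensions, the temperature ladder `betas`, the synthetic data, the learning-rate schedule, and all function names are invented for the example.

```python
# Minimal sketch (assumed toy setup, not the paper's code): stochastic
# approximation learning of a binary RBM in which the persistent fantasy
# chains are updated by Neal's tempered-transitions operator [9].
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h(v, W, b_h, beta=1.0):
    """Sample hidden units from the tempered conditional p_beta(h|v)."""
    p = sigmoid(beta * (v @ W + b_h))
    return (rng.random(p.shape) < p).astype(float), p

def sample_v(h, W, b_v, beta=1.0):
    """Sample visible units from the tempered conditional p_beta(v|h)."""
    p = sigmoid(beta * (h @ W.T + b_v))
    return (rng.random(p.shape) < p).astype(float)

def log_f(v, W, b_v, b_h, beta):
    """Unnormalized log of the tempered marginal p_beta(v) (h summed out)."""
    return beta * (v @ b_v) + np.logaddexp(0.0, beta * (v @ W + b_h)).sum(-1)

def gibbs_step(v, W, b_v, b_h, beta):
    """One h-then-v Gibbs sweep; as a kernel on v it is reversible
    w.r.t. p_beta(v), so it can serve as both the up and down kernel."""
    h, _ = sample_h(v, W, b_h, beta)
    return sample_v(h, W, b_v, beta)

def tempered_transition(v, W, b_v, b_h, betas):
    """Heat each chain up through betas[1:], cool it back down, and
    accept or reject the whole round trip (Neal, 1996).  The unknown
    normalizing constants cancel pairwise in log_r."""
    x, log_r = v, 0.0
    for i in range(len(betas) - 1):              # up: beta_0 > beta_1 > ...
        log_r += log_f(x, W, b_v, b_h, betas[i + 1]) - log_f(x, W, b_v, b_h, betas[i])
        x = gibbs_step(x, W, b_v, b_h, betas[i + 1])
    for i in reversed(range(len(betas) - 1)):    # down: mirror image
        x = gibbs_step(x, W, b_v, b_h, betas[i + 1])
        log_r += log_f(x, W, b_v, b_h, betas[i]) - log_f(x, W, b_v, b_h, betas[i + 1])
    accept = np.log(rng.random(len(v))) < log_r  # per-chain accept/reject
    return np.where(accept[:, None], x, v)

# Robbins-Monro stochastic approximation loop (toy sizes and data).
n_vis, n_hid = 20, 10
data = (rng.random((100, n_vis)) < 0.5).astype(float)  # stand-in for real data
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
fantasy = data[:10].copy()                      # persistent negative chains
betas = np.linspace(1.0, 0.5, 6)                # 1 = beta_0 > ... > beta_K

for t in range(200):
    lr = 0.05 / (1.0 + t / 100.0)               # decaying step sizes
    batch = data[rng.choice(len(data), 10, replace=False)]
    _, h_pos = sample_h(batch, W, b_h)          # positive phase (data)
    fantasy = tempered_transition(fantasy, W, b_v, b_h, betas)
    _, h_neg = sample_h(fantasy, W, b_h)        # negative phase (model)
    W += lr * (batch.T @ h_pos - fantasy.T @ h_neg) / len(batch)
    b_v += lr * (batch - fantasy).mean(0)
    b_h += lr * (h_pos - h_neg).mean(0)
```

Because each temperature's normalizer appears once on the way up and once on the way down, only energy differences enter the acceptance test; when a round trip is rejected, the chain simply keeps its old state, as in an ordinary Metropolis move.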


References

[1] J. Besag. Efficiency of pseudolikelihood estimation for simple Gaussian fields. Biometrika, 64:616–618, 1977.

[2] G. Desjardins, A. Courville, Y. Bengio, P. Vincent, and O. Delalleau. Tempered Markov chain Monte Carlo for training of restricted Boltzmann machines. Technical Report 1345, University of Montreal, 2009.

[3] C. Geyer. Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics, pages 156–163, 1991.

[4] G. Hinton. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1771–1800, 2002.

[5] A. Kulesza and F. Pereira. Structured learning with approximate inference. In Advances in Neural Information Processing Systems, 2007.

[6] Y. LeCun, F. J. Huang, and L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2004), volume 2, pages 97–104, 2004.

[7] E. Marinari and G. Parisi. Simulated tempering: A new Monte Carlo scheme. Europhysics Letters, 19:451–458, 1992.

[8] V. Nair and G. Hinton. Implicit mixtures of restricted Boltzmann machines. In Advances in Neural Information Processing Systems, volume 21, 2009.

[9] R. Neal. Sampling from multimodal distributions using tempered transitions. Statistics and Computing, 6:353–366, 1996.

[10] R. Neal. Annealed importance sampling. Statistics and Computing, 11:125–139, 2001.

[11] P. Pletscher, C. Ong, and J. Buhmann. Spanning tree approximations for conditional random fields. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, 2009.

[12] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407, 1951.

[13] R. Salakhutdinov. Learning and evaluating Boltzmann machines. Technical Report UTML TR 2008-002, Department of Computer Science, University of Toronto, 2008.

[14] R. Salakhutdinov and G. Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, pages 448–455, 2009.

[15] T. Tieleman. Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML 2008). ACM, 2008.

[16] M. Wainwright, T. Jaakkola, and A. Willsky. Tree-reweighted belief propagation algorithms and approximate ML estimation by pseudo-moment matching. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 9, 2003.

[17] M. Welling and C. Sutton. Learning in Markov random fields with contrastive free energies. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 10, 2005.

[18] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, 2005.

[19] L. Younes. Estimation and annealing for Gibbsian fields. Ann. Inst. Henri Poincaré (B), 24(2):269–294, 1988.

[20] S. Zhu and X. Liu. Learning in Gibbsian fields: How accurate and how fast can it be? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000), pages 2–9. IEEE, 2000.