Author: Yichuan Zhang, Zoubin Ghahramani, Amos J. Storkey, Charles A. Sutton
Abstract: Continuous relaxations play an important role in discrete optimization, but have not seen much use in approximate probabilistic inference. Here we show that a general form of the Gaussian Integral Trick makes it possible to transform a wide class of discrete variable undirected models into fully continuous systems. The continuous representation allows the use of gradient-based Hamiltonian Monte Carlo for inference, results in new ways of estimating normalization constants (partition functions), and in general opens up a number of new avenues for inference in difficult discrete systems. We demonstrate some of these continuous relaxation inference algorithms on a number of illustrative problems.
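As a brief, informal sketch of the trick the abstract refers to (the notation a, W, D, A, x below is chosen here for illustration and is not necessarily the paper's): take a binary pairwise model p(s) \propto \exp(a^\top s + \tfrac{1}{2} s^\top W s) with s \in \{0,1\}^n, and pick a diagonal matrix D so that A = W + D is positive definite; the diagonal shift only alters the linear term, since s_i^2 = s_i. The Gaussian integral (Hubbard-Stratonovich) identity [10],

\exp(\tfrac{1}{2} s^\top A s) \propto \int \exp(-\tfrac{1}{2} x^\top A^{-1} x + s^\top x) \, dx,

introduces auxiliary variables x under which the bits s_i become conditionally independent. Summing them out gives a smooth continuous marginal,

p(x) \propto \exp(-\tfrac{1}{2} x^\top A^{-1} x) \prod_i \bigl(1 + \exp(a_i - d_i/2 + x_i)\bigr),

where d_i are the diagonal entries of D. Its log-density gradient is available in closed form, so Hamiltonian Monte Carlo [18] can be run directly on x.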
[1] D. J. Amit. Modeling Brain Function. Cambridge University Press, 1989.
[2] A. Coolen, S. Laughton, and D. Sherrington. Modern analytic techniques to solve the dynamics of recurrent neural networks. In Advances in Neural Information Processing Systems 8 (NIPS95), 1996.
[3] S. Ermon, C. P. Gomes, A. Sabharwal, and B. Selman. Accelerated adaptive Markov chain for partition function computation. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 2744–2752. 2011.
[4] J. Finkel, T. Grenager, and C. D. Manning. Incorporating non-local information into information extraction systems by Gibbs sampling. In Annual Meeting of the Association for Computational Linguistics (ACL), 2005.
[5] D. Freitag and A. McCallum. Information extraction with HMMs and shrinkage. In AAAI Workshop on Machine Learning for Information Extraction, 1999.
[6] M. Girolami, B. Calderhead, and S. A. Chin. Riemannian manifold Hamiltonian Monte Carlo. Journal of the Royal Statistical Society, B, 73(2):1–37, 2011.
[7] R. J. Glauber. Time-dependent statistics of the Ising model. J. Math. Phys., 4:294–307, 1963.
[8] J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Computation. Perseus Books, 1991.
[9] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313:504–507, 2006.
[10] J. Hubbard. Calculation of partition functions. Phys. Rev. Lett., 3:77–78, Jul 1959. doi: 10.1103/PhysRevLett.3.77. URL http://link.aps.org/doi/10.1103/PhysRevLett.3.77.
[11] M. Johnson, T. Griffiths, and S. Goldwater. Bayesian inference for PCFGs via Markov chain Monte Carlo. In HLT/NAACL, 2007.
[12] J. Martens and I. Sutskever. Parallelizable sampling of Markov random fields. In Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[13] R. V. Mendes and J. T. Duarte. Vector fields and neural networks. Complex Systems, 6:21–30, 1992.
[14] J. Møller, A. N. Pettitt, R. Reeves, and K. K. Berthelsen. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2):451–458, 2006.
[15] I. Murray and R. Salakhutdinov. Evaluating probabilities under high-dimensional latent variable models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1137–1144. 2009.
[16] I. Murray, R. P. Adams, and D. J. MacKay. Elliptical slice sampling. JMLR: W&CP, 9:541–548, 2010.
[17] R. Neal. Bayesian Learning for Neural Networks. PhD thesis, Computer Science, University of Toronto, 1995.
[18] R. M. Neal. MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, editors, Handbook of Markov Chain Monte Carlo. Chapman & Hall / CRC Press, 2010.
[19] S. Nowozin and C. H. Lampert. Structured prediction and learning in computer vision. Foundations and Trends in Computer Graphics and Vision, 6(3-4), 2011.
[20] Y. Qi and T. P. Minka. Hessian-based Markov chain Monte-Carlo algorithms. In First Cape Cod Workshop on Monte Carlo Methods, September 2002.
[21] U. Ramacher. The Hamiltonian approach to neural networks dynamics. In Proc. IEEE IJCNN, volume 3, 1991.
[22] H. S. Seung, T. J. Richardson, J. C. Lagarias, and J. J. Hopfield. Minimax and Hamiltonian dynamics of excitatory-inhibitory networks. In Advances in Neural Information Processing Systems 10 (NIPS97), 1998.
[23] C. Sutton and A. McCallum. Collective segmentation and labeling of distant entities in information extraction. In ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields, 2004.
[24] C. Sutton and A. McCallum. An Introduction to Conditional Random Fields. Foundations and Trends in Machine Learning, 4(4), 2012.
[25] M. Welling, M. Rosen-Zvi, and G. Hinton. Exponential family harmoniums with an application to information retrieval. In Advances in Neural Information Processing Systems 17, 2004.
[26] P. De Wilde. Class of Hamiltonian neural networks. Physical Review E, 47(2):1392–1396, 1993.
[27] Y. Zhang and C. Sutton. Quasi-Newton methods for Markov chain Monte Carlo. In Advances in Neural Information Processing Systems (NIPS), 2011.