nips nips2013 nips2013-312 nips2013-312-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sam Patterson, Yee Whye Teh
Abstract: In this paper we investigate the use of Langevin Monte Carlo methods on the probability simplex and propose a new method, stochastic gradient Riemannian Langevin dynamics, which is simple to implement and can be applied to large scale data. We apply this method to latent Dirichlet allocation in an online minibatch setting, and demonstrate that it achieves substantial performance improvements over state-of-the-art online variational Bayesian methods.
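The method described in the abstract builds on stochastic gradient Langevin dynamics [WT11]. The following is a minimal sketch of that base SGLD update, assuming generic gradient functions; the names and signature are illustrative, not taken from the paper. The Riemannian variant proposed in the paper additionally preconditions the gradient and the injected noise with a metric adapted to the probability simplex.

    import numpy as np

    def sgld_step(theta, grad_log_prior, grad_log_lik, minibatch, N, eps, rng):
        # One stochastic gradient Langevin dynamics update [WT11]:
        # a half step along an unbiased minibatch estimate of the
        # log-posterior gradient, plus Gaussian noise of variance eps.
        n = len(minibatch)
        grad_est = grad_log_prior(theta) + (N / n) * sum(
            grad_log_lik(x, theta) for x in minibatch
        )
        noise = rng.normal(0.0, np.sqrt(eps), size=theta.shape)
        return theta + 0.5 * eps * grad_est + noise

With a step size eps decaying according to Robbins-Monro conditions [RM51], the iterates approximate posterior samples without a Metropolis-Hastings correction; the Riemannian version changes the geometry of this update so that it respects the simplex constraints arising in latent Dirichlet allocation.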
[AKW12] S. Ahn, A. Korattikara, and M. Welling, Bayesian posterior sampling via stochastic gradient Fisher scoring, Proceedings of the International Conference on Machine Learning, 2012.
[Ama95] S. Amari, Information geometry of the EM and em algorithms for neural networks, Neural Networks 8 (1995), no. 9, 1379–1408.
[AWST09] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh, On smoothing and inference for topic models, Proceedings of the International Conference on Uncertainty in Artificial Intelligence, vol. 25, 2009.
[Bea03] M. J. Beal, Variational algorithms for approximate Bayesian inference, Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
[BNJ03] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research 3 (2003), 993–1022.
[GC11] M. Girolami and B. Calderhead, Riemann manifold Langevin and Hamiltonian Monte Carlo methods, Journal of the Royal Statistical Society B 73 (2011), 1–37.
[GCPT07] A. Globerson, G. Chechik, F. Pereira, and N. Tishby, Euclidean embedding of co-occurrence data, Journal of Machine Learning Research 8 (2007), 2265–2295.
[GRS96] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Markov chain Monte Carlo in practice, Chapman and Hall, 1996.
[GS04] T. L. Griffiths and M. Steyvers, Finding scientific topics, Proceedings of the National Academy of Sciences, 2004.
[HBB10] M. D. Hoffman, D. M. Blei, and F. Bach, Online learning for latent Dirichlet allocation, Advances in Neural Information Processing Systems, 2010.
[Hec99] D. Heckerman, A tutorial on learning with Bayesian networks, Learning in Graphical Models (M. I. Jordan, ed.), Kluwer Academic Publishers, 1999.
[Ken78] J. Kent, Time-reversible diffusions, Advances in Applied Probability 10 (1978), 819–835.
[Ken90] A. D. Kennedy, The theory of hybrid stochastic algorithms, Probabilistic Methods in Quantum Field Theory and Quantum Gravity, Plenum Press, 1990.
[MHB12] D. Mimno, M. Hoffman, and D. Blei, Sparse stochastic inference for latent Dirichlet allocation, Proceedings of the International Conference on Machine Learning, 2012.
[NASW09] D. Newman, A. Asuncion, P. Smyth, and M. Welling, Distributed algorithms for topic models, Journal of Machine Learning Research (2009).
[Nea10] R. M. Neal, MCMC using Hamiltonian dynamics, Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, eds.), Chapman & Hall / CRC Press, 2010.
[PSD00] J. K. Pritchard, M. Stephens, and P. Donnelly, Inference of population structure using multilocus genotype data, Genetics 155 (2000), 945–959.
[RM51] H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics 22 (1951), no. 3, 400–407.
[RS02] G. O. Roberts and O. Stramer, Langevin diffusions and Metropolis-Hastings algorithms, Methodology and Computing in Applied Probability 4 (2002), 337–357, doi:10.1023/A:1023562417138.
[Sat01] M. Sato, Online model selection based on the variational Bayes, Neural Computation 13 (2001), 1649–1681.
[TNW07] Y. W. Teh, D. Newman, and M. Welling, A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation, Advances in Neural Information Processing Systems, vol. 19, 2007, pp. 1353–1360.
[WJ08] M. J. Wainwright and M. I. Jordan, Graphical models, exponential families, and variational inference, Foundations and Trends in Machine Learning 1 (2008), no. 1-2, 1–305.
[WMSM09] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno, Evaluation methods for topic models, Proceedings of the 26th International Conference on Machine Learning (L. Bottou and M. Littman, eds.), Omnipress, Montreal, June 2009, pp. 1105–1112.
[WT11] M. Welling and Y. W. Teh, Bayesian learning via stochastic gradient Langevin dynamics, Proceedings of the International Conference on Machine Learning, 2011.