nips nips2013 nips2013-34 nips2013-34-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Authors: Matthew Johnson, James Saunderson, Alan Willsky
Abstract: Sampling inference methods are computationally difficult to scale for many models, in part because global dependencies can reduce opportunities for parallel computation. Without strict conditional independence structure among variables, standard Gibbs sampling theory requires sample updates to be performed sequentially, even if dependence between most variables is not strong. Empirical work has shown that some models can be sampled effectively by going “Hogwild” and simply running Gibbs updates in parallel with only periodic global communication, but the successes and limitations of such a strategy are not well understood. As a step towards such an understanding, we study the Hogwild Gibbs sampling strategy in the context of Gaussian distributions. We develop a framework that provides convergence conditions and error bounds along with simple proofs and connections to methods in numerical linear algebra. In particular, we show that if the Gaussian precision matrix is generalized diagonally dominant, then any Hogwild Gibbs sampler, with any update schedule or allocation of variables to processors, yields a stable sampling process with the correct sample mean.
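To make the strategy in the abstract concrete, below is a minimal NumPy sketch of a Hogwild Gibbs sampler for a Gaussian N(mu, J^{-1}) with precision matrix J and potential vector h = J mu. This is not the paper's code: the function name hogwild_gibbs, the block partition, and the synchronization parameters are illustrative choices. Each simulated processor sweeps its own block of variables while holding the other blocks fixed at their values from the last global synchronization.

import numpy as np

def hogwild_gibbs(J, h, blocks, n_outer=2000, n_inner=1, seed=0):
    """Hogwild Gibbs sampling for N(mu, J^{-1}), where J @ mu == h.

    blocks  : list of index lists, one per simulated processor
    n_outer : number of global synchronization rounds
    n_inner : local Gibbs sweeps each processor runs between syncs
    """
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    x = np.zeros(n)
    samples = np.empty((n_outer, n))
    for t in range(n_outer):
        x_stale = x.copy()          # global state visible to every processor
        x_new = x.copy()
        for block in blocks:        # processors would run concurrently; simulated in turn
            x_loc = x_stale.copy()  # off-block values stay stale between syncs
            for _ in range(n_inner):
                for i in block:     # standard sequential Gibbs sweep within the block
                    # x_i | x_{-i} ~ N((h_i - sum_{j != i} J_ij x_j) / J_ii, 1 / J_ii)
                    mean_i = (h[i] - J[i] @ x_loc + J[i, i] * x_loc[i]) / J[i, i]
                    x_loc[i] = mean_i + rng.standard_normal() / np.sqrt(J[i, i])
            x_new[block] = x_loc[block]
        x = x_new                   # periodic global communication
        samples[t] = x
    return samples

# Example: a strictly diagonally dominant (hence generalized diagonally
# dominant) precision matrix, split across two simulated processors.
J = np.array([[2.0, 0.5, 0.0, 0.3],
              [0.5, 2.0, 0.4, 0.0],
              [0.0, 0.4, 2.0, 0.5],
              [0.3, 0.0, 0.5, 2.0]])
mu = np.array([1.0, -1.0, 0.5, 0.0])
samples = hogwild_gibbs(J, J @ mu, blocks=[[0, 1], [2, 3]], n_outer=5000)
print(samples[500:].mean(axis=0))   # approaches mu after burn-in

Consistent with the abstract's claim, with this generalized diagonally dominant J the process is stable and the sample mean converges to mu for any block allocation or choice of n_inner; the abstract makes no such exactness claim for the covariance.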
[1] F. Niu, B. Recht, C. Ré, and S. J. Wright. “Hogwild!: A lock-free approach to parallelizing stochastic gradient descent”. In: Advances in Neural Information Processing Systems (2011).
[2] D. Newman, A. Asuncion, P. Smyth, and M. Welling. “Distributed inference for latent Dirichlet allocation”. In: Advances in Neural Information Processing Systems 20 (2007), pp. 1081–1088.
[3] D. Newman, A. Asuncion, P. Smyth, and M. Welling. “Distributed algorithms for topic models”. In: The Journal of Machine Learning Research 10 (2009), pp. 1801–1828.
[4] Z. Liu, Y. Zhang, E. Y. Chang, and M. Sun. “PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing”. In: ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011), p. 26.
[5] R. Bekkerman, M. Bilenko, and J. Langford. Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2012.
[6] A. Ihler and D. Newman. “Understanding Errors in Approximate Distributed Latent Dirichlet Allocation”. In: IEEE Transactions on Knowledge and Data Engineering 24.5 (2012), pp. 952–960.
[7] Y. Liu, O. Kosut, and A. S. Willsky. “Sampling GMRFs by Subgraph Correction”. In: NIPS 2012 Workshop: Perturbations, Optimization, and Statistics (2012).
[8] G. Papandreou and A. Yuille. “Gaussian sampling by local perturbations”. In: Neural Information Processing Systems (NIPS). 2010.
[9] A. Parker and C. Fox. “Sampling Gaussian distributions in Krylov spaces with conjugate gradients”. In: SIAM Journal on Scientific Computing 34.3 (2012), pp. 312–334.
[10] C. Fox and A. Parker. “Convergence in Variance of First-Order and Second-Order Chebyshev Accelerated Gibbs Samplers”. 2013. URL: http://www.physics.otago.ac.nz/data/fox/publications/SIAM_CS_2012-11-30.pdf.
[11] J. Gonzalez, Y. Low, A. Gretton, and C. Guestrin. “Parallel Gibbs Sampling: From Colored Fields to Thin Junction Trees”. In: Artificial Intelligence and Statistics (AISTATS). Ft. Lauderdale, FL, May 2011.
[12] M. J. Wainwright and M. I. Jordan. “Graphical models, exponential families, and variational inference”. In: Foundations and Trends in Machine Learning 1.1-2 (2008), pp. 1–305.
[13] A. Berman and R. J. Plemmons. Nonnegative Matrices in the Mathematical Sciences. Classics in Applied Mathematics 9 (1979).
[14] M. J. Castel, V. Migallón, and J. Penadés. “On parallel two-stage methods for Hermitian positive definite matrices with applications to preconditioning”. In: Electronic Transactions on Numerical Analysis 12 (2001), pp. 88–112.
[15] D. Serre. MathOverflow answer. Nov. 2011. URL: http://mathoverflow.net/questions/80793/is-gauss-seidel-guaranteed-to-converge-on-semi-positive-definite-matrices/80845#80845.
[16] N. Ruozzi and S. Tatikonda. “Message-Passing Algorithms for Quadratic Minimization”. In: Journal of Machine Learning Research 14 (2013), pp. 2287–2314. URL: http://jmlr.org/papers/v14/ruozzi13a.html.
[17] A. Frommer and D. B. Szyld. “On asynchronous iterations”. In: Journal of Computational and Applied Mathematics 123.1 (2000), pp. 201–216.
[18] A. Frommer and D. B. Szyld. “Asynchronous two-stage iterative methods”. In: Numerische Mathematik 69.2 (1994), pp. 141–153.
[19] J. A. Kelner, L. Orecchia, A. Sidford, and Z. A. Zhu. A Simple, Combinatorial Algorithm for Solving SDD Systems in Nearly-Linear Time. 2013. arXiv: 1301.6628 [cs.DS].