Asynchronous Distributed Learning of Topic Models (NIPS 2008)

Authors: Padhraic Smyth, Max Welling, Arthur U. Asuncion

Abstract: Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors independently perform Gibbs sampling on their local data and communicate their information in a local asynchronous manner with other processors. We demonstrate that our asynchronous algorithms are able to learn global topic models that are statistically as accurate as those learned by the standard LDA and HDP samplers, but with significant improvements in computation time and memory. We show speedup results on a 730-million-word text corpus using 32 processors, and we provide perplexity results for up to 1500 virtual processors. As a stepping stone in the development of asynchronous HDP, a parallel HDP sampler is also introduced.
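The approach described in the abstract (local Gibbs sampling plus pairwise asynchronous exchange of counts) can be illustrated with a toy sketch. This is not the authors' implementation: the `Processor` class, the `gossip` and `run` helpers, the hyperparameter values, and the choice to keep a full cached copy of each peer's word-topic counts are all assumptions made for illustration, and the processors are simulated sequentially in one process rather than run in parallel.

```python
import random

K, V = 2, 6                # number of topics and vocabulary size (toy values, assumed)
ALPHA, BETA = 0.1, 0.01    # symmetric Dirichlet hyperparameters (assumed)


class Processor:
    """Hypothetical worker holding one shard of documents (lists of word ids)."""

    def __init__(self, docs, seed):
        self.rng = random.Random(seed)
        self.docs = docs
        # Random initial topic assignment for every token.
        self.z = [[self.rng.randrange(K) for _ in doc] for doc in docs]
        self.n_dk = [[0] * K for _ in docs]        # document-topic counts
        self.local = [[0] * K for _ in range(V)]   # word-topic counts from own data
        self.cache = {}                            # peer id -> last counts received
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = self.z[d][i]
                self.n_dk[d][k] += 1
                self.local[w][k] += 1

    def global_estimate(self):
        """Own counts plus the most recent cached counts of every peer met so far."""
        est = [row[:] for row in self.local]
        for counts in self.cache.values():
            for w in range(V):
                for k in range(K):
                    est[w][k] += counts[w][k]
        return est

    def sweep(self):
        """One collapsed Gibbs sweep over local tokens using the current estimate."""
        est = self.global_estimate()
        n_k = [sum(est[w][k] for w in range(V)) for k in range(K)]
        for d, doc in enumerate(self.docs):
            for i, w in enumerate(doc):
                old = self.z[d][i]
                # Remove the token's current assignment from all counts.
                self.n_dk[d][old] -= 1
                self.local[w][old] -= 1
                est[w][old] -= 1
                n_k[old] -= 1
                # Standard collapsed Gibbs conditional for LDA.
                weights = [
                    (self.n_dk[d][j] + ALPHA) * (est[w][j] + BETA) / (n_k[j] + V * BETA)
                    for j in range(K)
                ]
                new = self.rng.choices(range(K), weights=weights)[0]
                self.z[d][i] = new
                self.n_dk[d][new] += 1
                self.local[w][new] += 1
                est[w][new] += 1
                n_k[new] += 1


def gossip(procs, a, b):
    """Pairwise exchange: each side replaces its cached copy of the other's counts."""
    procs[a].cache[b] = [row[:] for row in procs[b].local]
    procs[b].cache[a] = [row[:] for row in procs[a].local]


def run(shards, rounds=20, seed=0):
    """Alternate local sweeps with one random pairwise exchange per round."""
    rng = random.Random(seed)
    procs = [Processor(shard, seed + 1 + i) for i, shard in enumerate(shards)]
    for _ in range(rounds):
        for p in procs:
            p.sweep()
        a, b = rng.sample(range(len(procs)), 2)
        gossip(procs, a, b)
    return procs


if __name__ == "__main__":
    shards = [[[0, 1, 2, 0], [1, 2]], [[3, 4, 5], [3, 3, 4]], [[0, 5, 5, 1]]]
    for pid, p in enumerate(run(shards)):
        print(pid, p.global_estimate())
```

In this sketch no processor ever waits on a global synchronization step; each one samples against whatever counts it last received, so its view of the global model can be stale between exchanges. Caching one count matrix per peer is the simplest choice and costs memory proportional to the number of peers met.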

References

[1] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.

[2] Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical Dirichlet processes. JASA, 101(476), 2006.

[3] D. Mimno and A. McCallum. Organizing the OCA: learning faceted subjects from a library of digital books. In JCDL ’07, pages 376–385, New York, NY, USA, 2007. ACM.

[4] R. Nallapati, W. Cohen, and J. Lafferty. Parallelized variational EM for latent Dirichlet allocation: An experimental evaluation of speed and scalability. In ICDM Workshop On High Perf. Data Mining, 2007.

[5] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In NIPS 20. MIT Press, Cambridge, MA, 2008.

[6] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah. Gossip algorithms: design, analysis and applications. In INFOCOM, pages 1653–1664, 2005.

[7] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 101 Suppl 1:5228–5235, April 2004.

[8] J. Wolfe, A. Haghighi, and D. Klein. Fully distributed EM for very large datasets. In ICML ’08, pages 1184–1191, New York, NY, USA, 2008. ACM.

[9] A. Brockwell. Parallel Markov chain Monte Carlo simulation by pre-fetching. JCGS, 15(1), 2006.

[10] W. Kowalczyk and N. Vlassis. Newscast EM. In NIPS 17. MIT Press, Cambridge, MA, 2005.