
73 nips-2007-Distributed Inference for Latent Dirichlet Allocation


Source: pdf

Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion

Abstract: We investigate the problem of learning a widely-used latent-variable model, the Latent Dirichlet Allocation (LDA) or “topic” model, using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates; it is simple to implement and can be viewed as an approximation to a single-processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors; it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors and speedup experiments of learning topics in a 100-million-word corpus using 16 processors.
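The first scheme can be summarized concretely: each processor runs collapsed Gibbs sampling on its own document shard against a local copy of the global word-topic counts, and those copies are periodically merged. Below is a minimal Python sketch of that outer loop, assuming a toy corpus and simulated "processors" in a single process; all names, hyperparameters, and data here are illustrative assumptions, not the authors' reference implementation.

import numpy as np

rng = np.random.default_rng(0)
V, K, P = 50, 4, 2          # vocabulary size, number of topics, processors
alpha, beta = 0.1, 0.01     # symmetric Dirichlet hyperparameters

# Toy corpus of word-id lists, dealt round-robin across P shards.
docs = [rng.integers(0, V, size=20).tolist() for _ in range(10)]
shards = [docs[p::P] for p in range(P)]

def init_shard(shard):
    """Random topic assignments plus local count tables for one shard."""
    z = [[int(rng.integers(0, K)) for _ in doc] for doc in shard]
    ndk = np.zeros((len(shard), K))   # doc-topic counts (always stay local)
    nwk = np.zeros((V, K))            # word-topic counts (synchronized)
    for d, doc in enumerate(shard):
        for i, w in enumerate(doc):
            ndk[d, z[d][i]] += 1
            nwk[w, z[d][i]] += 1
    return z, ndk, nwk

states = [init_shard(s) for s in shards]
global_nwk = sum(nwk for _, _, nwk in states)
# Every processor samples against its own copy of the merged table.
states = [(z, ndk, global_nwk.copy()) for z, ndk, _ in states]

def gibbs_sweep(shard, z, ndk, nwk):
    """One collapsed Gibbs pass over a shard (standard LDA full conditional)."""
    nk = nwk.sum(axis=0)
    for d, doc in enumerate(shard):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] -= 1; nwk[w, k] -= 1; nk[k] -= 1
            probs = (ndk[d] + alpha) * (nwk[w] + beta) / (nk + V * beta)
            k = int(rng.choice(K, p=probs / probs.sum()))
            z[d][i] = k
            ndk[d, k] += 1; nwk[w, k] += 1; nk[k] += 1

for it in range(50):
    old = global_nwk.copy()
    for p in range(P):                # these sweeps would run in parallel
        z, ndk, nwk = states[p]
        gibbs_sweep(shards[p], z, ndk, nwk)
    # Periodic update: fold each shard's local delta back into the global
    # counts, then redistribute the merged table to every shard.
    global_nwk = old + sum(nwk - old for _, _, nwk in states)
    states = [(z, ndk, global_nwk.copy()) for z, ndk, _ in states]

Because every processor samples from slightly stale counts between merges, this scheme is an approximation to single-processor Gibbs sampling rather than an exact sampler; the paper's second, hierarchical scheme is the one with a convergence guarantee.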


reference text

[1] C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Map-Reduce for machine learning on multicore. In NIPS 19, pages 281–288. MIT Press, Cambridge, MA, 2007.

[2] W. Kowalczyk and N. Vlassis. Newscast EM. In NIPS 17, pages 713–720. MIT Press, Cambridge, MA, 2005.

[3] A. Das, M. Datar, A. Garg, and S. Rajaram. Google News personalization: Scalable online collaborative filtering. In 16th International World Wide Web Conference, 2007.

[4] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.

[5] T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl. 1):5228–5235, 2004.

[6] Y.W. Teh, M. Jordan, M. Beal, and D. Blei. Sharing clusters among related groups: Hierarchical Dirichlet processes. In NIPS 17, pages 1385–1392. MIT Press, Cambridge, MA, 2005.

[7] W. Li and A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML, pages 577–584, 2006.

[8] G. Wei and M. Tanner. A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. Journal of the American Statistical Association, 85(411):699–704, 1990.

[9] T. Minka. Estimating a Dirichlet distribution. http://research.microsoft.com/~minka/papers/dirichlet/, 2003.

[10] A. Brockwell. Parallel Markov chain Monte Carlo simulation by pre-fetching. Journal of Computational and Graphical Statistics, 15:246–261, 2006.

[11] D. Mimno and A. McCallum. Organizing the OCA: Learning faceted subjects from a library of digital books. In Joint Conference on Digital Libraries, pages 376–385, 2007.