nips nips2007 nips2007-73 nips2007-73-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely used latent-variable model, the Latent Dirichlet Allocation (LDA) or “topic” model, using distributed computation, where each of P processors sees only 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates; it is simple to implement and can be viewed as an approximation to a single-processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors; it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora, we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors, and speedup experiments of learning topics in a 100-million-word corpus using 16 processors.
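The first scheme in the abstract (local Gibbs sampling on each processor with periodic updates) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' code: the function names `gibbs_sweep` and `distributed_lda`, the hyperparameter defaults, the round-robin sharding, and the merge rule (add each processor's change in word-topic counts to the global counts after every sweep). The processors are simulated sequentially in a single process.

```python
import random


def gibbs_sweep(docs, z, n_dk, n_wk, n_k, K, V, alpha, beta, rng):
    """One collapsed Gibbs sweep over `docs`, updating all counts in place."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the current assignment of this token from the counts.
            n_dk[d][k] -= 1
            n_wk[w][k] -= 1
            n_k[k] -= 1
            # Unnormalized conditional probability of each topic.
            weights = [
                (n_dk[d][t] + alpha) * (n_wk[w][t] + beta) / (n_k[t] + V * beta)
                for t in range(K)
            ]
            k = rng.choices(range(K), weights=weights)[0]
            z[d][i] = k
            n_dk[d][k] += 1
            n_wk[w][k] += 1
            n_k[k] += 1


def distributed_lda(docs, K, V, P, iters, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical sketch: P simulated processors, one merge per sweep."""
    rng = random.Random(seed)
    # Assumption: documents are split round-robin across processors.
    shards = [docs[p::P] for p in range(P)]
    z = [[[rng.randrange(K) for _ in doc] for doc in shard] for shard in shards]
    n_dk = [[[0] * K for _ in shard] for shard in shards]
    n_wk = [[0] * K for _ in range(V)]  # global word-topic counts
    for p, shard in enumerate(shards):
        for d, doc in enumerate(shard):
            for i, w in enumerate(doc):
                n_dk[p][d][z[p][d][i]] += 1
                n_wk[w][z[p][d][i]] += 1

    for _ in range(iters):
        local_counts = []
        for p in range(P):  # would run in parallel on real hardware
            # Each processor starts the sweep from a copy of the global counts.
            local = [row[:] for row in n_wk]
            local_nk = [sum(local[w][k] for w in range(V)) for k in range(K)]
            gibbs_sweep(shards[p], z[p], n_dk[p], local, local_nk,
                        K, V, alpha, beta, rng)
            local_counts.append(local)
        # Assumed merge rule: global += sum over processors of (local - global).
        n_wk = [
            [n_wk[w][k] + sum(lc[w][k] - n_wk[w][k] for lc in local_counts)
             for k in range(K)]
            for w in range(V)
        ]
    return n_wk, z
```

Because each simulated processor only reassigns its own tokens, the merged counts always equal a recount of all topic assignments, so the total number of tokens is preserved across sweeps. The approximation lies in each processor sampling against counts that are stale with respect to the other processors' updates within a sweep.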
[1] C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Map-Reduce for machine learning on multicore. In NIPS 19, pages 281–288. MIT Press, Cambridge, MA, 2007.
[2] W. Kowalczyk and N. Vlassis. Newscast EM. In NIPS 17, pages 713–720. MIT Press, Cambridge, MA, 2005.
[3] A. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: Scalable online collaborative filtering. In 16th International World Wide Web Conference, 2007.
[4] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
[5] T. Griffiths and M. Steyvers. Finding scientific topics. In Proceedings of the National Academy of Sciences, volume 101, pages 5228–5235, 2004.
[6] Y.W. Teh, M. Jordan, M. Beal, and D. Blei. Sharing clusters among related groups: Hierarchical Dirichlet processes. In NIPS 17, pages 1385–1392. MIT Press, Cambridge, MA, 2005.
[7] W. Li and A. McCallum. Pachinko allocation: DAG-structured mixture models of topic correlations. In ICML, pages 577–584, 2006.
[8] G. Wei and M. Tanner. A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. Journal of the American Statistical Association, 85(411):699–704, 1990.
[9] T. Minka. Estimating a Dirichlet distribution. http://research.microsoft.com/~minka/papers/dirichlet/, 2003.
[10] A. Brockwell. Parallel Markov chain Monte Carlo simulation by pre-fetching. J. Comp. Graph. Stats, volume 15, pages 246–261, 2006.
[11] D. Mimno and A. McCallum. Organizing the OCA: Learning faceted subjects from a library of digital books. In Joint Conference on Digital Libraries, pages 376–385, 2007.