60 nips-2010-Deterministic Single-Pass Algorithm for LDA

Source: pdf

Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa

Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.
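The single-pass idea in the abstract (see each document once, update a fixed-size set of statistics, then discard the document) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `single_pass_topic_stats`, the hyperparameters `alpha` and `beta`, the seeded initialisation, and the simple fixed-point update are all assumptions chosen for illustration. What it does show is that memory stays at a vocabulary-by-topics table regardless of how many documents arrive.

```python
import numpy as np


def single_pass_topic_stats(doc_stream, num_topics, vocab_size,
                            alpha=0.1, beta=0.01, inner_iters=5, seed=0):
    """Hypothetical single-pass loop: each document is read once and dropped.

    doc_stream yields sequences of integer word ids in [0, vocab_size).
    Returns a (vocab_size, num_topics) array of running word-topic statistics.
    """
    # Seeded start values (an assumption) break the symmetry between topics
    # while keeping the whole procedure reproducible.
    rng = np.random.default_rng(seed)
    word_topic = rng.random((vocab_size, num_topics))

    for word_ids in doc_stream:
        word_ids = np.asarray(word_ids, dtype=np.int64)
        # Per-token topic responsibilities for this document only.
        resp = np.full((len(word_ids), num_topics), 1.0 / num_topics)
        for _ in range(inner_iters):
            doc_topic = resp.sum(axis=0)  # expected topic counts in the document
            word_given_topic = (word_topic[word_ids] + beta) / (
                word_topic.sum(axis=0) + beta * vocab_size)
            resp = (doc_topic + alpha) * word_given_topic
            resp /= resp.sum(axis=1, keepdims=True)
        # Fold the document into the running statistics; np.add.at handles
        # repeated word ids correctly. Nothing else about the document is kept.
        np.add.at(word_topic, word_ids, resp)

    return word_topic
```

Passing a generator as `doc_stream` makes the single-pass constraint explicit, since a generator cannot be rewound. Each token adds exactly one unit of mass to the table, so the total grows by the number of tokens seen.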

reference text

Loulwah Alsumait, Daniel Barbara, and Carlotta Domeniconi. On-line LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking. IEEE International Conference on Data Mining, 0:3–12, 2008. ISSN 1550-4786.

A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. On smoothing and inference for topic models. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, 2009.

Arindam Banerjee and Sugato Basu. Topic models over text streams: A study of batch and online unsupervised learning. In SIAM International Conference on Data Mining, 2007.

D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

Eric Brochu, Nando de Freitas, and Kejie Bao. Owed to a martingale: A fast Bayesian on-line EM algorithm for multinomial models, 2004.

Kevin R. Canini, Lei Shi, and Thomas L. Griffiths. Online inference of topics with latent Dirichlet allocation. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009.

H. Robbins and S. Monro. A stochastic approximation method. In Annals of Mathematical Statistics, pages 400–407, 1951.

Thomas P. Minka. Estimating a Dirichlet distribution. Technical report, Microsoft, 2000. URL http://research.microsoft.com/~minka/papers/dirichlet/minka-dirichlet.pdf.

Thomas P. Minka. Using lower bounds to approximate integrals. Technical report, Microsoft, 2001. URL http://research.microsoft.com/en-us/um/people/minka/papers/rem.html.

R. Neal and G. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in Graphical Models. Kluwer, 1998. URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.2557.

Masa A. Sato and Shin Ishii. On-line EM algorithm for the normalized Gaussian network. Neural Computation, 12(2):407–432, 2000. URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3704.

Yee Whye Teh, David Newman, and Max Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 19, 2007.

Hanna Wallach, David Mimno, and Andrew McCallum. Rethinking LDA: Why priors matter. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1973–1981. 2009.

Limin Yao, David Mimno, and Andrew McCallum. Efficient methods for topic model inference on streaming document collections. In KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 937–946, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-495-9.