
194 nips-2010-Online Learning for Latent Dirichlet Allocation


Source: pdf

Author: Matthew Hoffman, Francis R. Bach, David M. Blei

Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good as or better than those found with batch VB, and in a fraction of the time.
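
As a rough illustration of the update the abstract describes, the sketch below shows a single online VB step for LDA in Python/NumPy. It is a minimal sketch, not the paper's implementation: the hyperparameter names and defaults (alpha, eta, tau0, kappa), the dense bag-of-words input, and the fixed number of inner E-step iterations are all illustrative assumptions.

```python
# Minimal sketch of one online variational Bayes step for LDA.
# Hyperparameter names/defaults are illustrative assumptions, not the
# paper's settings. Requires numpy and scipy.
import numpy as np
from scipy.special import digamma

def e_log_dirichlet(param):
    # E[log X] for X ~ Dirichlet(param), taken row-wise.
    return digamma(param) - digamma(param.sum(axis=-1, keepdims=True))

def online_lda_step(lam, doc_counts, t, D, alpha=0.1, eta=0.01,
                    tau0=1.0, kappa=0.7, inner_iters=50):
    """One stochastic natural-gradient update from a single document.

    lam        : (K, W) global variational parameter over topic-word Dirichlets
    doc_counts : (W,) bag-of-words count vector for the sampled document
    t          : update counter;  D : (estimated) corpus size
    """
    K, W = lam.shape
    e_log_beta = e_log_dirichlet(lam)                 # E[log beta] under q
    gamma = np.ones(K)                                # per-document Dirichlet param
    ids = np.nonzero(doc_counts)[0]
    cts = doc_counts[ids]
    for _ in range(inner_iters):                      # local E-step on this document
        e_log_theta = e_log_dirichlet(gamma)
        log_phi = e_log_theta[:, None] + e_log_beta[:, ids]
        log_phi -= log_phi.max(axis=0)                # stabilize before exponentiating
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=0)
        gamma = alpha + phi @ cts
    # Noisy "batch" update: pretend the corpus is D copies of this document.
    lam_hat = np.full_like(lam, eta)
    lam_hat[:, ids] += D * phi * cts
    rho = (tau0 + t) ** (-kappa)                      # decaying step size [18]
    return (1.0 - rho) * lam + rho * lam_hat, gamma

# Toy usage: a single pass over random documents.
rng = np.random.default_rng(0)
K, W, D = 10, 1000, 500
lam = rng.gamma(100.0, 0.01, size=(K, W))
for t in range(D):
    doc = rng.integers(0, 3, size=W).astype(float)
    lam, gamma = online_lda_step(lam, doc, t, D)
```

The step size rho = (tau0 + t)^(-kappa) follows a Robbins-Monro schedule [18]; with kappa in (0.5, 1], the stochastic natural-gradient steps satisfy the usual conditions for the convergence claim in the abstract.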


References

[1] M. Braun and J. McAuliffe. Variational inference for large-scale models of discrete choice. arXiv preprint arXiv:0712.2526, 2008.

[2] D. Blei and M. Jordan. Variational methods for the Dirichlet process. In Proceedings of the 21st International Conference on Machine Learning, 2004.

[3] A. Asuncion, M. Welling, P. Smyth, and Y.W. Teh. On smoothing and inference for topic models. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.

[4] D. Newman, A. Asuncion, P. Smyth, and M. Welling. Distributed inference for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, 2007.

[5] F. Yan, N. Xu, and Y. Qi. Parallel inference for latent Dirichlet allocation on graphics processing units. In Advances in Neural Information Processing Systems 22, pages 2134–2142, 2009.

[6] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems, volume 20, pages 161–168. NIPS Foundation (http://books.nips.cc), 2008.

[7] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.

[8] H. Wallach, D. Mimno, and A. McCallum. Rethinking LDA: Why priors matter. In Advances in Neural Information Processing Systems 22, pages 1973–1981, 2009.

[9] W. Buntine. Variational extensions to EM and multinomial PCA. In European Conference on Machine Learning, 2002.

[10] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11(1):19–60, 2010.

[11] L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 937–946, 2009.

[12] M. Jordan, Z. Ghahramani, T. Jaakkola, and L. Saul. An introduction to variational methods for graphical models. Machine Learning, 37:183–233, 1999.

[13] H. Attias. A variational Bayesian framework for graphical models. In Advances in Neural Information Processing Systems 12, 2000.

[14] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.

[15] L. Bottou and N. Murata. Stochastic approximations and efficient learning. In The Handbook of Brain Theory and Neural Networks, second edition. MIT Press, Cambridge, MA, 2002.

[16] M.A. Sato. Online model selection based on the variational Bayes. Neural Computation, 13(7):1649–1681, 2001.

[17] P. Liang and D. Klein. Online EM for unsupervised models. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 611–619, 2009.

[18] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400–407, 1951.

[19] L. Bottou. Online learning and stochastic approximations. In Online Learning in Neural Networks. Cambridge University Press, Cambridge, UK, 1998.

[20] R.M. Neal and G.E. Hinton. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, volume 89, pages 355–368, 1998.

[21] M.A. Sato and S. Ishii. On-line EM algorithm for the normalized Gaussian network. Neural Computation, 12(2):407–432, 2000.

[22] T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 2004.

[23] X. Song, C.Y. Lin, B.L. Tseng, and M.T. Sun. Modeling and predicting personal information dissemination behavior. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2005.

[24] K.R. Canini, L. Shi, and T.L. Griffiths. Online inference of topics with latent Dirichlet allocation. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, 2009.

[25] J. Chang, J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei. Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems 21, 2009.