
205 nips-2009-Rethinking LDA: Why Priors Matter


Source: pdf

Author: Andrew McCallum, David M. Mimno, Hanna M. Wallach

Abstract: Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such “smoothing parameters” have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document–topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic–word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
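As a rough illustration of the prior structure the abstract recommends, here is a minimal Python sketch. It assumes the gensim library (not used in the paper), whose LdaModel supports an optimized asymmetric document–topic prior via alpha='auto' and a fixed symmetric topic–word prior via eta='symmetric':

from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus; in practice use a real tokenized collection.
docs = [["topic", "models", "use", "dirichlet", "priors"],
        ["asymmetric", "priors", "help", "document", "topic", "mixtures"],
        ["symmetric", "priors", "suffice", "for", "topic", "word", "weights"]]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# alpha='auto' re-estimates an asymmetric document-topic prior during
# training (hyperparameter optimization); eta='symmetric' keeps the
# topic-word prior fixed and symmetric, matching the paper's advice.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               alpha='auto', eta='symmetric', passes=10, random_state=1)

for topic_id, words in lda.show_topics(num_words=4, formatted=False):
    print(topic_id, [w for w, _ in words])

Note that gensim's alpha='auto' is a variational analogue of the paper's sampling-based hyperparameter optimization, not the authors' own implementation.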


reference text

[1] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. On smoothing and inference for topic models. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.

[2] D. Blei and J. Lafferty. A correlated topic model of Science. Annals of Applied Statistics, 1(1):17–35, 2007.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.

[4] P. J. Cowans. Probabilistic Document Modelling. PhD thesis, University of Cambridge, 2006.

[5] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741, 1984.

[6] S. Goldwater and T. L. Griffiths. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007.

[7] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl. 1):5228–5235, 2004.

[8] T. L. Griffiths, M. Steyvers, D. M. Blei, and J. B. Tenenbaum. Integrating topics and syntax. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 536–544. The MIT Press, 2005.

[9] D. Hall, D. Jurafsky, and C. D. Manning. Studying the history of ideas using topic models. In Proceedings of EMNLP 2008, pages 363–371.

[10] W. Li and A. McCallum. Mixtures of hierarchical topics with pachinko allocation. In Proceedings of the 24th International Conference on Machine Learning, pages 633–640, 2007.

[11] M. Meilă. Comparing clusterings by the variation of information. In Conference on Learning Theory, 2003.

[12] D. Mimno and A. McCallum. Organizing the OCA: Learning faceted subjects from a library of digital books. In Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, pages 376–385, Vancouver, BC, Canada, 2007.

[13] R. M. Neal. Slice sampling. Annals of Statistics, 31:705–767, 2003.

[14] D. Newman, C. Chemudugunta, P. Smyth, and M. Steyvers. Analyzing entities and topics in news articles using statistical topic models. In Intelligence and Security Informatics, Lecture Notes in Computer Science, 2006.

[15] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566–1581, 2006.

[16] Y. W. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems 19, 2007.

[17] H. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. Evaluation methods for topic models. In Proceedings of the 26th International Conference on Machine Learning, 2009.

[18] H. M. Wallach. Topic modeling: Beyond bag-of-words. In Proceedings of the 23rd International Conference on Machine Learning, pages 977–984, Pittsburgh, Pennsylvania, 2006.

[19] H. M. Wallach. Structured Topic Models for Language. PhD thesis, University of Cambridge, 2008.

[20] L. Yao, D. Mimno, and A. McCallum. Efficient methods for topic model inference on streaming document collections. In Proceedings of KDD 2009, 2009.