Bayesian Checking for Topic Models (EMNLP 2011, paper 21)


Source: pdf

Authors: David Mimno; David Blei

Abstract: Real document collections do not fit the independence assumptions asserted by most statistical topic models, but how badly do they violate them? We present a Bayesian method for measuring how well a topic model fits a corpus. Our approach is based on posterior predictive checking, a method for diagnosing Bayesian models in user-defined ways. Our method can identify where a topic model fits the data, where it falls short, and in which directions it might be improved.
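As a rough illustration of the posterior predictive checking idea named in the abstract, the sketch below tests a single independence assumption on toy data. It is not the authors' method: the model (each token is a given word independently with probability p, with a uniform Beta(1, 1) prior), the discrepancy function (variance of per-document counts), the function name, and the data are all assumptions chosen for brevity.

```python
import random
from statistics import pvariance


def posterior_predictive_pvalue(counts, doc_lengths, n_reps=1000, seed=0):
    """Estimate P(T(replicated) >= T(observed)), T = variance of per-document counts.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    total = sum(counts)                      # occurrences of the word in the corpus
    total_other = sum(doc_lengths) - total   # all remaining tokens
    observed = pvariance(counts)
    exceed = 0
    for _ in range(n_reps):
        # Draw the word probability from its Beta posterior (Beta(1, 1) prior).
        p = rng.betavariate(1 + total, 1 + total_other)
        # Replicate every document under the independence assumption.
        replicated = [sum(rng.random() < p for _ in range(n)) for n in doc_lengths]
        if pvariance(replicated) >= observed:
            exceed += 1
    return exceed / n_reps


# Hypothetical toy data: one word concentrated in two of ten 50-token documents.
doc_lengths = [50] * 10
bursty_counts = [12, 0, 0, 0, 9, 0, 0, 0, 0, 0]
p_value = posterior_predictive_pvalue(bursty_counts, doc_lengths)
print(p_value)
```

A p-value near 0 or 1 suggests that the replicated data rarely resemble the observed data along the chosen discrepancy, which is the kind of user-defined misfit the abstract refers to. Here the word is far burstier than independent draws would predict, so the estimate should be close to 0.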


reference text

Amr Ahmed and Eric Xing. 2010. Staying informed: Supervised and semi-supervised multi-view topical analysis of ideological perspective. In EMNLP.
Arthur Asuncion, Padhraic Smyth, and Max Welling. 2008. Asynchronous distributed learning of topic models. In NIPS.
M.J. Bayarri and M.E. Castellanos. 2007. Bayesian checking of the second levels of hierarchical models. Statistical Science, 22(3):322–343.
David M. Blei and John D. Lafferty. 2006. Dynamic topic models. In ICML.
David Blei, Andrew Ng, and Michael Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January.
Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. 2009. Reading tea leaves: How humans interpret topic models. In Advances in Neural Information Processing Systems 22, pages 288–296.
Gabriel Doyle and Charles Elkan. 2009. Accounting for burstiness in topic models. In ICML.
David Draper and Milovan Krnjajic. 2006. Bayesian model specification. Technical report, University of California, Santa Cruz.
Jacob Eisenstein and Eric Xing. 2010. The CMU 2008 political blog corpus. Technical report, Carnegie Mellon University.
A. Gelman, X.L. Meng, and H.S. Stern. 1996. Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6:733–807.
Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. PNAS, 101(suppl. 1):5228–5235.
Matthew Hoffman, David Blei, and Francis Bach. 2010. Online learning for latent Dirichlet allocation. In NIPS.
Wei-Hao Lin, Eric Xing, and Alexander Hauptmann. 2008. A joint topic and perspective model for ideological discourse. In PKDD.
Qiaozhu Mei and ChengXiang Zhai. 2006. A mixture model for contextual text mining. In KDD.
David Mimno and Andrew McCallum. 2007. Organizing the OCA: learning faceted subjects from a library of digital books. In JCDL.
David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. 2010. Automatic evaluation of topic coherence. In Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Michael J. Paul, ChengXiang Zhai, and Roxana Girju. 2010. Summarizing contrastive viewpoints in opinionated text. In EMNLP.
Donald B. Rubin. 1981. Estimation in parallel randomized experiments. Journal of Educational Statistics, 6:377–401.
D. Rubin. 1984. Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4):1151–1172.
Hanna Wallach, Iain Murray, Ruslan Salakhutdinov, and David Mimno. 2009. Evaluation methods for topic models. In ICML.