acl acl2011 acl2011-178 acl2011-178-reference knowledge-graph by maker-knowledge-mining

178 acl-2011-Interactive Topic Modeling

Source: pdf

Author: Yuening Hu ; Jordan Boyd-Graber ; Brianna Satinoff

Abstract: Topic models have been used extensively as a tool for corpus exploration, and a cottage industry has developed to tweak topic models to better encode human intuitions or to better model data. However, creating such extensions requires expertise in machine learning unavailable to potential end-users of topic modeling software. In this work, we develop a framework for allowing users to iteratively refine the topics discovered by models such as latent Dirichlet allocation (LDA) by adding constraints that enforce that sets of words must appear together in the same topic. We incorporate these constraints interactively by selectively removing elements in the state of a Markov Chain used for inference; we investigate a variety of methods for incorporating this information and demonstrate that these interactively added constraints improve topic usefulness for simulated and actual user sessions.

reference text

David Andrzejewski, Xiaojin Zhu, and Mark Craven. 2009. Incorporating domain knowledge into topic modeling via Dirichlet forest priors. In Proceedings of International Conference of Machine Learning. David M. Blei, Andrew Ng, and Michael Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022. Jordan Boyd-Graber and David M. Blei. 2008. Syntactic topic models. In Proceedings of Advances in Neural Information Processing Systems. Jordan Boyd-Graber and Philip Resnik. 2010. Holistic sentiment analysis across languages: Multilingual supervised latent Dirichlet allocation. In Proceedings of Emperical Methods in Natural Language Processing. Jordan Boyd-Graber, David M. Blei, and Xiaojin Zhu. 2007. A topic model for word sense disambiguation. In Proceedings of Emperical Methods in Natural Language Processing. Jonathan Chang, Jordan Boyd-Graber, Chong Wang, Sean Gerrish, and David M. Blei. 2009. Reading tea leaves: How humans interpret topic models. In Neural Information Processing Systems. Jonathan Chang. 2010. Not-so-latent Dirichlet allocation: Collapsed Gibbs sampling using human judgments. In NAACL Workshop: Creating Speech and Language Data With Amazon ’ss Mechanical Turk. Hal Daume´ III. 2009. Markov random topic fields. In Proceedings of Artificial Intelligence and Statistics. Laura Dietz, Steffen Bickel, and Tobias Scheffer. 2007. Unsupervised prediction of citation influences. In Proceedings ofInternational Conference ofMachine Learning. Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. Proceedings of the National Academy of Sciences, 101(Suppl 1):5228–5235. Amit Gruber, Michael Rosen-Zvi, and Yair Weiss. 2007. Hidden topic Markov models. In Artificial Intelligence and Statistics. Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explorations, 11(1): 10–18. Thomas Hofmann. 1999. Probabilistic latent semantic analysis. In Proceedings of Uncertainty in Artificial Intelligence. Thomas K. Landauer, Danielle S. McNamara, Dennis S. Marynick, and Walter Kintsch, editors. 2006. Probabilistic Topic Models. Laurence Erlbaum. Wei-Hao Lin, Theresa Wilson, Janyce Wiebe, and Alexander Hauptmann. 2006. Which side are you on? identifying perspectives at the document and sentence levels. 257 In Proceedings of the Conference on Natural Language Learning (CoNLL). Edward Loper and Steven Bird. 2002. NLTK: the natural language toolkit. In Tools and methodologies for teaching. Radford M. Neal. 1993. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, University of Toronto. David Newman, Jey Han Lau, Karl Grieser, and Timothy Baldwin. 2010. Automatic evaluation of topic coherence. In Conference of the North American Chapter of the Association for Computational Linguistics. Michael Paul and Roxana Girju. 2010. A twodimensional topic-aspect model for discovering multifaceted topics. In Association for the Advancement of Artificial Intelligence. James Petterson, Smola Alex, Tiberio Caetano, Wray Buntine, and Narayanamurthy Shravan. 2010. Word features for latent Dirichlet allocation. In Neural Information Processing Systems. Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D. Manning. 2009. Labeled LDA: A supervised topic model for credit attribution in multilabeled corpora. In Proceedings of Emperical Methods in Natural Language Processing. Fergus Rob, Li Fei-Fei, Perona Pietro, and Zisserman Andrew. 2005. Learning object categories from Google’s image search. In International Conference on Computer Vision. Michal Rosen-Zvi, Thomas L. Griffiths, Mark Steyvers, and Padhraic Smyth. 2004. The author-topic model for authors and documents. In Proceedings of Uncertainty in Artificial Intelligence. Suyash Shringarpure and Eric P. Xing. 2008. mStruct: a new admixture model for inference of population structure in light of both genetic admixing and allele mutations. In Proceedings of International Conference of Machine Learning. Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Ng. 2008. Cheap and fast—but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of Emperical Methods in Natural Language Processing. Hanna Wallach, David Mimno, and Andrew McCallum. 2009. Rethinking LDA: Why priors matter. In Proceedings of Advances in Neural Information Processing Systems. Hanna M. Wallach. 2006. Topic modeling: Beyond bagof-words. In Proceedings of International Conference of Machine Learning. Limin Yao, David Mimno, and Andrew McCallum. 2009. Efficient methods for topic model inference on streaming document collections. In Knowledge Discovery and Data Mining.