52 acl-2011-Automatic Labelling of Topic Models


Source: pdf

Author: Jey Han Lau ; Karl Grieser ; David Newman ; Timothy Baldwin

Abstract: We propose a method for automatically labelling topics learned via LDA topic models. We generate our label candidate set from the top-ranking topic terms, titles of Wikipedia articles containing the top-ranking topic terms, and sub-phrases extracted from the Wikipedia article titles. We rank the label candidates using a combination of association measures and lexical features, optionally fed into a supervised ranking model. Our method is shown to perform strongly over four independent sets of topics, significantly better than a benchmark method.
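
As a rough illustration of the ranking step the abstract describes, the sketch below scores each label candidate by its average pointwise mutual information (PMI) with the top-ranking topic terms and sorts the candidates by that score. Everything here is an assumption made for illustration: the function names (`pmi`, `rank_candidates`), the toy counts, the treatment of unseen pairs as a score of 0.0, and the use of PMI as the only association measure. The paper combines several association measures with lexical features and an optional supervised ranker, so this should not be read as the authors' implementation.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information computed from raw counts.

    Unseen pairs return 0.0 (a simplifying assumption of this sketch,
    meaning "no evidence of association").
    """
    if count_xy == 0 or count_x == 0 or count_y == 0:
        return 0.0
    return math.log((count_xy * total) / (count_x * count_y))


def rank_candidates(candidates, topic_terms, unigram, cooccur, total):
    """Return (candidate, score) pairs sorted by average PMI, best first."""
    scored = []
    for cand in candidates:
        scores = [
            pmi(
                cooccur.get((cand, term), 0),
                unigram.get(cand, 0),
                unigram.get(term, 0),
                total,
            )
            for term in topic_terms
        ]
        scored.append((cand, sum(scores) / len(topic_terms)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy counts, not taken from the paper.
topic_terms = ["stock", "market", "investor"]
candidates = ["weather forecast", "stock market"]
unigram = {
    "stock market": 50,
    "weather forecast": 40,
    "stock": 200,
    "market": 300,
    "investor": 100,
}
cooccur = {
    ("stock market", "stock"): 45,
    ("stock market", "market"): 48,
    ("stock market", "investor"): 20,
    ("weather forecast", "market"): 2,
}
total = 10000  # hypothetical number of co-occurrence windows

ranked = rank_candidates(candidates, topic_terms, unigram, cooccur, total)
for label, score in ranked:
    print(f"{label}\t{score:.3f}")
```

With these toy counts, the candidate that co-occurs often with all three topic terms is ranked above the one that shares only a few co-occurrences with a single term.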


reference text

S. Banerjee and T. Pedersen. 2003. The design, implementation, and use of the Ngram Statistic Package. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pages 370–381, Mexico City, February.
D.M. Blei and J.D. Lafferty. 2006. Dynamic topic models. In ICML 2006.
D.M. Blei, A.Y. Ng, and M.I. Jordan. 2003. Latent Dirichlet allocation. JMLR, 3:993–1022.
S. Brody and M. Lapata. 2009. Bayesian word sense induction. In EACL 2009, pages 103–111.
J. Chang, J. Boyd-Graber, S. Gerrish, C. Wang, and D. Blei. 2009. Reading tea leaves: How humans interpret topic models. In NIPS, pages 288–296.
B. Croft, D. Metzler, and T. Strohman. 2009. Search Engines: Information Retrieval in Practice. Addison Wesley.
Y. Feng and M. Lapata. 2010. Topic models for image annotation and text illustration. In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2010), pages 831–839, Los Angeles, USA, June.
K. Grieser, T. Baldwin, F. Bohnert, and L. Sonenberg. 2011. Using ontological and document similarity to estimate museum exhibit relatedness. ACM Journal on Computing and Cultural Heritage, 3(3):1–20.
T. Griffiths and M. Steyvers. 2004. Finding scientific topics. In PNAS, volume 101, pages 5228–5235.
A. Haghighi and L. Vanderwende. 2009. Exploring content models for multi-document summarization. In HLT: NAACL 2009, pages 362–370.
K. Jarvelin and J. Kekalainen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4).
T. Joachims. 2006. Training linear SVMs in linear time. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), pages 217–226, New York, NY, USA. ACM.
J.H. Lau, D. Newman, S. Karimi, and T. Baldwin. 2010. Best topic word selection for topic labelling. In Coling 2010: Posters, pages 605–613, Beijing, China.
D. Magatti, S. Calegari, D. Ciucci, and F. Stella. 2009. Automatic labeling of topics. In ISDA 2009, pages 1227–1232, Pisa, Italy.
Q. Mei, C. Liu, H. Su, and C. Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In WWW 2006, pages 533–542.
Q. Mei, X. Shen, and C. Zhai. 2007. Automatic labeling of multinomial topic models. In SIGKDD, pages 490–499.
G. Minnen, J. Carroll, and D. Pearce. 2001. Applied morphological processing of English. Journal of Natural Language Processing, 7(3):207–223.
D. Newman, T. Baldwin, L. Cavedon, S. Karimi, D. Martinez, and J. Zobel. 2010a. Visualizing document collections and search results using topic mapping. Journal of Web Semantics, 8(2-3):169–175.
D. Newman, J.H. Lau, K. Grieser, and T. Baldwin. 2010b. Automatic evaluation of topic coherence. In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2010), pages 100–108, Los Angeles, USA, June. Association for Computational Linguistics.
D. Ó Séaghdha. 2010. Latent variable models of selectional preference. In ACL 2010.
P. Pantel and D. Ravichandran. 2004. Automatically labeling semantic classes. In HLT/NAACL-04, pages 321–328.
P. Pecina. 2009. Lexical Association Measures: Collocation Extraction. Ph.D. thesis, Charles University.
A. Ritter, Mausam, and O. Etzioni. 2010. A latent Dirichlet allocation method for selectional preferences. In ACL 2010.
R. Snow, B. O’Connor, D. Jurafsky, and A.Y. Ng. 2008. Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks. In EMNLP ’08: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 254–263, Morristown, NJ, USA.
I. Titov and R. McDonald. 2008. Modeling online reviews with multi-grain topic models. In WWW ’08, pages 111–120.
X. Wang and A. McCallum. 2006. Topics over time: A non-Markov continuous-time model of topical trends. In KDD, pages 424–433.
S. Wei and W.B. Croft. 2006. LDA-based document models for ad-hoc retrieval. In SIGIR ’06, pages 178–185.