23 emnlp-2010-Automatic Keyphrase Extraction via Topic Decomposition

Author: Zhiyuan Liu ; Wenyi Huang ; Yabin Zheng ; Maosong Sun

Abstract: Existing graph-based ranking methods for keyphrase extraction compute a single importance score for each word via a single random walk. Motivated by the fact that both documents and words can be represented as mixtures of semantic topics, we propose to decompose the traditional random walk into multiple random walks, each specific to one topic. We thus build a Topical PageRank (TPR) on a word graph to measure word importance with respect to different topics. Then, given the topic distribution of the document, we combine the topic-specific scores into a final ranking score for each word and extract the top-ranked words as keyphrases. Experimental results show that TPR outperforms state-of-the-art keyphrase extraction methods on two datasets under various evaluation metrics.
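The decomposition described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each topic's walk is a PageRank whose teleport vector is biased toward that topic's words (in the style of topic-sensitive PageRank), that the word graph is an undirected co-occurrence graph, and that the per-topic word preferences and the document's topic distribution come from some topic model. The function name `topical_pagerank`, its parameters, and the damping value 0.85 are all hypothetical choices made for this example.

```python
from collections import defaultdict


def topical_pagerank(edges, topic_word_pref, doc_topic_dist,
                     damping=0.85, iters=100):
    """Rank words by mixing one topic-biased random walk per topic.

    edges: dict mapping (word_a, word_b) -> positive co-occurrence weight.
    topic_word_pref: one dict per topic, mapping word -> preference weight
        (assumed to come from a topic model; hypothetical input).
    doc_topic_dist: list of P(topic | document), one value per topic.
    """
    # Build a symmetric weighted adjacency map from the edge list.
    neighbors = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbors[a][b] = neighbors[a].get(b, 0.0) + weight
        neighbors[b][a] = neighbors[b].get(a, 0.0) + weight
    words = sorted(neighbors)
    out_weight = {w: sum(neighbors[w].values()) for w in words}

    final = {w: 0.0 for w in words}
    for pref, p_topic in zip(topic_word_pref, doc_topic_dist):
        # Teleport vector: the topic's preference restricted to graph words,
        # normalised to sum to 1 (uniform if the topic covers none of them).
        mass = {w: pref.get(w, 0.0) for w in words}
        total = sum(mass.values())
        teleport = {
            w: (mass[w] / total if total > 0 else 1.0 / len(words))
            for w in words
        }

        # Power iteration for this topic's biased random walk.
        rank = {w: 1.0 / len(words) for w in words}
        for _ in range(iters):
            rank = {
                w: damping * sum(
                    rank[u] * neighbors[u][w] / out_weight[u]
                    for u in neighbors[w]
                ) + (1.0 - damping) * teleport[w]
                for w in words
            }

        # Weight the topic-specific scores by the document's topic mixture.
        for w in words:
            final[w] += p_topic * rank[w]

    return sorted(final.items(), key=lambda item: item[1], reverse=True)


# Toy example: a four-word chain and two topics (all values are made up).
edges = {("topic", "model"): 2.0, ("model", "graph"): 1.0, ("graph", "walk"): 2.0}
prefs = [{"topic": 0.5, "model": 0.5}, {"graph": 0.5, "walk": 0.5}]
ranked = topical_pagerank(edges, prefs, doc_topic_dist=[0.9, 0.1])
print([word for word, _ in ranked])
```

With the document weighted toward the first topic, "model" ranks first; swapping the topic distribution to `[0.1, 0.9]` moves "graph" to the top. The sketch stops at word scores and does not cover turning top-ranked words into multi-word keyphrases.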

reference text

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January.
C. Buckley and E.M. Voorhees. 2004. Retrieval evaluation with incomplete information. In Proceedings of SIGIR, pages 25–32.
David Cohn and Huan Chang. 2000. Learning to probabilistically identify authoritative documents. In Proceedings of ICML, pages 167–174.
M. Grineva, M. Grinev, and D. Lizorkin. 2009. Extracting key terms from noisy and multi-theme documents. In Proceedings of WWW, pages 661–670.
Taher H. Haveliwala. 2002. Topic-sensitive pagerank. In Proceedings of WWW, pages 517–526.
G. Heinrich. 2005. Parameter estimation for text analysis. Web: http://www.arbylon.net/publications/textest.
Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of SIGIR, pages 50–57.
Anette Hulth. 2003. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of EMNLP, pages 216–223.
J.M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632.
T.K. Landauer, P.W. Foltz, and D. Laham. 1998. An introduction to latent semantic analysis. Discourse Processes, 25:259–284.
Marina Litvak and Mark Last. 2008. Graph-based keyword extraction for single-document summarization. In Proceedings of the workshop Multi-source Multilingual Information Extraction and Summarization, pages 17–24.
Zhiyuan Liu, Peng Li, Yabin Zheng, and Maosong Sun. 2009. Clustering to find exemplar terms for keyphrase extraction. In Proceedings of EMNLP, pages 257–266.
C.D. Manning and H. Schutze. 2000. Foundations of statistical natural language processing. MIT Press.
Rada Mihalcea and Paul Tarau. 2004. Textrank: Bringing order into texts. In Proceedings of EMNLP, pages 404–411.
George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller. 1990. WordNet: An on-line lexical database. International Journal of Lexicography, 3:235–244.
Thuy Nguyen and Min-Yen Kan. 2007. Keyphrase extraction in scientific publications. In Proceedings of the 10th International Conference on Asian Digital Libraries, pages 317–326.
Lan Nie, Brian D. Davison, and Xiaoguang Qi. 2006. Topical link analysis for web search. In Proceedings of SIGIR, pages 91–98.
P. Over, W. Liggett, H. Gilbert, A. Sakharov, and M. Thatcher. 2001. Introduction to duc-2001: An intrinsic evaluation of generic news text summarization systems. In Proceedings of DUC2001.
L. Page, S. Brin, R. Motwani, and T. Winograd. 1998. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
Peter D. Turney. 1999. Learning to extract keyphrases from text. National Research Council Canada, Institute for Information Technology, Technical Report ERB-1057.
Peter D. Turney. 2000. Learning algorithms for keyphrase extraction. Information Retrieval, 2(4):303–336.
E.M. Voorhees. 2000. The trec-8 question answering track report. In Proceedings of TREC, pages 77–82.
Xiaojun Wan and Jianguo Xiao. 2008a. Collabrank: Towards a collaborative approach to single-document keyphrase extraction. In Proceedings of COLING, pages 969–976.
Xiaojun Wan and Jianguo Xiao. 2008b. Single document keyphrase extraction using neighborhood knowledge. In Proceedings of AAAI, pages 855–860.