
126 acl-2013-Diverse Keyword Extraction from Conversations


Author: Maryam Habibi ; Andrei Popescu-Belis

Abstract: A new method for keyword extraction from conversations is introduced, which preserves the diversity of topics that are mentioned. Inspired by summarization, the method maximizes the coverage of topics that are recognized automatically in transcripts of conversation fragments. The method is evaluated on excerpts of the Fisher and AMI corpora, using a crowdsourcing platform to elicit comparative relevance judgments. The results demonstrate that the method outperforms two competitive baselines.
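The idea of maximizing topic coverage can be illustrated with a small greedy selection sketch. The abstract does not give the objective, so everything specific here is an assumption: the reward form (a topic-weighted sum of concave powers of accumulated topic mass, in the spirit of the submodular summarization work cited in the reference list), the exponent `lam`, the function name `greedy_diverse_keywords`, and the toy topic distributions. It is not the authors' implementation.

```python
from typing import Dict, List


def greedy_diverse_keywords(
    word_topics: Dict[str, List[float]],  # assumed p(topic | word) per candidate word
    topic_weights: List[float],           # assumed topic weights for the fragment
    k: int,
    lam: float = 0.5,                     # assumed exponent in (0, 1): diminishing returns
) -> List[str]:
    """Greedily pick up to k words maximizing
    sum_z topic_weights[z] * (sum_{w in S} p(z | w)) ** lam.
    """
    selected: List[str] = []
    covered = [0.0] * len(topic_weights)  # topic mass accumulated by selected words

    def reward(cov: List[float]) -> float:
        return sum(b * (c ** lam) for b, c in zip(topic_weights, cov))

    while len(selected) < k and len(selected) < len(word_topics):
        current = reward(covered)
        best_word, best_gain = None, float("-inf")
        for word, probs in word_topics.items():
            if word in selected:
                continue
            # Marginal gain of adding this word to the current selection.
            gain = reward([c + p for c, p in zip(covered, probs)]) - current
            if gain > best_gain:
                best_word, best_gain = word, gain
        selected.append(best_word)
        covered = [c + p for c, p in zip(covered, word_topics[best_word])]
    return selected


# Toy input (hypothetical): two topics, three candidate words.
toy_words = {
    "budget": [1.0, 0.0],
    "cost": [0.9, 0.1],
    "remote": [0.0, 1.0],
}
keywords = greedy_diverse_keywords(toy_words, topic_weights=[0.6, 0.4], k=2)
print(keywords)
```

Because the exponent is below 1, each further word from an already covered topic adds less reward than the one before it, which is what pushes the selection toward topics not yet represented.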

reference text

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022.
Jonathan Boyd-Graber, Jordan Chang, Sean Gerrish, Chong Wang, and David Blei. 2009. Reading tea leaves: How humans interpret topic models. In Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS).
Jean Carletta. 2007. Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus. Language Resources and Evaluation Journal, 41(2):181–190.
Christopher Cieri, David Miller, and Kevin Walker. 2004. The Fisher Corpus: a resource for the next generations of speech-to-text. In Proceedings of 4th International Conference on Language Resources and Evaluation (LREC), pages 69–71.
Charles L. A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. 2008. Novelty and diversity in information retrieval evaluation. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 659–666.
Andras Csomai and Rada Mihalcea. 2007. Linking educational materials to encyclopedic knowledge. Frontiers in Artificial Intelligence and Applications, 158:557.
Eibe Frank, Gordon W. Paynter, Ian H. Witten, Carl Gutwin, and Craig G. Nevill-Manning. 1999. Domain-specific keyphrase extraction. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI 1999), pages 668–673, Stockholm, Sweden.
Maryam Habibi and Andrei Popescu-Belis. 2012. Using crowdsourcing to compare document recommendation strategies for conversations. In Workshop on Recommendation Utility Evaluation: Beyond RMSE (RUE 2011), page 15.
David Harwath and Timothy J. Hazen. 2012. Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech. In Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5073–5076. IEEE.
Matthew D. Hoffman, David M. Blei, and Francis Bach. 2010. Online learning for Latent Dirichlet Allocation. Proceedings of 24th Annual Conference on Neural Information Processing Systems, 23:856–864.
Anette Hulth. 2003. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2003), pages 216–223, Sapporo, Japan.
Jingxuan Li, Lei Li, and Tao Li. 2012. Multidocument summarization via submodularity. Applied Intelligence, 37(3):420–430.
Hui Lin and Jeff Bilmes. 2011. A class of submodular functions for document summarization. In Proceedings of the 49th Annual Meeting of the ACL.
Feifan Liu, Deana Pennell, Fei Liu, and Yang Liu. 2009a. Unsupervised approaches for automatic keyword extraction using meeting transcripts. In Proceedings of the 2009 Annual Conference of the North American Chapter of the ACL (HLT-NAACL), pages 620–628.
Zhiyuan Liu, Wenyi Huang, Yabin Zheng, and Maosong Sun. 2009b. Clustering to find exemplar terms for keyphrase extraction. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), pages 257–266.
Zhiyuan Liu, Wenyi Huang, Yabin Zheng, and Maosong Sun. 2010. Automatic keyphrase extraction via topic decomposition. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), pages 366–376.
Hans Peter Luhn. 1957. A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4):309–317.
Yutaka Matsuo and Mitsuru Ishizuka. 2004. Keyword extraction from a single document using word co-occurrence statistical information. International Journal on Artificial Intelligence Tools, 13(1):157–169.
Andrew K. McCallum. 2002. MALLET: A machine learning for language toolkit. http://mallet.cs.umass.edu.
Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), pages 404–411, Barcelona.
George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions. Mathematical Programming Journal, 14(1):265–294.
Ani Nenkova and Kathleen McKeown. 2012. A Survey of Text Summarization Techniques, chapter 3, pages 43–76. Springer.
Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing and Management Journal, 24(5):513–523.
Gerard Salton, Chung-Shu Yang, and Clement T. Yu. 1975. A theory of term importance in automatic text analysis. Journal of the American Society for Information Science, 26(1):33–44.
Peter Turney. 1999. Learning to extract keyphrases from text. Technical Report ERB-1057, National Research Council Canada (NRC).
Jinghua Wang, Jianyi Liu, and Cong Wang. 2007. Keyword extraction based on PageRank. In Advances in Knowledge Discovery and Data Mining (Proceedings of PAKDD 2007), LNAI 4426, pages 857–864. Springer-Verlag, Berlin.
Shiren Ye, Tat-Seng Chua, Min-Yen Kan, and Long Qiu. 2007. Document concept lattice for text understanding and summarization. Information Processing and Management, 43(6):1643–1662.