4 emnlp-2013-A Dataset for Research on Short-Text Conversations

Source: pdf

Author: Hao Wang ; Zhengdong Lu ; Hang Li ; Enhong Chen

Abstract: Natural language conversation is widely regarded as a highly difficult problem, usually attacked with either rule-based or learning-based models. In this paper we propose a retrieval-based automatic response model for short-text conversation that exploits the vast number of short conversation instances available on social media. For this purpose we introduce a dataset of short-text conversations built from real-world instances on Sina Weibo (a popular Chinese microblog service), which will soon be released to the public. The dataset provides a rich collection of instances for research on finding natural and relevant short responses to a given short text, and is useful for both training and testing conversation models. It consists of naturally formed conversations, manually labeled data, and a large repository of candidate responses. Our preliminary experiments demonstrate that a simple retrieval-based conversation model performs reasonably well when combined with the rich instances in our dataset.
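The retrieval-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes a plain bag-of-words cosine match between the query and the original posts, whitespace tokenization (the Weibo data would need a Chinese word segmenter), and a tiny hypothetical English repository `PAIRS` invented for illustration. All function names are likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy repository of (post, response) pairs; not taken from the dataset.
PAIRS = [
    ("the weather in beijing is great today", "perfect day for a walk in the park"),
    ("just watched the football final", "what a match, congratulations to the winners"),
    ("my flight got delayed again", "hope you get home soon"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real word segmenter)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_response(query, pairs):
    """Return the response attached to the post most similar to the query."""
    query_vec = Counter(tokenize(query))
    _, best_response = max(
        pairs, key=lambda pair: cosine(query_vec, Counter(tokenize(pair[0])))
    )
    return best_response


if __name__ == "__main__":
    print(retrieve_response("great weather today", PAIRS))
```

Matching the query against the stored posts rather than against the responses is only one possible design; a fuller system would likely combine several matching signals and learn how to rank the candidates.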

reference text

Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist., 19(2).
Rollo Carpenter. 1997. Cleverbot.
Sina Jafarpour and Christopher J. C. Burges. 2010. Filter, rank, and transfer the knowledge: Learning to chat.
Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’02, pages 133–142, New York, NY, USA. ACM.
Anton Leuski and David R. Traum. 2011. Npceditor: Creating virtual human dialogue using information retrieval techniques. AI Magazine, 32(2):42–56.
Diane Litman, Satinder Singh, Michael Kearns, and Marilyn Walker. 2000. Njfun: a reinforcement learning spoken dialogue system. In Proceedings of the 2000 ANLP/NAACL Workshop on Conversational systems Volume 3, ANLP/NAACL-ConvSyst ’00, pages 17–20, Stroudsburg, PA, USA. Association for Computational Linguistics.
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA.
Teruhisa Misu, Kallirroi Georgila, Anton Leuski, and David Traum. 2012. Reinforcement learning of question-answering dialogue policies for virtual museum guides. In Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, SIGDIAL ’12, pages 84–93.
Alan Ritter, Colin Cherry, and William B. Dolan. 2011. Data-driven response generation in social media. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 583–593, Stroudsburg, PA, USA. Association for Computational Linguistics.
Jost Schatzmann, Karl Weilhammer, Matt Stuttle, and Steve Young. 2006. A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies. Knowl. Eng. Rev., pages 97–126.
Ellen M. Voorhees. 2002. The philosophy of information retrieval evaluation. In Evaluation of cross-language information retrieval systems, pages 355–370. Springer.
Jason D. Williams and Steve Young. 2007. Partially observable markov decision processes for spoken dialog systems. Comput. Speech Lang., 21(2):393–422.
Wei Wu, Zhengdong Lu, and Hang Li. 2013. Learning bilinear model for matching queries and documents. Journal of Machine Learning Research (2013 to appear).
Hua-Ping Zhang, Hong-Kui Yu, De-Yi Xiong, and Qun Liu. 2003. Hhmm-based chinese lexical analyzer ictclas. SIGHAN ’03.