
74 emnlp-2010-Learning the Relative Usefulness of Questions in Community QA

Source: pdf

Author: Razvan Bunescu ; Yunfeng Huang

Abstract: We present a machine learning approach for the task of ranking previously answered questions in a question repository with respect to their relevance to a new, unanswered reference question. The ranking model is trained on a collection of question groups manually annotated with a partial order relation reflecting the relative utility of questions inside each group. Based on a set of meaning- and structure-aware features, the new ranking model is able to substantially outperform more straightforward, unsupervised similarity measures.
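The idea of training a ranker from partial-order annotations can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two features (word overlap and length ratio), the perceptron learner, and the example questions are hypothetical and are not taken from the paper, whose abstract does not specify the features or the learning algorithm. The sketch turns each annotated preference "question A is more useful than question B" into a feature difference vector and learns weights that score A above B.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two questions (illustrative feature)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def features(reference: str, candidate: str) -> List[float]:
    """Hypothetical two-dimensional feature vector, not the paper's feature set."""
    ref_len = len(reference.split())
    cand_len = len(candidate.split())
    length_ratio = min(ref_len, cand_len) / max(ref_len, cand_len, 1)
    return [jaccard(reference, candidate), length_ratio]


def pairwise_examples(reference: str,
                      preferences: List[Tuple[str, str]]) -> List[List[float]]:
    """Turn (more_useful, less_useful) pairs into feature difference vectors."""
    diffs = []
    for better, worse in preferences:
        fb, fw = features(reference, better), features(reference, worse)
        diffs.append([b - w for b, w in zip(fb, fw)])
    return diffs


def train_perceptron(diffs: List[List[float]], epochs: int = 20) -> List[float]:
    """Learn weights w with w . diff > 0 for each preference (assumed learner)."""
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        for d in diffs:
            if sum(wi * di for wi, di in zip(w, d)) <= 0:
                # Misranked pair: move the weights toward the difference vector.
                w = [wi + di for wi, di in zip(w, d)]
    return w


def rank(reference: str, candidates: List[str], w: List[float]) -> List[str]:
    """Sort candidate questions by learned score, most useful first."""
    def score(c: str) -> float:
        return sum(wi * fi for wi, fi in zip(w, features(reference, c)))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical question group with a partial order: q1 > q2 > q3.
reference = "what is the best pizza place in chicago"
q1 = "where can i find the best pizza in chicago"
q2 = "what is a good hotel in chicago"
q3 = "how do i bake bread"

weights = train_perceptron(pairwise_examples(reference, [(q1, q2), (q2, q3)]))
ranking = rank(reference, [q3, q2, q1], weights)
print(ranking)
```

Reducing a partial order to pairwise preferences means only the annotated pairs constrain the model, so questions left unordered by the annotators impose no penalty either way.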

reference text

Ricardo Baeza-Yates and Berthier Ribeiro-Neto. 1999. Modern Information Retrieval. ACM Press, New York.
Delphine Bernhard and Iryna Gurevych. 2008. Answering learners’ questions by retrieving question paraphrases from social Q&A sites. In EANL ’08: Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications, pages 44–52, Morristown, NJ, USA. Association for Computational Linguistics.
Razvan Bunescu and Yunfeng Huang. 2010a. Towards a general model of answer typing: Question focus identification. In Proceedings of The 11th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2010), RCS Volume, pages 231–242.
Razvan Bunescu and Yunfeng Huang. 2010b. A utility-driven approach to question ranking in social QA. In Proceedings of The 23rd International Conference on Computational Linguistics (COLING 2010), pages 125–133.
Michael Collins. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005), pages 9–16.
Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of The 20th International Conference on Computational Linguistics (COLING’04), page 350.
Huizhong Duan, Yunbo Cao, Chin-Yew Lin, and Yong Yu. 2008. Searching questions by identifying question topic and question focus. In Proceedings of ACL-08: HLT, pages 156–164, Columbus, Ohio, June.
Ulf Hermjakob, Abdessamad Echihabi, and Daniel Marcu. 2002. Natural language based reformulation resource and web exploitation for question answering. In Proceedings of TREC-2002.
Jiwoon Jeon, W. Bruce Croft, and Joon Ho Lee. 2005. Finding similar questions in large question and answer archives. In Proceedings of the 14th ACM international conference on Information and knowledge management (CIKM’05), pages 84–90, New York, NY, USA. ACM.
J.J. Jiang and D.W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, pages 19–33.
Valentin Jijkoun and Maarten de Rijke. 2005. Retrieving answers from frequently asked questions pages on the Web. In Proceedings of the 14th ACM international conference on Information and knowledge management (CIKM’05), pages 76–83, New York, NY, USA. ACM.
Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), Edmonton, Canada.
Dekang Lin. 1998. An information-theoretic definition of similarity. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML ’98), pages 296–304, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Rada Mihalcea, Courtney Corley, and Carlo Strapparava. 2006. Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the 21st national conference on Artificial intelligence (AAAI’06), pages 775–780. AAAI Press.
Dan I. Moldovan, Marius Pasca, Sanda M. Harabagiu, and Mihai Surdeanu. 2002. Performance issues and error analysis in an open-domain question answering system. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 33–40, Philadelphia, PA, July.
John M. Prager. 2006. Open-domain question-answering. Foundations and Trends in Information Retrieval, 1(2):91–231.
Philip Resnik. 1995. Using information content to evaluate semantic similarity in a taxonomy. In IJCAI’95: Proceedings of the 14th international joint conference on Artificial intelligence, pages 448–453, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Noriko Tomuro. 2003. Interrogative reformulation patterns and acquisition of question paraphrases. In Proceedings of the Second International Workshop on Paraphrasing, pages 33–40, Morristown, NJ, USA. Association for Computational Linguistics.
Zhibiao Wu and Martha Palmer. 1994. Verbs semantics and lexical selection. In Proceedings of the 32nd annual meeting on Association for Computational Linguistics, pages 133–138, Morristown, NJ, USA. Association for Computational Linguistics.
Shiqi Zhao, Ming Zhou, and Ting Liu. 2007. Learning question paraphrases for QA from Encarta logs. In Proceedings of the 20th international joint conference on Artificial intelligence (IJCAI’07), pages 1795–1800, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.