
169 acl-2013-Generating Synthetic Comparable Questions for News Articles

Source: pdf

Author: Oleg Rokhlenko ; Idan Szpektor

Abstract: We introduce the novel task of automatically generating questions that are relevant to a text but do not appear in it. One motivating example of its application is for increasing user engagement around news articles by suggesting relevant comparable questions, such as “is Beyonce a better singer than Madonna?”, for the user to answer. We present the first algorithm for the task, which consists of: (a) offline construction of a comparable question template database; (b) ranking of relevant templates to a given article; and (c) instantiation of templates only with entities in the article whose comparison under the template’s relation makes sense. We tested the suggestions generated by our algorithm via a Mechanical Turk experiment, which showed a significant improvement over the strongest baseline of more than 45% in all metrics.
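The three stages named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Template` fields, the keyword-overlap relevance score, and the entity-type check are all hypothetical stand-ins chosen only to make the flow (template database, template ranking, constrained instantiation) concrete.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Template:
    """Hypothetical record for one entry of the offline template database."""
    text: str            # question with two slots, {a} and {b}
    keywords: frozenset  # toy stand-in for the contexts the template fits
    entity_type: str     # toy stand-in for "the comparison makes sense"


def rank_templates(templates, article_words):
    """Order templates by keyword overlap with the article (assumed toy score)."""
    scored = [(len(t.keywords & article_words), t) for t in templates]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    # Templates with no overlap are treated as irrelevant and dropped.
    return [t for score, t in ranked if score > 0]


def instantiate(template, entities):
    """Fill the slots with pairs of article entities whose type fits the template."""
    fitting = [name for name, etype in entities if etype == template.entity_type]
    return [template.text.format(a=a, b=b) for a, b in combinations(fitting, 2)]


def suggest_questions(templates, article_text, entities):
    """Rank templates against the article, then instantiate them in ranked order."""
    words = frozenset(article_text.lower().split())
    questions = []
    for template in rank_templates(templates, words):
        questions.extend(instantiate(template, entities))
    return questions
```

Here `entities` is assumed to be a list of `(name, type)` pairs already extracted from the article. A real system would replace both toy checks with learned models.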

reference text

Manish Agarwal, Rakshit Shah, and Prashanth Mannem. 2011. Automatic question generation using discourse cues. In Proceedings of the 6th Workshop on Innovative Use of NLP for Building Educational Applications, IUNLPBEA ’11, pages 1–9, Stroudsburg, PA, USA. Association for Computational Linguistics.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993–1022, March.

Jonathan C. Brown, Gwen A. Frishkoff, and Maxine Eskenazi. 2005. Automatic question generation for vocabulary assessment. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT ’05, pages 819–826, Stroudsburg, PA, USA. Association for Computational Linguistics.

Li Cai, Guangyou Zhou, Kang Liu, and Jun Zhao. 2011. Learning the latent topics for question retrieval in community QA. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 273–281, Chiang Mai, Thailand, November. Asian Federation of Natural Language Processing.

Asli Celikyilmaz, Dilek Hakkani-Tur, and Gokhan Tur. 2010. LDA based similarity modeling for question answering. In Proceedings of the NAACL HLT 2010 Workshop on Semantic Search, SS ’10, pages 1–9, Stroudsburg, PA, USA. Association for Computational Linguistics.

Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM TIST, 2(3):27.

Aron Culotta, Andrew McCallum, and Jonathan Betz. 2006. Integrating probabilistic extraction models and data mining to discover relations and patterns in text. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, HLT-NAACL ’06, pages 296–303, Stroudsburg, PA, USA. Association for Computational Linguistics.

Katrin Erk. 2007. A simple, similarity-based model for selectional preferences. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 216–223, Prague, Czech Republic, June. Association for Computational Linguistics.

Tianyong Hao and Eugene Agichtein. 2012. Finding similar questions in collaborative question answering archives: toward bootstrapping-based equivalent pattern learning. Inf. Retr., 15(3-4):332–353, June.

Michael Heilman and Noah A. Smith. 2010. Good question! Statistical ranking for question generation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 609–617, Stroudsburg, PA, USA. Association for Computational Linguistics.

Jiwoon Jeon, W. Bruce Croft, and Joon Ho Lee. 2005. Finding similar questions in large question and answer archives. In Proceedings of the 14th ACM international conference on Information and knowledge management, CIKM ’05, pages 84–90, New York, NY, USA. ACM.

Nitin Jindal and Bing Liu. 2006. Mining comparative sentences and relations. In Proceedings of the 21st national conference on Artificial intelligence - Volume 2, AAAI’06, pages 1331–1336. AAAI Press.

John Lafferty, Andrew McCallum, and Fernando CN Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML.

Shasha Li, Chin-Yew Lin, Young-In Song, and Zhoujun Li. 2010. Comparable entity mining from comparative questions. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 650–658. Association for Computational Linguistics.

Marc Light and Warren R. Greiff. 2002. Statistical models for the induction and use of selectional preferences. Cognitive Science, 26(3):269–281.

Tom M. Mitchell. 1997. Machine learning. McGraw Hill series in computer science. McGraw-Hill.

Ruslan Mitkov, Le An Ha, and Nikiforos Karamanis. 2006. A computer-aided environment for generating multiple-choice test items. Nat. Lang. Eng., 12(2):177–194, June.

Raymond J. Mooney and Razvan Bunescu. 2005. Mining knowledge from text using information extraction. SIGKDD Explor. Newsl., 7(1):3–10, June.

Niko Myller. 2007. Automatic generation of prediction questions during program visualization. Electron. Notes Theor. Comput. Sci., 178:43–49, July.

A.M. Olney, A.C. Graesser, and N.K. Person. 2012. Question generation from concept maps. Dialogue & Discourse, 3(2):75–99.

D. Pollard. 2001. A User’s Guide to Measure Theoretic Probability. Cambridge University Press.

F. Provost. 2000. Machine learning from imbalanced data sets 101. Proceedings of the AAAI-2000 Workshop on Imbalanced Data Sets.

Alan Ritter, Mausam, and Oren Etzioni. 2010. A latent Dirichlet allocation method for selectional preferences. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 424–434, Stroudsburg, PA, USA. Association for Computational Linguistics.

Vasile Rus, Brendan Wyse, Paul Piwek, Mihai C. Lintean, Svetlana Stoyanchev, and Cristian Moldovan. 2010. The first question generation shared task evaluation challenge. In John D. Kelleher, Brian Mac Namee, Ielka van der Sluis, Anja Belz, Albert Gatt, and Alexander Koller, editors, INLG 2010 - Proceedings of the Sixth International Natural Language Generation Conference, July 7-9, 2010, Trim, Co. Meath, Ireland. The Association for Computer Linguistics.

Anne Schuth, Maarten Marx, and Maarten de Rijke. 2007. Extracting the discussion structure in comments on news-articles. In Proceedings of the 9th annual ACM international workshop on Web information and data management, WIDM ’07, pages 97–104, New York, NY, USA. ACM.

Frank Wilcoxon. 1945. Individual comparisons by ranking methods. Biometrics Bulletin, 1:80–83.