acl acl2013 acl2013-272 acl2013-272-reference knowledge-graph by maker-knowledge-mining

272 acl-2013-Paraphrase-Driven Learning for Open Question Answering


Source: pdf

Author: Anthony Fader ; Luke Zettlemoyer ; Oren Etzioni

Abstract: We study question answering as a machine learning problem, and induce a function that maps open-domain questions to queries over a database of web extractions. Given a large, community-authored, question-paraphrase corpus, we demonstrate that it is possible to learn a semantic lexicon and linear ranking function without manually annotating questions. Our approach automatically generalizes a seed lexicon and includes a scalable, parallelized perceptron parameter estimation scheme. Experiments show that our approach more than quadruples the recall of the seed lexicon, with only an 8% loss in precision.


reference text

Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open Information Extraction from the Web. In Proceedings of the 20th international joint conference on Artifical intelligence. Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with Bilingual Parallel Corpora. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Regina Barzilay and Lillian Lee. 2003. Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics. Regina Barzilay and Kathleen R. McKeown. 2001 . Extracting Paraphrases from a Parallel Corpus. In Proceedings of the 39th Annual Meeting on Associ- ation for Computational Linguistics. Eric Brill, Susan Dumais, and Michele Banko. 2002. An Analysis of the AskMSR Question-Answering System. In Proceedings of Empirical Methods in Natural Language Processing. Qingqing Cai and Alexander Yates. 2013. Large-scale Semantic Parsing via Schema Matching and Lexicon Extension. In Proceedings of the Annual Meeting of the Association for Computational Linguistics. Chris Callison-Burch. 2008. Syntactic Constraints on Paraphrases Extracted from Parallel Corpora. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing. James Clarke, Dan Goldwasser, Ming-Wei Chang, and Dan Roth. 2010. Driving Semantic Parsing from the World’s Response. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning. Michael Collins. 2002. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying Relations for Open Information Extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Qin Gao and Stephan Vogel. 2008. Parallel Implementations of Word Alignment Tool. In Proc. of the ACL 2008 Software Engineering, Testing, and Quality Assurance Workshop. Barbara J. Grosz, Douglas E. Appelt, Paul A. Martin, and Fernando C. N. Pereira. 1987. TEAM: An Experiment in the Design of Transportable Natural-Language Interfaces. Artificial Intelligence, 32(2): 173–243. Raphael Hoffmann, Congle Zhang, and Daniel S. Weld. 2010. Learning 5000 relational extractors. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Boris Katz. 1997. Annotating the World Wide Web using Natural Language. In RIAO, pages 136–159. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics. Cody Kwok, Oren Etzioni, and Daniel S. Weld. 2001 . Scaling Question Answering to the Web. ACM Trans. Inf. Syst., 19(3):242–262. Percy Liang, Michael Jordan, and Dan Klein. 2011. Learning Dependency-Based Compositional Semantics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Yuval Marton, Chris Callison-Burch, and Philip Resnik. 2009. Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing. Ryan McDonald, Keith Hall, and Gideon Mann. 2010. Distributed training strategies for the structured perceptron. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Mike Mintz, Steven Bills, Rion Snow, and Daniel Ju- rafsky. 2009. Distant Supervision for Relation Extraction Without Labeled Data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL. Franz Josef Och and Hermann Ney. 2000. Improved Statistical Alignment Models. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics. Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko, and Alexander Yates. 2004. Modern Natural Language Interfaces to Databases: Composing Statistical Parsing with Semantic Tractability. In Proceedings of the Twentieth International Conference on Computational Linguistics. Chris Quirk, Chris Brockett, and William Dolan. 2004. Monolingual Machine Translation for Paraphrase Generation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. 1617 Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling Relations and Their Mentions with- out Labeled Text. In Proceedings of the pean conference on Machine 2010 Euro- learning and Knowl- edge Discovery in Databases. Lappoon R. Tang and Raymond J. Mooney. 2001. Us- ing Multiple Clause Constructors in Inductive Logic Programming for Semantic Parsing. Christina Unger, Axel-Cyrille Lorenz Ngonga Philipp Cimiano. Answering B ¨uhmann, Ngomo, 2012. Jens Lehmann, Daniel Gerber, Template-Based over RDF Data. In Proceedings 21st World Wide Web Conference 2012. and Question of the Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Maya Ramanath, Volker Tresp, and Gerhard Weikum. 2012. Natural Language Questions for the Web of Data. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Limin Yao, Sebastian Riedel, and Andrew McCallum. 2012. Unsupervised Relation Discovery with Sense Disambiguation. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. John M. Zelle and Raymond J. Mooney. 1996. Learning to Parse Database Queries Using Inductive Logic Programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence. Luke S. Zettlemoyer and Michael Collins. 2005. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence. 1618