acl acl2011 acl2011-37 acl2011-37-reference knowledge-graph by maker-knowledge-mining

37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques

Source: pdf

Author: Donald Metzler ; Eduard Hovy ; Chunliang Zhang

Abstract: Paraphrase generation is an important task that has received a great deal of interest recently. Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. Despite all of the attention, there have been very few direct empirical evaluations comparing the merits of the different approaches. This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to help shed light on their strengths and weaknesses. Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy.

reference text

I. Androutsopoulos and P. Malakasiotis. 2010. A survey of paraphrasing and textual entailment methods. Journal of Artificial Intelligence Research, 38: 135–187. Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL ’05, pages 597–604, Morristown, NJ, USA. Association for Computational Linguistics. Ken Barker, Bhalchandra Agashe, Shaw-Yi Chaw, James Fan, Noah Friedland, Michael Glass, Jerry Hobbs, Eduard Hovy, David Israel, Doo Soon Kim, Rutu Mulkar-Mehta, Sourabh Patwardhan, Bruce Porter, Dan Tecuci, and Peter Yeh. 2007. Learning by reading: a prototype system, performance baseline and lessons learned. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 1, pages 280–286. AAAI Press. Regina Barzilay and Lillian Lee. 2003. Learning to paraphrase: an unsupervised approach using multiplesequence alignment. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL ’03, pages 16– 23, Morristown, NJ, USA. Association for Computational Linguistics. Regina Barzilay and Kathleen R. McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, ACL ’01, pages 50–57, Morristown, NJ, USA. Association for Computational Linguistics. Rahul Bhagat and Deepak Ravichandran. 2008. Large scale acquisition of paraphrases for learning surface patterns. In Proceedings of ACL-08: HLT, pages 674– 682, Columbus, Ohio, June. Association for Computational Linguistics. Chris Callison-Burch. 2008. Syntactic constraints on paraphrases extracted from parallel corpora. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 196–205, Morristown, NJ, USA. Association for Computational Linguistics. Joseph L. Fleiss. 1971 . Measuring Nominal Scale Agreement Among Many Raters. Psychological Bulletin, 76(5):378–382. Stanley Kok and Chris Brockett. 2010. Hitting the right paraphrases in good time. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’ 10, pages 145–153, Morris551 town, NJ, USA. Association for Computational Linguistics. J. R. Landis and G. G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, 33(1): 159–174, March. Dekang Lin and Patrick Pantel. 2001. Discovery of inference rules for question-answering. Nat. Lang. Eng., 7:343–360, December. Nitin Madnani and Bonnie J. Dorr. 2010. Generating phrasal and sentential paraphrases: A survey of datadriven methods. Comput. Linguist., 36:341–387. Bo Pang, Kevin Knight, and Daniel Marcu. 2003. Syntax-based alignment of multiple translations: extracting paraphrases and generating new sentences. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology Volume 1, NAACL ’03, pages 102–109, Morristown, NJ, USA. Association for Computational Linguistics. Marius Pasca and Pter Dienes. 2005. Aligning needles in a haystack: Paraphrase acquisition across the web. In Robert Dale, Kam-Fai Wong, Jian Su, and Oi Yee Kwong, editors, Natural Language Processing IJCNLP 2005, volume 3651 of Lecture Notes in Computer Science, pages 119–130. Springer Berlin / Heidelberg. Shiqi Zhao, Haifeng Wang, Ting Liu, and Sheng Li. 2009. Extracting paraphrase patterns from bilingual parallel corpora. Natural Language Engineering, 15(Special Issue 04):503–526.