emnlp emnlp2012 emnlp2012-39 emnlp2012-39-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Atsushi Fujita ; Pierre Isabelle ; Roland Kuhn
Abstract: This paper presents a paraphrase acquisition method that uncovers and exploits generalities underlying paraphrases: paraphrase patterns are first induced and then used to collect novel instances. Unlike existing methods, ours uses both bilingual parallel and monolingual corpora. While the former are regarded as a source of high-quality seed paraphrases, the latter are searched for paraphrases that match patterns learned from the seed paraphrases. We show how one can use monolingual corpora, which are far more numerous and larger than bilingual corpora, to obtain paraphrases that rival in quality those derived directly from bilingual corpora. In our experiments, the number of paraphrase pairs obtained in this way from monolingual corpora was a large multiple of the number of seed paraphrases. Human evaluation through a paraphrase substitution test demonstrated that the newly acquired paraphrase pairs are ofreasonable quality. Remaining noise can be further reduced by filtering seed paraphrases.
Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 597–604. Regina Barzilay and Kathleen R. McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings ofthe 39thAnnualMeeting ofthe Association for Computational Linguistics (ACL), pages 50–57. Regina Barzilay and Lillian Lee. 2003. Learning to paraphrase: An unsupervised approach using multiplesequence alignment. In Proceedings of the 2003 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 16–23. Rahul Bhagat and Deepak Ravichandran. 2008. Large scale acquisition of paraphrases for learning surface patterns. In Proceedings ofthe 46thAnnual Meeting of the Association for Computational Linguistics (ACL), pages 161–170. Chris Callison-Burch, Philipp Koehn, and Miles Osborne. 2006. Improved statistical machine translation using paraphrases. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 17–24. Chris Callison-Burch. 2008. Syntactic constraints on paraphrases extracted from parallel corpora. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 196–205. Tsz Ping Chan, Chris Callison-Burch, and Benjamin VanDurme. 2011. Reranking bilingually extracted paraphrases using monolingual distributional similarity. In Proceedings of the Workshop on Geometrical Models of Natual Language Semantics (GEMS), pages 33–42. David L. Chen and William B. Dolan. 2011. Collecting highly parallel data for paraphrase evaluation. In Proceedings ofthe 49thAnnualMeeting ofthe Association for Computational Linguistics (ACL), pages 190–200. Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46. Trevor Cohn and Mirella Lapata. 2008. Sentence compression beyond word deletion. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING), pages 137–144. Michael Denkowski and Alon Lavie. 2011. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the 6th Workshop on Statistical Machine Translation (WMT), pages 85–91. 641 Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 350– 356. Jinhua Du, Jie Jiang, and Andy Way. 2010. Facilitating translation using source language paraphrase lattices. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 420–429. Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, Takehito Utsuro, Terumasa Ehara, Hiroshi Echizen-ya, and Sayori Shimohata. 2010. Overview of the patent translation task at the NTCIR-8 workshop. In Proceedings of NTCIR-8 Workshop Meeting, pages 371– 376. Atsushi Fujita, Shuhei Kato, Naoki Kato, and Satoshi Sato. 2007. A compositional approach toward dynamic phrasal thesaurus. In Proceedings of the ACLPASCAL Workshop on Textual Entailment and Paraphrasing (WTEP), pages 151–158. Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles, and Benjamin Van Durme. 2011. Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1168–1 179. Zellig Harris. 1968. Mathematical Structures of Language. John Wiley & Sons. Chikara Hashimoto, Kentaro Torisawa, Stijn De Saeger, Jun’ichi Kazama, and Sadao Kurohashi. 2011. Extracting paraphrases from definition sentences on the Web. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1087–1097. Christian Jacquemin. 1999. Syntagmatic and paradigmatic representations of term variation. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), pages 341–348. Howard Johnson, Joel Martin, George Foster, and Roland Kuhn. 2007. Improving translation quality by discarding most of the phrasetable. In Proceedings of the 2007 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 967– 975. Maurice Kendall. 1938. A new measure of rank correlation. Biometrika, 30(1-2):8 1–93. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pages 48–54. Philipp Koehn. 2009. Statistical Machine Translation. Cambridge University Press. Stanley Kok and Chris Brockett. 2010. Hitting the right paraphrases in good time. In Proceedings of Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pages 145– 153. Dekang Lin and Patrick Pantel. 2001 . Discovery ofinference rules for question answering. Natural Language Engineering, 7(4):343–360. Nitin Madnani and Bonnie J. Dorr. 2010. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Computational Linguistics, 36(3):341–387. Yuval Marton, Chris Callison-Burch, and Philip Resnik. 2009. Improved statistical machine translation using monolingually-derived paraphrases. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 38 1–390. Yuval Marton, Ahmed El Kholy, and Nizar Habash. 2011. Filtering antonymous, trend-contrasting, and polarity-dissimilar distributional paraphrases for improving statistical machine translation. In Proceedings ofthe 6th Workshop on StatisticalMachine Translation (WMT), pages 237–249. Aur e´lien Max. 2010. Example-based paraphrasing for improved phrase-based statistical machine translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 656–666. Marius Pas ¸ca and P e´ter Dienes. 2005. Aligning needles in a haystack: Paraphrase acquisition across the Web. In Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP), pages 119–130. Bo Pang, Kevin Knight, and Daniel Marcu. 2003. Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences. In Proceedings of the 2003 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics (HLTNAACL), pages 102–109. Fatiha Sadat, Howard Johnson, Akakpo Agbago, George Foster, Roland Kuhn, Joel Martin, and Aaron Tikuisis. 2005. PORTAGE: A phrase-based machine translation system. In Proceedings of the ACL Workshop on Building and Using Parallel Texts, pages 129–132. Yusuke Shinyama, Satoshi Sekine, Kiyoshi Sudo, and Ralph Grishman. 2002. Automatic paraphrase acquisition from news articles. In Proceedings of the 2002 Human Language Technology Conference (HLT). 642 Idan Szpektor and Ido Dagan. 2008. Learning entailment rules for unary templates. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING). 849-856. Sander Wubben, Antal van den Bosch, Emiel Krahmer, and Erwin Marsi. 2009. Clustering and matching headlines for automatic paraphrase acquisition. In Proceedings of the 12th European Workshop on Natural Language Generation, pages 122–125. Shiqi Zhao, Haifeng Wang, Ting Liu, and Sheng Li. 2009. Extracting paraphrase patterns from bilingual parallel corpora. Natural Language Engineering, 15(4):503–526.