acl acl2011 acl2011-132 acl2011-132-reference knowledge-graph by maker-knowledge-mining

132 acl-2011-Extracting Paraphrases from Definition Sentences on the Web

Source: pdf

Author: Chikara Hashimoto ; Kentaro Torisawa ; Stijn De Saeger ; Jun'ichi Kazama ; Sadao Kurohashi

Abstract: ¶ kuro@i . We propose an automatic method of extracting paraphrases from definition sentences, which are also automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that the sentences that define the same concept tend to convey mostly the same information using different expressions and thus contain many paraphrases. We show that a large number of paraphrases can be automatically extracted with high precision by regarding the sentences that define the same concept as parallel corpora. Experimental results indicated that with our method it was possible to extract about 300,000 paraphrases from 6 Web docu3m0e0n,t0s0 w0i ptha a precision oramte 6 6o ×f a 1b0out 94%. 108

reference text

Susumu Akamine, Daisuke Kawahara, Yoshikiyo Kato, Tetsuji Nakagawa, Yutaka I. Leon-Suematsu, Takuya Kawada, Kentaro Inui, Sadao Kurohashi, and Yutaka Kidawara. 2010. Organizing information on the web to support user judgments on information credibility. In Proceedings of 2010 4th International Universal Communication Symposium Proceedings (IUCS 2010), pages 122–129. Ion Androutsopoulos and Prodromos Malakasiotis. 2010. A survey of paraphrasing and textual entailment methods. Journal of Artificial Intelligence Research, 38: 135–187. Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005), pages 597– 604. Regina Barzilay and Lillian Lee. 2003. Learning to paraphrase: An unsupervised approach using multiplesequence alignment. In Proceedings of HLT-NAACL 2003, pages 16–23. Regina Barzilay and Kathleen R. McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of the 39th Annual Meeting of the ACL joint with the 10th Meeting of the European Chapter of the ACL (ACL/EACL 2001), pages 50–57. Rahul Bhagat, Patrick Pantel, and Eduard Hovy. 2007. Ledir: An unsupervised algorithm for learning directionality of inference rules. In Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP2007), pages 161–170. Chris Callison-Burch, Philipp Koehn, and Miles Osborne. 2006. Improved statistical machine translation using paraphrases. In Proceedings of the 2006 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2006), pages 17–24. Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources. In Pro- ceedings of the 20th international conference on Computational Linguistics (COLING 2004), pages 350– 356. Joseph L. Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382. Atsushi Fujii and Tetsuya Ishikawa. 2002. Extraction and organization of encyclopedic knowledge information using the World Wide Web (written in Japanese). Institute of Electronics, Information, and Communication Engineers, J85-D-II(2):300–307. 1096 Maayan Geffet and Ido Dagan. 2005. The distributional inclusion hypotheses and lexical entailment. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pages 107–1 14. Chikara Hashimoto, Kentaro Torisawa, Kow Kuroda, Stijn De Saeger, Masaki Murata, and Jun’ichi Kazama. 2009. Large-scale verb entailment acquisition from the web. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), pages 1172–1 181. Lidija Iordanskaja, Richard Kittredge, and Alain Polgu `ere. 1991. Lexical selection and paraphrase in a meaning-text generation model. In C ´ecile L. Paris, William R. Swartout, and William C. Mann, editors, Natural language generation in artificial intelligence and computational linguistics, pages 293–3 12. Kluwer Academic Press. David Kauchak and Regina Barzilay. 2006. Paraphrasing for automatic evaluation. In Proceedings of the 2006 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2006), pages 455–462. Jun’ichi Kazama and Kentaro Torisawa. 2007. Exploiting Wikipedia as external knowledge for named entity recognition. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), pages 698–707, June. Jun’ichi Kazama and Kentaro Torisawa. 2008. Inducing gazetteers for named entity recognition by large-scale clustering of dependency relations. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT), pages 407–415. Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ond ˇrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Associa- tion for Computational Linguistics (ACL 2007), pages 177–180. J. Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics, 33(1): 159–174. Dekang Lin and Patrick Pantel. 2001. Discovery ofinference rules for question answering. Natural Language Engineering, 7(4):343–360. Bill MacCartney, Michel Galley, and Christopher D. Manning. 2008. A phrase-based alignment model for natural language inference. In Proceedings ofthe 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP-2008), pages 802–81 1. Nitin Madnani and Bonnie Dorr. 2010. Generating phrasal and sentential paraphrases: A survey of datadriven methods. Computational Linguistics, 36(3). Kathleen R. McKeown, Regina Barzilay, David Evans, Vasileios Hatzivassiloglou, Judith L. Klavans, Ani Nenkova, Carl Sable, Barry Schiffman, and Sergey Sigelman. 2002. Tracking and summarizing news on a daily basis with columbia’s newsblaster. In Proceedings of the 2nd international conference on Human Language Technology Research, pages 280–285. Masaki Murata, Toshiyuki Kanemaru, and Hitoshi Isahara. 2004. Automatic paraphrase acquisition based on matching of definition sentences in plural dictionaries (written in Japanese). Journal ofNatural Language Processing, 11(5): 135–149. Roberto Navigli and Paola Velardi. 2010. Learning word-class lattices for definition and hypernym extraction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), pages 1318–1327. Chris Quirk, Chris Brockett, and William Dolan. 2004. Monolingual machine translation for paraphrase generation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP-2004), pages 142–149. Deepak Ravichandran and Eduard H. Hovy. 2002. Learning surface text patterns for a question answering system. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002), pages 41–47. Satoshi Sekine. 2005. Automatic paraphrase discovery based on context and keywords between ne pairs. In Proceedings of the Third International Workshop on Paraphrasing (IWP-2005), pages 80–87. Yusuke Shinyama, Satoshi Sekine, and Kiyoshi Sudo. 2002. Automatic paraphrase acquisition from news articles. In Proceedings of the 2nd international Con- ference on Human Language Technology Research (HLT2002), pages 313–318. Idan Szpektor and Ido Dagan. 2008. Learning entailment rules for unary template. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING2008), pages 849–856. Idan Szpektor, Eyal Shnarch, and Ido Dagan. 2007. Instance-based evaluation of entailment rule acquisition. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL 2007), pages 456–463. 1097