emnlp emnlp2012 emnlp2012-135 emnlp2012-135-reference knowledge-graph by maker-knowledge-mining

135 emnlp-2012-Using Discourse Information for Paraphrase Extraction

Source: pdf

Author: Michaela Regneri ; Rui Wang

Abstract: Previous work on paraphrase extraction using parallel or comparable corpora has generally not considered the documents’ discourse structure as a useful information source. We propose a novel method for collecting paraphrases relying on the sequential event order in the discourse, using multiple sequence alignment with a semantic similarity measure. We show that adding discourse information boosts the performance of sentence-level paraphrase acquisition, which consequently gives a tremendous advantage for extracting phraselevel paraphrase fragments from matched sentences. Our system beats an informed baseline by a margin of 50%.

reference text

Amit Bagga and Breck Baldwin. 1999. Cross-document event coreference: annotations, experiments, and observations. In Proceedings of the Workshop on Coreference and its Applications. Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In Proceedings of ACL 2005. Regina Barzilay and Lillian Lee. 2003. Learning to paraphrase: An unsupervised approach using multiplesequence alignment. In Proc. of HLT-NAACL 2003. Regina Barzilay and Kathleen R. McKeown. 2001. Extracting paraphrases from a parallel corpus. In Proc. of ACL 2001. Regina Barzilay, Kathleen McKeown, and Michael Elhadad. 1999. Information fusion in the context of multi-document summarization. In Proceedings of ACL 1999. W. Byrne, S. Khudanpur, W. Kim, S. Kumar, P. Pecina, P. Virga, P. Xu, and D. Yarowsky. 2003. The Johns Hopkins University 2003 Chinese-English machine translation system. In Proceedings of the MT Summit IX. Chris Callison-Burch. 2008. Syntactic constraints on paraphrases extracted from parallel corpora. In Proceedings of EMNLP 2008. Lynn Carlson, Daniel Marcu, and Mary Ellen Okurowski. 2002. RST Discourse Treebank. LDC. J. Cohen. 1960. A Coefficient ofAgreement for Nominal Scales. Educational and Psychological Measurement, 20(1):37. Ido Dagan, Oren Glickman, and Bernardo Magnini. 2005. The pascal recognising textual entailment challenge. In MLCW, pages 177–190. D. Das and N. A. Smith. 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proceedings of ACL-IJCNLP 2009. Louise Deléger and Pierre Zweigenbaum. 2009. Extracting lay paraphrases of specialized expressions from monolingual comparable medical corpora. In Proceedings of the ACL-IJCNLP BUCC-2009 Workshop. Georgiana Dinu and Rui Wang. 2009. Inference rules and their application to recognizing textual entailment. In Proceedings of EACL 2009. W. B. Dolan and C. Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the third International Workshop on Paraphrasing. Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of COLING 2004. 926 Richard Durbin, Sean Eddy, Anders Krogh, and Graeme Mitchison. 1998. Biological Sequence Analysis. Cambridge University Press. Eugene S Edgington. 1986. Randomization tests. Marcel Dekker, Inc., New York, NY, USA. Samuel Fern and Mark Stevenson. 2009. A semantic similarity approach to paraphrase detection. In Proceedings of the Computational Linguistics UK (CLUK 2008) 11th Annual Research Colloquium. Juri Ganitkevitch, Chris Callison-Burch, Courtney Napoles, and Benjamin Van Durme. 2011. Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation. In Proceedings of EMNLP 2011. Dan Gillick. 2009. Sentence boundary detection and the problem with the u.s. In Proceedings of HLT-NAACL 2009: Companion Volume: Short Papers. Michael Heilman and Noah A. Smith. 2010. Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In Proceedings of NAACL-HLT 2010. Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of ACL 2003. Heeyoung Lee, Yves Peirsman, Angel Chang, Nathanael Chambers, Mihai Surdeanu, and Dan Jurafsky. 2011. Stanford’s multi-pass sieve coreference resolution system at the conll-201 1 shared task. In Proceedings of the CoNLL-2011 Shared Task. Dekang Lin and Patrick Pantel. 2001. DIRT - Discovery of Inference Rules from Text. In Proceedings of the ACM SIGKDD. Yuval Marton, Chris Callison-Burch, and Philip Resnik. 2009. Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases. In Proceedings of EMNLP 2009. Dragos Stefan Munteanu and Daniel Marcu. 2006. Extracting Parallel Sub-Sentential Fragments from NonParallel Corpora. In Proceedings of ACL 2006. Saul B. Needleman and Christian D. Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology, 48(3), March. Andreas Noack. 2007. Energy models for graph clustering. Journal of Graph Algorithms and Applications, 11(2):453–480. Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1). Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of ACL 2002. Chris Quirk, Chris Brockett, and William B. Dolan. 2004. Monolingual machine translation for paraphrase generation. In Proceedings of EMNLP 2004. Chris Quirk, Raghavendra Udupa, and Arul Menezes. 2007. Generative models of noisy translations with applications to parallel fragment extraction. In Proceedings of MT Summit XI, Copenhagen, Denmark. Michaela Regneri, Alexander Koller, and Manfred Pinkal. 2010. Learning Script Knowledge with Web Experiments. In Proceedings of ACL 2010. Yusuke Shinyama and Satoshi Sekine. 2003. Paraphrase acquisition for information extraction. In Proceedings of the ACL PARAPHRASE ’03 Workshop. Yusuke Shinyama, Satoshi Sekine, and Kiyoshi Sudo. 2002. Automatic paraphrase acquisition from news articles. In Proceedings of HLT 2002. Idan Szpektor, Hristo Tanev, Ido Dagan, and Bonaventura Coppola. 2004. Scaling Web-based Acquisition of Entailment Relations. In Proceedings of EMNLP 2004. Stefan Thater, Hagen Fürstenau, and Manfred Pinkal. 2011. Word Meaning in Context: A Simple and Effective Vector Model. In Proceedings of IJCNLP 2011. Eleftheria Tomadaki and Andrew Salway. 2005. Matching verb attributes for cross-document event coreference. In Proc. of the Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes. Rui Wang and Chris Callison-Burch. 2011. Paraphrase fragment extraction from monolingual comparable corpora. In Proc. of the ACL BUCC-2011 Workshop. Shiqi Zhao, Haifeng Wang, Ting Liu, and Sheng Li. 2008. Pivot Approach for Extracting Paraphrase Patterns from Bilingual Corpora. In Proceedings of ACL 2008. Shiqi Zhao, Haifeng Wang, Xiang Lan, and Ting Liu. 2010. Leveraging Multiple MT Engines for Paraphrase Generation. In Proceedings of COLING 2010. 927