
6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing

Source: pdf

Author: Prodromos Malakasiotis ; Ion Androutsopoulos

Abstract: We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. We experimented with both a Maximum Entropy classifier and an SVR ranker. Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state-of-the-art paraphrase generator when paraphrasing rules apply to the input sentences. We also propose a new methodology for evaluating the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind.
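The generate-and-rank idea in the abstract can be illustrated with a toy sketch. This is not the authors' system: the `Rule` class, the string-replacement generator, the token-based `diversity` measure, and the fixed weighted sum are all assumptions made for illustration. The paper instead uses rules extracted from parallel corpora and a trained Maximum Entropy classifier or SVR ranker. Here a single hypothetical `quality` number stands in for both grammaticality and meaning preservation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical paraphrasing rule: rewrite `source` as `target`."""
    source: str
    target: str
    quality: float  # assumed context-independent rule score in [0, 1]


def generate_candidates(sentence, rules):
    """Generate one candidate per rule whose left-hand side occurs in the sentence."""
    return [
        (sentence.replace(rule.source, rule.target, 1), rule)
        for rule in rules
        if rule.source in sentence
    ]


def diversity(sentence, candidate):
    """Toy diversity: fraction of candidate tokens that are absent from the input."""
    source_tokens = set(sentence.split())
    candidate_tokens = candidate.split()
    return sum(tok not in source_tokens for tok in candidate_tokens) / len(candidate_tokens)


def rank(sentence, candidates, weights):
    """Sort candidates by a weighted sum of toy scores, best first.

    `weights` is (grammaticality, meaning, diversity). The rule quality is
    used as a placeholder for the first two; a real ranker would use
    context-sensitive features and a learned model.
    """
    w_gram, w_mean, w_div = weights
    scored = [
        (w_gram * rule.quality + w_mean * rule.quality + w_div * diversity(sentence, text), text)
        for text, rule in candidates
    ]
    return sorted(scored, reverse=True)


# Illustrative rules and input sentence (invented for this sketch).
rules = [
    Rule("wrote", "was the author of", 0.9),
    Rule("wrote", "penned", 0.6),
    Rule("bought", "purchased", 0.8),  # does not apply to the sentence below
]
sentence = "She wrote the book"
candidates = generate_candidates(sentence, rules)
ranked = rank(sentence, candidates, weights=(0.4, 0.4, 0.2))
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Changing the `weights` tuple shifts the balance between the three criteria, which is the kind of variation the proposed evaluation methodology examines across weight combinations.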

reference text

I. Androutsopoulos and P. Malakasiotis. 2010. A survey of paraphrasing and textual entailment methods. Journal of Artificial Intelligence Research, 38:135–187.
C. Bannard and C. Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In Proc. of the 43rd ACL, pages 597–604, Ann Arbor, MI.
C. Callison-Burch, P. Koehn, and M. Osborne. 2006. Improved statistical machine translation using paraphrases. In Proc. of HLT-NAACL, pages 17–24, New York, NY.
C. Callison-Burch. 2008. Syntactic constraints on paraphrases extracted from parallel corpora. In Proc. of EMNLP, pages 196–205, Honolulu, HI, October.
J. Carletta. 1996. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22:249–254.
J. Clarke and M. Lapata. 2008. Global inference for sentence compression: An integer linear programming approach. Journal of Artificial Intelligence Research, 1(31):399–429.
M. Collins and T. Koo. 2005. Discriminative reranking for natural language parsing. Computational Linguistics, 31(1):25–69.
N. Cristianini and J. Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
I. Dagan, B. Dolan, B. Magnini, and D. Roth. 2009. Recognizing textual entailment: Rational, evaluation and approaches. Natural Language Engineering, 15(4):i–xvii. Editorial of the special issue on Textual Entailment.
P. A. Duboue and J. Chu-Carroll. 2006. Answering the question you wish they had asked: The impact of paraphrasing for question answering. In Proc. of HLT-NAACL, pages 33–36, New York, NY.
R. A. Fisher. 1925. Statistical Methods for Research Workers. Oliver and Boyd.
T. Joachims. 2002. Learning to Classify Text Using Support Vector Machines: Methods, Theory, Algorithms. Kluwer.
D. Kauchak and R. Barzilay. 2006. Paraphrasing for automatic evaluation. In Proc. of HLT-NAACL, pages 455–462, New York, NY.
K. Knight and D. Marcu. 2002. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence, 139(1):91–107.
P. Koehn. 2009. Statistical Machine Translation. Cambridge University Press.
S. Kok and C. Brockett. 2010. Hitting the right paraphrases in good time. In Proc. of HLT-NAACL, pages 145–153, Los Angeles, CA.
Y. Lepage and E. Denoual. 2005. Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation. In Proc. of the 3rd Int. Workshop on Paraphrasing, pages 57–64, Jeju Island, Korea.
N. Madnani and B. J. Dorr. 2010. Generating phrasal and sentential paraphrases: A survey of data-driven methods. Computational Linguistics, 36(3):341–387.
N. Madnani, F. Ayan, P. Resnik, and B. J. Dorr. 2007. Using paraphrases for parameter tuning in statistical machine translation. In Proc. of 2nd Workshop on Statistical Machine Translation, pages 120–127, Prague, Czech Republic.
P. Malakasiotis. 2009. Paraphrase recognition using machine learning to combine similarity measures. In Proc. of the Student Research Workshop of ACL-AFNLP, Singapore.
P. Malakasiotis. 2011. Paraphrase and Textual Entailment Recognition and Generation. Ph.D. thesis, Department of Informatics, Athens University of Economics and Business, Greece.
Y. Marton, C. Callison-Burch, and P. Resnik. 2009. Improved statistical machine translation using monolingually-derived paraphrases. In Proc. of EMNLP, pages 381–390, Singapore.
S. Mirkin, L. Specia, N. Cancedda, I. Dagan, M. Dymetman, and I. Szpektor. 2009. Source-language entailment modeling for translating unknown terms. In Proc. of ACL-AFNLP, pages 791–799, Singapore.
R. Nelken and S. M. Shieber. 2006. Towards robust context-sensitive sentence alignment for monolingual corpora. In Proc. of the 11th EACL, pages 161–168, Trento, Italy.
S. Padó, M. Galley, D. Jurafsky, and C. D. Manning. 2009. Robust machine translation evaluation with entailment features. In Proc. of ACL-AFNLP, pages 297–305, Singapore.
K. Papineni, S. Roukos, T. Ward, and W. J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proc. of the 40th ACL, pages 311–318, Philadelphia, PA.
C. Quirk, C. Brockett, and W. B. Dolan. 2004. Monolingual machine translation for paraphrase generation. In Proc. of the Conf. on EMNLP, pages 142–149, Barcelona, Spain.
S. Riezler and Y. Liu. 2010. Query rewriting using monolingual statistical machine translation. Computational Linguistics, 36(3):569–582.
S. Riezler, A. Vasserman, I. Tsochantaridis, V. Mittal, and Y. Liu. 2007. Statistical machine translation for query expansion in answer retrieval. In Proc. of the 45th ACL, pages 464–471, Prague, Czech Republic.
I. Szpektor, I. Dagan, R. Bar-Haim, and J. Goldberger. 2008. Contextual preferences. In Proc. of ACL-HLT, pages 683–691, Columbus, OH.
V. Vapnik. 1998. Statistical Learning Theory. John Wiley.
S. Zhao, H. Wang, T. Liu, and S. Li. 2008. Pivot approach for extracting paraphrase patterns from bilingual corpora. In Proc. of ACL-HLT, pages 780–788, Columbus, OH.
S. Zhao, X. Lan, T. Liu, and S. Li. 2009a. Application-driven statistical paraphrase generation. In Proc. of ACL-AFNLP, pages 834–842, Singapore.
S. Zhao, H. Wang, T. Liu, and S. Li. 2009b. Extracting paraphrase patterns from bilingual parallel corpora. Natural Language Engineering, 15(4):503–526.
S. Zhao, H. Wang, X. Lan, and T. Liu. 2010. Leveraging multiple MT engines for paraphrase generation. In Proc. of the 23rd COLING, pages 1326–1334, Beijing, China.
L. Zhou, C.-Y. Lin, and E. Hovy. 2006. Re-evaluating machine translation results with paraphrase support. In Proc. of the Conf. on EMNLP, pages 77–84.