emnlp emnlp2010 emnlp2010-89 emnlp2010-89-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng
Abstract: We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic similarity measure based on pivot language N-grams that allows us to approximate adequacy independently of lexical similarity. Human evaluation shows that PEM achieves high correlation with human judgments.
C. Bannard and C. Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In Proc. of ACL.
R. Barzilay and L. Lee. 2003. Learning to paraphrase: An unsupervised approach using multiple-sequence alignment. In Proc. of HLT-NAACL.
J. Blatz, E. Fitzgerald, G. Foster, S. Gandrabur, C. Goutte, A. Kulesza, A. Sanchis, and N. Ueffing. 2003. Confidence estimation for machine translation. Technical report, CLSP Workshop Johns Hopkins University.
C. Callison-Burch, P. Koehn, and M. Osborne. 2006. Improved statistical machine translation using paraphrases. In Proc. of HLT-NAACL.
C. Callison-Burch, T. Cohn, and M. Lapata. 2008. ParaMetric: An automatic evaluation metric for paraphrasing. In Proc. of COLING.
C. Callison-Burch, P. Koehn, C. Monz, and J. Schroeder. 2009. Findings of the 2009 Workshop on Statistical Machine Translation. In Proceedings of WMT.
Y.S. Chan and H.T. Ng. 2008. MAXSIM: A maximum similarity metric for machine translation evaluation. In Proc. of ACL-08: HLT.
D. Das and N.A. Smith. 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proc. of ACL-IJCNLP.
P. Duboue and J. Chu-Carroll. 2006. Answering the question you wish they had asked: The impact of paraphrasing for question answering. In Proc. of HLT-NAACL Companion Volume: Short Papers.
C. Fellbaum, editor. 1998. WordNet: An electronic lexical database. MIT Press, Cambridge, MA.
A. Haghighi, J. Blitzer, J. DeNero, and D. Klein. 2009. Better word alignments with supervised ITG models. In Proc. of ACL-IJCNLP.
M. Heilman and N.A. Smith. 2010. Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In Proc. of NAACL.
E. Hovy, C.Y. Lin, L. Zhou, and J. Fukumoto. 2006. Automated summarization evaluation with basic elements. In Proc. of LREC.
T. Joachims. 1999. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press.
D. Kauchak and R. Barzilay. 2006. Paraphrasing for automatic evaluation. In Proc. of HLT-NAACL.
P. Koehn, F.J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proc. of HLT-NAACL.
P. Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5.
P. Liang, B. Taskar, and D. Klein. 2006. Alignment by agreement. In Proc. of HLT-NAACL.
C.Y. Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proc. of the ACL-04 Workshop on Text Summarization Branches Out.
C. Liu, D. Dahlmeier, and H.T. Ng. 2010. TESLA: translation evaluation of sentences with linear-programming-based analysis. In Proc. of WMT.
J.K. Low, H.T. Ng, and W. Guo. 2005. A maximum entropy approach to Chinese word segmentation. In Proc. of the 4th SIGHAN Workshop.
N. Madnani, N.F. Ayan, P. Resnik, and B.J. Dorr. 2007. Using paraphrases for parameter tuning in statistical machine translation. In Proc. of WMT.
N. Madnani, P. Resnik, B.J. Dorr, and R. Schwartz. 2008. Are multiple reference translations necessary? Investigating the value of paraphrased reference translations in parameter optimization. In Proc. of AMTA.
F.J. Och and H. Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1).
K. Owczarzak, D. Groves, J. Van Genabith, and A. Way. 2006. Contextual bitext-derived paraphrases in automatic MT evaluation. In Proc. of WMT.
B. Pang, K. Knight, and D. Marcu. 2003. Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences. In Proc. of HLT-NAACL.
K. Papineni, S. Roukos, T. Ward, and W.J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proc. of ACL.
M. Porter. 1980. An algorithm for suffix stripping. Program, 40(3).
M. Przybocki, K. Peterson, S. Bronsart, and G. Sanders. 2009. Evaluating machine translation with LFG dependencies. Machine Translation, 23(2).
L. Qiu, M.Y. Kan, and T.S. Chua. 2006. Paraphrase recognition via dissimilarity significance classification. In Proc. of EMNLP.
C. Quirk, C. Brockett, and W. Dolan. 2004. Monolingual machine translation for paraphrase generation. In Proc. of EMNLP.
M. Snover, N. Madnani, B. Dorr, and R. Schwartz. 2009. Fluency, adequacy, or HTER? Exploring different human judgments with a tunable MT metric. In Proc. of WMT.
A. Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proc. of ICSLP.
S. Wan, M. Dras, R. Dale, and C. Paris. 2006. Using dependency-based features to take the ’para-farce’ out of paraphrase. In Proc. of ALTW 2006.
H. Wu and M. Zhou. 2003. Synonymous collocation extraction using translation information. In Proc. of ACL.
S.Q. Zhao, C. Niu, M. Zhou, T. Liu, and S. Li. 2008. Combining multiple resources to improve SMT-based paraphrasing model. In Proc. of ACL-08: HLT.
S.Q. Zhao, X. Lan, T. Liu, and S. Li. 2009. Application-driven statistical paraphrase generation. In Proc. of ACL-IJCNLP.
L. Zhou, C.Y. Lin, and E. Hovy. 2006a. Re-evaluating machine translation results with paraphrase support. In Proc. of EMNLP.
L. Zhou, C.Y. Lin, D.S. Munteanu, and E. Hovy. 2006b. ParaEval: Using paraphrases to evaluate summaries automatically. In Proc. of HLT-NAACL.