
104 acl-2010-Evaluating Machine Translations Using mNCD


Source: pdf

Author: Marcus Dobrinkat; Tero Tapiovaara; Jaakko Väyrynen; Kimmo Kettunen

Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.
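As background for the abstract, the sketch below shows the standard normalized compression distance of Cilibrasi and Vitanyi (2005), on which mNCD is based: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of string s. The choice of bz2 as the compressor and the example strings are illustrative assumptions; the paper's actual compressor and the mNCD word-matching step (stemming and synonym substitution) are not reproduced here.

# Minimal NCD sketch (assumptions: bz2 as compressor, toy example strings).
import bz2

def compressed_size(text: str) -> int:
    """Length in bytes of the bz2-compressed UTF-8 encoding of text."""
    return len(bz2.compress(text.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Example: compare a candidate translation against a reference translation.
# Lower values indicate higher similarity between the two strings.
candidate = "the cat sat on the mat"
reference = "a cat was sitting on the mat"
print(ncd(candidate, reference))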


reference text

Abhaya Agarwal and Alon Lavie. 2008. METEOR, M-BLEU and M-TER: evaluation metrics for high correlation with human rankings of machine translation output. In StatMT '08: Proceedings of the Third Workshop on Statistical Machine Translation, pages 115–118, Morristown, NJ, USA. Association for Computational Linguistics.
Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, pages 65–72, Ann Arbor, Michigan, June. Association for Computational Linguistics.
Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluating the role of BLEU in machine translation research. In Proceedings of EACL-2006, pages 249–256.
Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christoph Monz, and Josh Schroeder. 2008. Further meta-evaluation of machine translation. In ACL Workshop on Statistical Machine Translation.
Yee Seng Chan and Hwee Tou Ng. 2009. MaxSim: performance and effects of translation fluency. Machine Translation, 23(2-3):157–168.
Rudi Cilibrasi and Paul Vitanyi. 2005. Clustering by compression. IEEE Transactions on Information Theory, 51:1523–1545.
David Kauchak and Regina Barzilay. 2006. Paraphrasing for automatic evaluation. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 455–462, Morristown, NJ, USA. Association for Computational Linguistics.
Kimmo Kettunen. 2009. Packing it all up in search for a language independent MT quality measure tool. In Proceedings of LTC-09, 4th Language and Technology Conference, pages 280–284, Poznan.
Yanjun Ma, Nicolas Stroppa, and Andy Way. 2007. Bootstrapping word alignment via word packing. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 304–311, Prague, Czech Republic, June. Association for Computational Linguistics.
K. Papineni, S. Roukos, T. Ward, and W. Zhu. 2001. BLEU: a method for automatic evaluation of machine translation. Technical Report RC22176 (W0109-022), IBM Research Division, Thomas J. Watson Research Center.
Steven Parker. 2008. BADGER: A new machine translation metric. In Metrics for Machine Translation Challenge 2008, Waikiki, Hawai'i, October. AMTA.
Grazia Russo-Lassner, Jimmy Lin, and Philip Resnik. 2005. A paraphrase-based approach to machine translation evaluation. Technical Report LAMP-TR-125/CS-TR-4754/UMIACS-TR-2005-57, University of Maryland, College Park.
Ray Solomonoff. 1964. A formal theory of inductive inference. Part I. Information and Control, 7(1):1–22.
Jaakko J. Väyrynen, Tero Tapiovaara, Kimmo Kettunen, and Marcus Dobrinkat. 2010. Normalized compression distance as an automatic MT evaluation metric. In Proceedings of MT 25 years on. To appear.