
223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation


Source: pdf

Author: Ondrej Bojar ; Kamil Kos ; David Marecek

Abstract: We illustrate and explain problems of n-gram-based machine translation (MT) metrics (e.g. BLEU) when applied to morphologically rich languages such as Czech. A novel metric, SemPOS, based on the deep-syntactic representation of the sentence, tackles the issue and retains its performance for translation into English as well.


reference text

Ondřej Bojar, David Mareček, Václav Novák, Martin Popel, Jan Ptáček, Jan Rouš, and Zdeněk Žabokrtský. 2009. English-Czech MT in 2008. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece, March. Association for Computational Linguistics.

Chris Callison-Burch, Cameron Fordyce, Philipp Koehn, Christof Monz, and Josh Schroeder. 2008. Further meta-evaluation of machine translation. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 70–106, Columbus, Ohio, June. Association for Computational Linguistics.

Chris Callison-Burch, Philipp Koehn, Christof Monz, and Josh Schroeder. 2009. Findings of the 2009 workshop on statistical machine translation. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece. Association for Computational Linguistics.

Silvie Cinková, Jan Hajič, Marie Mikulová, Lucie Mladová, Anja Nedolužko, Petr Pajas, Jarmila Panevová, Jiří Semecký, Jana Šindlerová, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2004. Annotation of English on the tectogrammatical level. Technical Report TR-2006-35, ÚFAL/CKL, Prague, Czech Republic, December.

Sherri Condon, Gregory A. Sanders, Dan Parvaz, Alan Rubenstein, Christy Doran, John Aberdeen, and Beatrice Oshika. 2009. Normalization for Automated Metrics: English and Arabic Speech Translation. In MT Summit XII.

Jesús Giménez and Lluís Márquez. 2007. Linguistic Features for Automatic Evaluation of Heterogenous MT Systems. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 256–264, Prague, June. Association for Computational Linguistics.

Jan Hajič, Silvie Cinková, Kristýna Čermáková, Lucie Mladová, Anja Nedolužko, Petr Pajas, Jiří Semecký, Jana Šindlerová, Josef Toman, Kristýna Tomšů, Matěj Korvas, Magdaléna Rysová, Kateřina Veselovská, and Zdeněk Žabokrtský. 2009. Prague English Dependency Treebank 1.0. Institute of Formal and Applied Linguistics, Charles University in Prague, ISBN 978-80-904175-0-2, January.

Jan Hajič, Jarmila Panevová, Eva Hajičová, Petr Sgall, Petr Pajas, Jan Štěpánek, Jiří Havelka, Marie Mikulová, Zdeněk Žabokrtský, and Magda Ševčíková Razímová. 2006. Prague Dependency Treebank 2.0. LDC2006T01, ISBN: 1-58563-3704.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In ACL 2007, Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177–180, Prague, Czech Republic, June. Association for Computational Linguistics.

Kamil Kos and Ondřej Bojar. 2009. Evaluation of Machine Translation Metrics for Czech as the Target Language. Prague Bulletin of Mathematical Linguistics, 92.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In ACL 2002, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania.

M. Przybocki, K. Peterson, and S. Bronsart. 2008. Official results of the NIST 2008 "Metrics for MAchine TRanslation" Challenge (MetricsMATR08).

Žabokrtský Zdeněk and Ondřej Bojar. 2008. TectoMT, Developer's Guide. Technical Report TR-2008-39, Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, December.

Petr Sgall, Eva Hajičová, and Jarmila Panevová. 1986. The Meaning of the Sentence and Its Semantic and Pragmatic Aspects. Academia/Reidel Publishing Company, Prague, Czech Republic/Dordrecht, Netherlands.