
16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering


Source: pdf

Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser

Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.
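As a rough illustration of what "a joint sequence model" over translation and reordering operations could look like, the sketch below scores an operation sequence with a smoothed bigram model. It is a toy under stated assumptions, not the authors' implementation: the operation names (`GEN(...)`, `INSERT_GAP`, `JUMP_BACK`), the bigram order, the add-alpha smoothing, and the example sentence are all hypothetical choices made here for illustration.

```python
import math
from collections import Counter


def train_bigram(sequences):
    """Count contexts and bigrams over padded operation sequences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for ops in sequences:
        padded = ["<s>"] + list(ops) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts


def log_prob(ops, context_counts, bigram_counts, vocab_size, alpha=1.0):
    """Add-alpha smoothed log-probability of one operation sequence."""
    padded = ["<s>"] + list(ops) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        num = bigram_counts[(prev, cur)] + alpha
        den = context_counts[prev] + alpha * vocab_size
        total += math.log(num / den)
    return total


# Hypothetical operation sequence for "er hat das Haus gesehen" ->
# "he has seen the house": translation and reordering steps are
# interleaved in a single linear sequence.
seen = [
    "GEN(er,he)", "GEN(hat,has)", "INSERT_GAP",
    "GEN(gesehen,seen)", "JUMP_BACK",
    "GEN(das,the)", "GEN(Haus,house)",
]
training = [seen, ["GEN(er,he)", "GEN(hat,has)", "GEN(das,the)"]]

ctx, bi = train_bigram(training)
# Vocabulary: every operation seen in training plus the end marker.
V = len({op for seq in training for op in seq} | {"</s>"})

print(log_prob(seen, ctx, bi, V))
print(log_prob(list(reversed(seen)), ctx, bi, V))
```

Because reordering operations sit in the same sequence as translation operations, a single sequence model conditions each on the other; in this toy, the training-order sequence scores higher than its reversal.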


reference text

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263–311.

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.

Josep Maria Crego and François Yvon. 2009. Gappy translation units under left-to-right SMT decoding. In Proceedings of the meeting of the European Association for Machine Translation (EAMT), pages 66–73, Barcelona, Spain.

Josep Maria Crego and François Yvon. 2010. Improving reordering with linguistically informed bilingual n-grams. In Coling 2010: Posters, pages 197–205, Beijing, China, August. Coling 2010 Organizing Committee.

Josep M. Crego, Marta R. Costa-jussà, José B. Mariño, and José A. R. Fonollosa. 2005a. Ngram-based versus phrase-based statistical machine translation. In Proceedings of the International Workshop on Spoken Language Technology (IWSLT05), pages 177–184.

Josep M. Crego, José B. Mariño, and Adrià de Gispert. 2005b. Reordered search and unfolding tuples for ngram-based SMT. In Proceedings of the 10th Machine Translation Summit (MT Summit X), pages 283–289, Phuket, Thailand.

Michel Galley and Christopher D. Manning. 2010. Accurate non-hierarchical phrase-based translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 966–974, Los Angeles, California, June. Association for Computational Linguistics.

Philipp Koehn and Barry Haddow. 2009. Edinburgh's submission to all tracks of the WMT 2009 shared task with reordering and speed improvements to Moses. In Proceedings of the Fourth Workshop on Statistical Machine Translation, pages 160–164, Athens, Greece, March. Association for Computational Linguistics.

Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference, pages 127–133, Edmonton, Canada.

Philipp Koehn, Amittai Axelrod, Alexandra Birch Mayne, Chris Callison-Burch, Miles Osborne, and David Talbot. 2005. Edinburgh system description for the 2005 IWSLT speech translation evaluation. In International Workshop on Spoken Language Translation 2005.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Demonstration Program, Prague, Czech Republic.

Philipp Koehn. 2004a. Pharaoh: A beam search decoder for phrase-based statistical machine translation models. In AMTA, pages 115–124.

Philipp Koehn. 2004b. Statistical significance tests for machine translation evaluation. In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP 2004, pages 388–395, Barcelona, Spain, July. Association for Computational Linguistics.

Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren N. G. Thornton, Jonathan Weese, and Omar F. Zaidan. 2009. Joshua: An open source toolkit for parsing-based machine translation.

J.B. Mariño, R.E. Banchs, J.M. Crego, A. de Gispert, P. Lambert, J.A.R. Fonollosa, and M.R. Costa-jussà. 2006. N-gram-based machine translation. Computational Linguistics, 32(4):527–549.

I. Dan Melamed. 2004. Statistical machine translation by parsing. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain.

Franz J. Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.

Franz J. Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30(1):417–449.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311–318, Morristown, NJ, USA. Association for Computational Linguistics.

Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Intl. Conf. Spoken Language Processing, Denver, Colorado.

Omar F. Zaidan. 2009. Z-MERT: A fully configurable open source tool for minimum error rate training of machine translation systems. The Prague Bulletin of Mathematical Linguistics, 91:79–88.