Paraphrase Lattice for Statistical Machine Translation (ACL 2010, paper 192)

Authors: Takashi Onishi; Masao Utiyama; Eiichiro Sumita

Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.
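The lattice construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy paraphrase table, the `max_len` limit, and the choice to put each paraphrase's weight on its first edge are all assumptions made for illustration. The original sentence forms a backbone path, and each paraphrase of a source span adds an alternative path between the same two nodes. A real lattice decoder would score the paths during translation; here they are only enumerated.

```python
from collections import defaultdict

def build_paraphrase_lattice(tokens, paraphrases, max_len=3):
    """Build a word lattice over `tokens` (hypothetical helper).

    `paraphrases` maps a source phrase (tuple of words) to a list of
    (paraphrase_tuple, weight) pairs. Returns (edges, final_node), where
    edges[node] is a list of (next_node, word, weight).
    """
    n = len(tokens)
    edges = defaultdict(list)
    # Backbone path: the original sentence over nodes 0..n.
    for i, word in enumerate(tokens):
        edges[i].append((i + 1, word, 1.0))
    next_node = n + 1  # fresh ids for nodes inside paraphrase paths
    for i in range(n):
        for j in range(i + 1, min(i + max_len, n) + 1):
            for para, weight in paraphrases.get(tuple(tokens[i:j]), []):
                prev = i
                for k, word in enumerate(para):
                    is_last = k == len(para) - 1
                    if is_last:
                        target = j  # rejoin the backbone after the span
                    else:
                        target = next_node
                        next_node += 1
                    # Assumption: the paraphrase weight sits on the first edge.
                    edges[prev].append((target, word, weight if k == 0 else 1.0))
                    prev = target
    return edges, n

def enumerate_paths(edges, final_node, node=0):
    """Yield (words, weight) for every path from `node` to `final_node`."""
    if node == final_node:
        yield [], 1.0
        return
    for target, word, weight in edges.get(node, []):
        for words, rest in enumerate_paths(edges, final_node, target):
            yield [word] + words, weight * rest

# Toy input and paraphrase table (invented for this example).
tokens = ["i", "would", "like", "tea"]
table = {("would", "like"): [(("want",), 0.4), (("'d", "like"), 0.3)]}
lattice, final = build_paraphrase_lattice(tokens, table)
for words, weight in enumerate_paths(lattice, final):
    print(" ".join(words), weight)
```

Every path through the lattice is either the original sentence or a version with one or more spans replaced by a paraphrase, which is the set of inputs the decoder is allowed to choose from.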

reference text

Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with Bilingual Parallel Corpora. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 597–604.

Nicola Bertoldi, Richard Zens, and Marcello Federico. 2007. Speech translation by confusion network decoding. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1297–1300.

Francis Bond, Eric Nichols, Darren Scott Appling, and Michael Paul. 2008. Improving Statistical Machine Translation by Paraphrasing the Training Data. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), pages 150–157.

Chris Callison-Burch, Philipp Koehn, and Miles Osborne. 2006. Improved Statistical Machine Translation Using Paraphrases. In Proceedings of the Human Language Technology conference - North American chapter of the Association for Computational Linguistics (HLT-NAACL), pages 17–24.

Chris Dyer. 2009. Using a maximum entropy model to build segmentation lattices for MT. In Proceedings of the Human Language Technology conference - North American chapter of the Association for Computational Linguistics (HLT-NAACL), pages 406–414.

Cameron S. Fordyce. 2007. Overview of the IWSLT 2007 Evaluation Campaign. In Proceedings of the International Workshop on Spoken Language Translation (IWSLT), pages 1–12.

J Howard Johnson, Joel Martin, George Foster, and Roland Kuhn. 2007. Improving Translation Quality by Discarding Most of the Phrasetable. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 967–975.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), pages 177–180.

Philipp Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the 10th Machine Translation Summit (MT Summit), pages 79–86.

Yuval Marton, Chris Callison-Burch, and Philip Resnik. 2009. Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 381–390.

Preslav Nakov. 2008. Improved Statistical Machine Translation Using Monolingual Paraphrases. In Proceedings of the European Conference on Artificial Intelligence (ECAI), pages 338–342.

Franz Josef Och. 2003. Minimum Error Rate Training in Statistical Machine Translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), pages 160–167.