
42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

Source: pdf

Author: Liang Huang ; Haitao Mi

Abstract: Syntax-based translation models should in principle be efficient with polynomially-sized search space, but in practice they are often embarrassingly slow, partly due to the cost of language model integration. In this paper we borrow from phrase-based decoding the idea to generate a translation incrementally left-to-right, and show that for tree-to-string models, with a clever encoding of derivation history, this method runs in average-case polynomial-time in theory, and linear-time with beam search in practice (whereas phrase-based decoding is exponential-time in theory and quadratic-time in practice). Experiments show that, with comparable translation quality, our tree-to-string system (in Python) can run more than 30 times faster than the phrase-based system Moses (in C++).
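To make the "linear-time with beam search" claim concrete, here is a minimal toy sketch of left-to-right beam search. It is not the paper's tree-to-string algorithm: it assumes a monotone, word-by-word decoder, and the names `beam_decode`, `OPTIONS`, and `toy_score` are hypothetical. It only illustrates that with a fixed beam size the number of hypothesis expansions grows linearly with the input length.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# A hypothesis is (score, target words generated so far).
Hypothesis = Tuple[float, Tuple[str, ...]]


def beam_decode(
    source: List[str],
    options: Dict[str, List[str]],
    score_step: Callable[[str, str], float],
    beam_size: int,
) -> Tuple[Hypothesis, int]:
    """Toy monotone left-to-right beam search (illustrative assumption,
    not the tree-to-string decoder described in the abstract).

    Returns the best hypothesis and the number of expansions performed.
    """
    beam: List[Hypothesis] = [(0.0, ())]
    expansions = 0
    for src_word in source:
        candidates: List[Hypothesis] = []
        for score, output in beam:
            for tgt_word in options[src_word]:
                expansions += 1
                # Bigram-style context: only the previous target word matters.
                prev = output[-1] if output else "<s>"
                candidates.append(
                    (score + score_step(prev, tgt_word), output + (tgt_word,))
                )
        # Keep only the top `beam_size` hypotheses, best first.
        beam = heapq.nlargest(beam_size, candidates, key=lambda h: h[0])
    return beam[0], expansions


# Hypothetical translation options and scoring function for the demo.
OPTIONS = {"a": ["x", "y"], "b": ["y", "z"], "c": ["x", "z"]}


def toy_score(prev: str, word: str) -> float:
    """Reward output that does not repeat the previous word."""
    return 1.0 if prev != word else -1.0


if __name__ == "__main__":
    for n in (3, 6, 12):
        source = ["a", "b", "c"] * (n // 3)
        best, count = beam_decode(source, OPTIONS, toy_score, beam_size=2)
        print(n, count, " ".join(best[1]))
```

Each step expands at most `beam_size` hypotheses with a bounded number of options each, so doubling the input length roughly doubles the work.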

reference text

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–208.
Chris Dyer and Philip Resnik. 2010. Context-free reordering, finite-state translation. In Proceedings of NAACL.
Jay Earley. 1970. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102.
Michel Galley and Christopher D. Manning. 2008. A simple and effective hierarchical phrase reordering model. In Proceedings of EMNLP 2008.
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What’s in a translation rule? In Proceedings of HLT-NAACL, pages 273–280.
Liang Huang and David Chiang. 2007. Forest rescoring: Fast decoding with integrated language models. In Proceedings of ACL, Prague, Czech Rep., June.
Liang Huang, Kevin Knight, and Aravind Joshi. 2006. Statistical syntax-directed translation with extended domain of locality. In Proceedings of AMTA, Boston, MA, August.
Liang Huang. 2007. Binarization, synchronous binarization, and target-side binarization. In Proc. NAACL Workshop on Syntax and Structure in Statistical Translation.
Kevin Knight. 1999. Decoding complexity in word-replacement translation models. Computational Linguistics, 25(4):607–615.
P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of ACL: demonstration session.
Philipp Koehn. 2004. Pharaoh: a beam search decoder for phrase-based statistical machine translation models. In Proceedings of AMTA, pages 115–124.
Yang Liu, Qun Liu, and Shouxun Lin. 2006. Tree-to-string alignment template for statistical machine translation. In Proceedings of COLING-ACL, pages 609–616.
Haitao Mi, Liang Huang, and Qun Liu. 2008. Forest-based translation. In Proceedings of ACL: HLT, Columbus, OH.
Franz Joseph Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of ACL, pages 160–167.
Slav Petrov and Dan Klein. 2007. Improved inference for unlexicalized parsing. In Proceedings of HLT-NAACL.
Stuart Shieber, Yves Schabes, and Fernando Pereira. 1995. Principles and implementation of deductive parsing. Journal of Logic Programming, 24:3–36.
Andreas Stolcke. 2002. Srilm - an extensible language modeling toolkit. In Proceedings of ICSLP, volume 30, pages 901–904.
Ashish Venugopal, Andreas Zollmann, and Stephen Vogel. 2007. An efficient two-pass approach to synchronous-CFG driven statistical MT. In Proceedings of HLT-NAACL.
Taro Watanabe, Hajime Tsukuda, and Hideki Isozaki. 2006. Left-to-right target generation for hierarchical phrase-based translation. In Proceedings of COLING-ACL.
Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–404.
Hao Zhang and Daniel Gildea. 2008. Efficient multi-pass decoding for synchronous context free grammars. In Proceedings of ACL.