Hierarchical Phrase-based Translation Representations (EMNLP 2011, paper 66)


Source: pdf

Authors: Gonzalo Iglesias; Cyril Allauzen; William Byrne; Adrià de Gispert; Michael Riley

Abstract: This paper compares several translation representations for a synchronous context-free grammar parse, including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The choice of representation is shown to determine the form and complexity of the target LM intersection and shortest-path algorithms that follow. Intersection, shortest-path, FSA expansion, and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, which are FSA-based and PDA-based decoders respectively, are presented using admissible (or exact) search, which is possible for HiFST with compact SCFG rulesets and for HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy, which we then analyze in terms of search errors and translation performance.


reference text

Alfred V. Aho and Jeffrey D. Ullman. 1972. The Theory of Parsing, Translation and Compiling, volume 1-2. Prentice-Hall.
Cyril Allauzen and Michael Riley. 2011. Pushdown Transducers. http://pdt.openfst.org.
Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. 2007. OpenFst: A general and efficient weighted finite-state transducer library. In Proceedings of CIAA, pages 11–23. http://www.openfst.org.
Cyril Allauzen, Michael Riley, and Johan Schalkwyk. 2011. Filters for efficient composition of weighted finite-state transducers. In Proceedings of CIAA, volume 6482 of LNCS, pages 28–38. Springer.
Y. Bar-Hillel, M. Perles, and E. Shamir. 1964. On formal properties of simple phrase structure grammars. In Y. Bar-Hillel, editor, Language and Information: Selected Essays on their Theory and Application, pages 116–150. Addison-Wesley.
Jean Berstel. 1979. Transductions and Context-Free Languages. Teubner.
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of EMNLP-ACL, pages 858–867.
Ciprian Chelba, Thorsten Brants, Will Neveitt, and Peng Xu. 2010. Study on interaction between entropy pruning and Kneser-Ney smoothing. In Proceedings of Interspeech, pages 2242–2245.
David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.
Adrià de Gispert, Gonzalo Iglesias, Graeme Blackwood, Eduardo R. Banga, and William Byrne. 2010. Hierarchical phrase-based translation with weighted finite state transducers and shallow-n grammars. Computational Linguistics, 36(3).
Yonggang Deng and William Byrne. 2008. HMM word and phrase alignment for statistical machine translation. IEEE Transactions on Audio, Speech, and Language Processing, 16(3):494–507.
Manfred Droste, Werner Kuich, and Heiko Vogler, editors. 2009. Handbook of Weighted Automata. Springer.
John Hershberger, Subhash Suri, and Amit Bhosle. 2003. On the difficulty of some shortest path problems. In Proceedings of STACS, volume 2607 of LNCS, pages 343–354. Springer.
Liang Huang and David Chiang. 2007. Forest rescoring: Faster decoding with integrated language models. In Proceedings of ACL, pages 144–151.
Liang Huang and Haitao Mi. 2010. Efficient incremental decoding for tree-to-string translation. In Proceedings of EMNLP, pages 273–283.
Liang Huang, Hao Zhang, and Daniel Gildea. 2005. Machine translation as lexicalized parsing with hooks. In Proceedings of the Ninth International Workshop on Parsing Technology, Parsing ’05, pages 65–73, Stroudsburg, PA, USA. Association for Computational Linguistics.
Liang Huang. 2008. Advanced dynamic programming in semiring and hypergraph frameworks. In Proceedings of COLING, pages 1–18.
Gonzalo Iglesias, Adrià de Gispert, Eduardo R. Banga, and William Byrne. 2009a. Hierarchical phrase-based translation with weighted finite state transducers. In Proceedings of NAACL-HLT, pages 433–441.
Gonzalo Iglesias, Adrià de Gispert, Eduardo R. Banga, and William Byrne. 2009b. Rule filtering by pattern for efficient hierarchical translation. In Proceedings of EACL, pages 380–388.
Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag. 2010. Dual decomposition for parsing with non-projective head automata. In Proceedings of EMNLP, pages 1288–1298.
Werner Kuich and Arto Salomaa. 1986. Semirings, automata, languages. Springer.
Shankar Kumar, Yonggang Deng, and William Byrne. 2006. A weighted finite state transducer translation template model for statistical machine translation. Natural Language Engineering, 12(1):35–75.
Andrej Ljolje, Fernando Pereira, and Michael Riley. 1999. Efficient general lattice generation and rescoring. In Proceedings of Eurospeech, pages 1251–1254.
Mehryar Mohri. 2009. Weighted automata algorithms. In Droste et al. (2009), chapter 6, pages 213–254.
Mark-Jan Nederhof and Giorgio Satta. 2003. Probabilistic parsing as intersection. In Proceedings of 8th International Workshop on Parsing Technologies, pages 137–148.
Franz J. Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of ACL, pages 160–167.
Ion Petre and Arto Salomaa. 2009. Algebraic systems and pushdown automata. In Droste et al. (2009), chapter 7, pages 257–289.
R. Prasad, K. Krstovski, F. Choi, S. Saleem, P. Natarajan, M. Decerbo, and D. Stallard. 2007. Real-time speech-to-speech translation for PDAs. In Proceedings of IEEE International Conference on Portable Information Devices, pages 1–5.
Alexander M. Rush and Michael Collins. 2011. Exact decoding of syntactic translation models through Lagrangian relaxation. In Proceedings of ACL-HLT, pages 72–82.
Andreas Stolcke. 1998. Entropy-based pruning of backoff language models. In Proceedings of DARPA Broadcast News Transcription and Understanding Workshop, pages 270–274.