71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

Source: pdf

Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar

Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending hypotheses only on the right edge. LR decoding has complexity O(n^2 b) for an input of n words and beam size b, compared to O(n^3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis, rather than the two LM histories per hypothesis needed in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm of Watanabe et al. (2006b). Unlike that work, we show two new results through experiments over multiple language pairs: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, running four times faster; and, by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as measured by BLEU scores) obtained by phrase-based and CKY Hiero systems with the same translation model.

reference text

Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, and Lucia Specia. 2012. Findings of the 2012 workshop on statistical machine translation. In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 10–51, Montréal, Canada, June. Association for Computational Linguistics.

David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In ACL, pages 263–270.

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33.

Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A. Smith. 2011. Better hypothesis testing for statistical machine translation: controlling for optimizer instability. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, HLT ’11, pages 176–181, Stroudsburg, PA, USA. Association for Computational Linguistics.

Jay Earley. 1970. An efficient context-free parsing algorithm. Commun. ACM, 13(2):94–102, February.

Michel Galley and Christopher D. Manning. 2010. Accurate non-hierarchical phrase-based translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 966–974, Los Angeles, California, June. Association for Computational Linguistics.

Kenneth Heafield, Hieu Hoang, Philipp Koehn, Tetsuo Kiso, and Marcello Federico. 2011. Left language model state for syntactic machine translation. In Proceedings of the International Workshop on Spoken Language Translation, pages 183–190, San Francisco, California, USA, 12.

Kenneth Heafield, Philipp Koehn, and Alon Lavie. 2013. Grouping language model boundary words to speed K-Best extraction from hypergraphs. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, Georgia, USA, 6.

Kenneth Heafield. 2011. KenLM: Faster and smaller language model queries. In Proc. of the Sixth Workshop on Statistical Machine Translation.

Liang Huang and David Chiang. 2007. Forest rescoring: Faster decoding with integrated language models. In ACL 07.

Liang Huang and Haitao Mi. 2010. Efficient incremental decoding for tree-to-string translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 273–283, Cambridge, MA, October. Association for Computational Linguistics.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proc. of NAACL.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL ’07, pages 177–180, Stroudsburg, PA, USA. Association for Computational Linguistics.

Adam Lopez. 2007. Hierarchical phrase-based translation with suffix arrays. In EMNLP-CoNLL, pages 976–985.

Robert C. Moore and John Dowding. 1991. Efficient bottom-up parsing. In HLT. Morgan Kaufmann.

Thuylinh Nguyen and Stephan Vogel. 2013. Integrating phrase-based reordering features into chart-based decoder for machine translation. In Proc. of ACL.

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 160–167, Stroudsburg, PA, USA. Association for Computational Linguistics.

Majid Razmara, Baskaran Sankaran, Ann Clifton, and Anoop Sarkar. 2012. Kriya - the sfu system for translation task at wmt-12. In Proceedings of the Seventh Workshop on Statistical Machine Translation, WMT ’12, pages 356–361, Stroudsburg, PA, USA. Association for Computational Linguistics.

Baskaran Sankaran, Ajeet Grewal, and Anoop Sarkar. 2010. Incremental decoding for phrase-based statistical machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, WMT ’10, pages 216–223, Stroudsburg, PA, USA. Association for Computational Linguistics.

Baskaran Sankaran, Majid Razmara, and Anoop Sarkar. 2012. Kriya - an end-to-end hierarchical phrase-based mt system. The Prague Bulletin of Mathematical Linguistics (PBML), (97):83–98, apr.

Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2006a. NTT statistical machine translation for iwslt 2006. In Proceedings of IWSLT 2006, pages 95–102.

Taro Watanabe, Hajime Tsukada, and Hideki Isozaki. 2006b. Left-to-right target generation for hierarchical phrase-based translation. In Proc. of ACL.

Jiajun Zhang and Chengqing Zong. 2012. A Comparative Study on Discontinuous Phrase Translation. In NLPCC 2012, pages 164–175.