99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar


Source: pdf

Author: Libin Shen ; Bing Zhang ; Spyros Matsoukas ; Jinxi Xu ; Ralph Weischedel

Abstract: In modern machine translation practice, a statistical phrasal or hierarchical translation system usually relies on a huge set of translation rules extracted from bi-lingual training data. This approach not only results in space and efficiency issues, but also suffers from the sparse data problem. In this paper, we propose to use factorized grammars, an idea widely accepted in the field of linguistic grammar construction, to generalize translation rules, so as to solve these two problems. We designed a method to take advantage of the XTAG English Grammar to facilitate the extraction of factorized rules. We experimented on various setups of low-resource language translation, and showed consistent significant improvement in BLEU over state-of-the-art string-to-dependency baseline systems with 200K words of bi-lingual training data.


reference text

Satanjeev Banerjee and Alon Lavie. 2005. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 101–104, Ann Arbor, MI.
Xavier Carreras and Michael Collins. 2009. Non-projective parsing for statistical machine translation. In Proceedings of the 2009 Conference of Empirical Methods in Natural Language Processing, pages 200–209, Singapore.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 263–270, Ann Arbor, MI.
Steve DeNeefe and Kevin Knight. 2009. Synchronous tree adjoining machine translation. In Proceedings of the 2009 Conference of Empirical Methods in Natural Language Processing, pages 727–736, Singapore.
Jacob Devlin. 2009. Lexical features for statistical machine translation. Master’s thesis, Univ. of Maryland.
Christiane Fellbaum, editor. 1998. WordNet: an electronic lexical database. The MIT Press.
Aravind K. Joshi and Yves Schabes. 1997. Tree-adjoining grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 3, pages 69–124. Springer-Verlag.
Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006. Extensive classifications of English verbs. In Proceedings of the 12th EURALEX International Congress.
P. Koehn and H. Hoang. 2007. Factored translation models. In Proceedings of the 2007 Conference of Empirical Methods in Natural Language Processing.
Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical phrase based translation. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 48–54, Edmonton, Canada.
Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference of Empirical Methods in Natural Language Processing, pages 388–395, Barcelona, Spain.
Daniel Marcu, Wei Wang, Abdessamad Echihabi, and Kevin Knight. 2006. SPMT: Statistical machine translation with syntactified target language phrases. In Proceedings of the 2006 Conference of Empirical Methods in Natural Language Processing, pages 44–52, Sydney, Australia.
M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. 1994. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
Franz J. Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30(4).
Kishore Papineni, Salim Roukos, and Todd Ward. 2001. Bleu: a method for automatic evaluation of machine translation. IBM Research Report, RC22176.
Libin Shen, Jinxi Xu, and Ralph Weischedel. 2008. A new string-to-dependency machine translation algorithm with a target dependency language model. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL).
Libin Shen, Jinxi Xu, Bing Zhang, Spyros Matsoukas, and Ralph Weischedel. 2009. Effective Use of Linguistic and Contextual Information for Statistical Machine Translation. In Proceedings of the 2009 Conference of Empirical Methods in Natural Language Processing, pages 72–80, Singapore.
XTAG-Group. 2001. A lexicalized tree adjoining grammar for English. Technical Report 01-03, IRCS, Univ. of Pennsylvania.