
Re-training Monolingual Parser Bilingually for Syntactic SMT (EMNLP 2012, paper 109)

Source: pdf

Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou

Abstract: The training of most syntactic SMT approaches involves two essential components: word alignment and a monolingual parser. In the current state of the art, these two components are mutually independent, which causes problems such as a lack of rule generalization and violations of syntactic correspondence in translation rules. In this paper, we propose two ways of re-training a monolingual parser with the goal of maximizing the consistency between parse trees and alignment matrices. One is targeted self-training with a simple evaluation function; the other is based on training data selection from forced alignment of bilingual data. We also propose an auxiliary method for boosting alignment quality by symmetrizing alignment matrices with respect to parse trees. The best combination of these novel methods achieves a gain of 3 BLEU points in an IWSLT task and more than 1 BLEU point in NIST tasks.
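The abstract does not define how consistency between a parse tree and an alignment matrix is measured. As a rough illustration only, the sketch below uses one common notion: a source-side constituent span counts as consistent when no word outside the span is aligned into the target range that the span projects onto. The names `is_consistent` and `consistency_score`, the span and link representations, and the choice to treat unaligned spans as inconsistent are all assumptions made for this example, not the paper's evaluation function.

```python
from typing import Iterable, List, Set, Tuple

Span = Tuple[int, int]  # inclusive source-side word indices (start, end)
Link = Tuple[int, int]  # alignment link (source index, target index)


def is_consistent(span: Span, links: Iterable[Link]) -> bool:
    """True if no source word outside `span` aligns into the span's target range."""
    start, end = span
    links = list(links)
    targets = [t for s, t in links if start <= s <= end]
    if not targets:
        # Assumption for this sketch: a fully unaligned span is not counted.
        return False
    lo, hi = min(targets), max(targets)
    # Every link landing in the projected target range must originate inside the span.
    return all(start <= s <= end for s, t in links if lo <= t <= hi)


def consistency_score(spans: List[Span], links: Set[Link]) -> float:
    """Fraction of constituent spans that are consistent with the alignment."""
    if not spans:
        return 0.0
    return sum(is_consistent(sp, links) for sp in spans) / len(spans)


# Hypothetical 4-word sentence pair in which words 1 and 2 swap order.
example_links = {(0, 0), (1, 2), (2, 1), (3, 3)}
example_spans = [(0, 0), (1, 2), (0, 1), (0, 3)]
print(consistency_score(example_spans, example_links))  # 0.75
```

In this toy case the span (0, 1) is the only inconsistent one, because word 2 lies outside it yet aligns inside its projected target range. A score of this kind could in principle be used to rank candidate parses of the same sentence, which is the general idea behind preferring trees that agree with the alignment.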

reference text

Vamshi Ambati and Alon Lavie. 2008. Improving syntax driven translation models by re-structuring divergent and non-isomorphic parse tree structures. In Student Research Workshop of the Eighth Conference of the Association for Machine Translation in the Americas, pages 235-244.

David Burkett and Dan Klein. 2008. Two languages are better than one (for syntactic parsing). In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 877-886.

Colin Cherry and Dekang Lin. 2006. Soft syntactic constraints for word alignment through discriminative training. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics.

John DeNero and Dan Klein. 2007. Tailoring word alignments to syntactic machine translation. In Proceedings of the Association for Computational Linguistics, pages 17-24.

Victoria Fossum, Kevin Knight, and Steven Abney. 2008. Using syntax to improve word alignment precision for syntax-based machine translation. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 44-52.

Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 961-968.

Ulf Hermjakob. Improved word alignment with statistics and linguistic heuristics. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 229-237.

Bryant Huang and Kevin Knight. 2006. Relabeling syntax trees to improve syntax-based machine translation quality. In Proceedings of the Human Technology Conference of the North American Chapter of the ACL, pages 240-247.

Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno, and Hideto Kazawa. 2011. Training a parser for machine translation reordering. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 183-192.

Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 388-395.

Wei Wang, Jonathan May, Kevin Knight, and Daniel Marcu. 2010. Re-structuring, re-labeling, and re-alignment for syntax-based machine translation. Computational Linguistics, 36(2).

Xianchao Wu, Takuya Matsuzaki, and Jun'ichi Tsujii. 2011. Effective use of function words for rule generalization in forest-based translation. In Proceedings of the Association for Computational Linguistics, pages 22-31.

Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the Association for Computational Linguistics, pages 160-167.

Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1).

Joern Wuebker, Arne Mauser, and Hermann Ney. 2010. Training phrase translation models with leaving-one-out. In Proceedings of the Association for Computational Linguistics, pages 475-484.

Kishore Papineni, Salim Roukos, Todd Ward, and Weijing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the Association for Computational Linguistics, pages 311-318.

Slav Petrov and Dan Klein. 2007. Improved inference for unlexicalized parsing. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 404-411.