39 emnlp-2010-EMNLP 044

Source: pdf

Author: George Foster

Abstract: We describe a new approach to SMT adaptation that weights out-of-domain phrase pairs according to their relevance to the target domain, determined by both how similar to it they appear to be, and whether they belong to general language or not. This extends previous work on discriminative weighting by using a finer granularity, focusing on the properties of instances rather than corpus components, and using a simpler training procedure. We incorporate instance weighting into a mixture-model framework, and find that it yields consistent improvements over a wide range of baselines.

reference text

ACL. 2007. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, Czech Republic, June.
Michel Bacchiani, Brian Roark, and Murat Saraclar. 2004. Language model adaptation with MAP estimation and the perceptron algorithm. In NAACL-04 (NAACL, 2004).
Nicola Bertoldi and Marcello Federico. 2009. Domain adaptation for statistical machine translation with monolingual resources. In WMT-09 (WMT, 2009).
Jorge Civera and Alfons Juan. 2007. Domain adaptation in statistical machine translation with mixture modelling. In WMT-07 (WMT, 2007).
Hal Daumé III and Daniel Marcu. 2006. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101–126.
Hal Daumé III. 2007. Frustratingly easy domain adaptation. In ACL-07 (ACL, 2007).
Andrew Finch and Eiichiro Sumita. 2008. Dynamic model interpolation for statistical machine translation. In Proceedings of the ACL Workshop on Statistical Machine Translation, Columbus, June. WMT.
Jenny Rose Finkel and Christopher D. Manning. 2009. Hierarchical Bayesian domain adaptation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Boulder, June. NAACL.
George Foster and Roland Kuhn. 2007. Mixture-model adaptation for SMT. In WMT-07 (WMT, 2007).
George Foster and Roland Kuhn. 2009. Stabilizing minimum error rate training. In WMT-09 (WMT, 2009).
Almut Silja Hildebrand, Matthias Eck, Stephan Vogel, and Alex Waibel. 2005. Adaptation of the translation model for statistical machine translation based on information retrieval. In Proceedings of the 10th EAMT Conference, Budapest, May.
Jing Jiang and ChengXiang Zhai. 2007. Instance weighting for domain adaptation in NLP. In ACL-07 (ACL, 2007).
Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 1995, pages 181–184, Detroit, Michigan. IEEE.
Philipp Koehn and Josh Schroeder. 2007. Experiments in domain adaptation for statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 224–227, Prague, Czech Republic, June. Association for Computational Linguistics.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 127–133, Edmonton, May. NAACL.
D. C. Liu and J. Nocedal. 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming B, 45(3):503–528.
Yajuan Lü, Jin Huang, and Qun Liu. 2007. Improving statistical machine translation performance by training data selection and optimization. In Proceedings of the 2007 Conference on Empirical Methods in Natural Language Processing (EMNLP), Prague, Czech Republic.
Spyros Matsoukas, Antti-Veikko I. Rosti, and Bing Zhang. 2009. Discriminative corpus weight estimation for machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore.
NAACL. 2004. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), Boston, May.
Franz Josef Och. 2003. Minimum error rate training for statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL), Sapporo, July. ACL.
Holger Schwenk and Jean Senellart. 2009. Translation model adaptation for an Arabic/French news translation system by lightly-supervised training. In Proceedings of MT Summit XII, Ottawa, Canada, September. International Association for Machine Translation.
Yik-Cheung Tam, Ian Lane, and Tanja Schultz. 2007. Bilingual-LSA based LM adaptation for spoken language translation. In ACL-07 (ACL, 2007).
Jörg Tiedemann. 2009. News from OPUS - a collection of multilingual parallel corpora with tools and interfaces. In N. Nicolov, K. Bontcheva, G. Angelova, and R. Mitkov, editors, Recent Advances in Natural Language Processing, volume V, pages 237–248. John Benjamins, Amsterdam/Philadelphia.
Nicola Ueffing, Gholamreza Haffari, and Anoop Sarkar. 2007. Transductive learning for statistical machine translation. In ACL-07 (ACL, 2007).
WMT. 2007. Proceedings of the ACL Workshop on Statistical Machine Translation, Prague, June.
WMT. 2009. Proceedings of the 4th Workshop on Statistical Machine Translation, Athens, March.
Hua Wu, Haifeng Wang, and Zhanyi Liu. 2005. Alignment model adaptation for domain-specific word alignment. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), Ann Arbor, Michigan, July. ACL.
Jia Xu, Yonggang Deng, Yuqing Gao, and Hermann Ney. 2007. Domain dependent statistical machine translation. In MT Summit XI, Copenhagen, September.
Bing Zhao, Matthias Eck, and Stephan Vogel. 2004. Language model adaptation for statistical machine translation with structured query models. In Proceedings of the International Conference on Computational Linguistics (COLING) 2004, Geneva, August.