328 acl-2013-Stacking for Statistical Machine Translation

Source: pdf

Author: Majid Razmara ; Anoop Sarkar

Abstract: We propose applying stacking, an ensemble learning technique, to statistical machine translation (SMT) models. A diverse ensemble of weak learners is created with a single SMT engine (a hierarchical phrase-based system) by manipulating the training data, and a strong model is created by combining the weak models on-the-fly. Experimental results on two language pairs and three different sizes of training data show significant improvements of up to 4 BLEU points over a conventionally trained SMT model.
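
The idea in the abstract of training one learner on different views of the training data and mixing the resulting models at query time can be illustrated with a toy sketch. This is not the authors' system: the tiny corpus, the relative-frequency "phrase table", the leave-one-fold-out split, and the uniform mixture weights are all assumptions made for illustration, and the function names are hypothetical. In stacking proper, the combination weights would be learned from held-out predictions rather than fixed.

```python
from collections import Counter, defaultdict


def split_folds(pairs, k):
    """Deal the training pairs round-robin into k folds."""
    return [pairs[i::k] for i in range(k)]


def train_phrase_table(pairs):
    """Relative-frequency estimate of p(target | source) from (source, target) pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: c / sum(ctr.values()) for tgt, c in ctr.items()}
        for src, ctr in counts.items()
    }


def mixture_prob(models, weights, src, tgt):
    """Weighted sum of each model's p(tgt | src), computed at query time."""
    return sum(w * m.get(src, {}).get(tgt, 0.0) for m, w in zip(models, weights))


def best_translation(models, weights, src):
    """Return the candidate with the highest mixture probability, or None if unseen."""
    candidates = {tgt for m in models for tgt in m.get(src, {})}
    if not candidates:
        return None
    # sorted() makes tie-breaking deterministic
    return max(sorted(candidates), key=lambda t: mixture_prob(models, weights, src, t))


# Toy parallel "corpus" of phrase pairs (assumed data, for illustration only).
corpus = [
    ("maison", "house"), ("maison", "home"), ("maison", "house"),
    ("chat", "cat"), ("chien", "dog"), ("chat", "cat"),
]

k = 3
folds = split_folds(corpus, k)

# Weak model i is trained on every fold except fold i.
models = [
    train_phrase_table([p for j, fold in enumerate(folds) if j != i for p in fold])
    for i in range(k)
]

# Uniform weights are an assumption; a stacked system would tune them on held-out data.
weights = [1.0 / k] * k

print(best_translation(models, weights, "maison"))
```

Because the models are only combined inside `mixture_prob`, no merged table is ever built; changing `weights` changes the ensemble without retraining any weak model.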

reference text

Leo Breiman. 1996a. Bagging predictors. Machine Learning, 24(2):123–140, August.
Leo Breiman. 1996b. Stacked regressions. Machine Learning, 24(1):49–64, July.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In ACL ’05: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 263–270, Morristown, NJ, USA. ACL.
Nan Duan, Mu Li, Tong Xiao, and Ming Zhou. 2009. The feature subspace method for SMT system combination. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3, EMNLP ’09, pages 1096–1104, Stroudsburg, PA, USA. Association for Computational Linguistics.
Nan Duan, Hong Sun, and Ming Zhou. 2010. Translation model generalization using probability averaging for machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’10, pages 304–312, Stroudsburg, PA, USA. Association for Computational Linguistics.
André F. T. Martins, Dipanjan Das, Noah A. Smith, and Eric P. Xing. 2008. Stacking dependency parsers. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 157–166, Honolulu, Hawaii, October. Association for Computational Linguistics.
George Foster and Roland Kuhn. 2007. Mixture-model adaptation for SMT. In Proceedings of the Second Workshop on Statistical Machine Translation, StatMT ’07, pages 128–135, Stroudsburg, PA, USA. ACL.
Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, ACL-44, pages 961–968, Stroudsburg, PA, USA. Association for Computational Linguistics.
Tin Kam Ho. 1998. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell., 20(8):832–844, August.
Jennifer A. Hoeting, David Madigan, Adrian E. Raftery, and Chris T. Volinsky. 1999. Bayesian Model Averaging: A Tutorial. Statistical Science, 14(4):382–401.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference of the NAACL, pages 127–133, Edmonton, May. NAACL.
P. Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT summit, volume 5.
Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York, NY, USA, 1st edition.
Antonio Lagarda and Francisco Casacuberta. 2008. Applying boosting to statistical machine translation. In Annual Meeting of European Association for Machine Translation (EAMT), pages 88–96.
Joakim Nivre and Ryan McDonald. 2008. Integrating graph-based and transition-based dependency parsers. In Proceedings of ACL-08: HLT, pages 950–958, Columbus, Ohio, June. Association for Computational Linguistics.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Comput. Linguist., 29(1):19–51, March.
Franz Josef Och. 2003. Minimum error rate training for statistical machine translation. In Proceedings of the 41st Annual Meeting of the ACL, Sapporo, July. ACL.
Majid Razmara, George Foster, Baskaran Sankaran, and Anoop Sarkar. 2012. Mixing multiple translation models in statistical machine translation. In The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea - Volume 1: Long Papers, pages 940–949. The Association for Computer Linguistics.
Erik F. Tjong Kim Sang. 2002. Memory-based shallow parsing. J. Mach. Learn. Res., 2:559–594, March.
Baskaran Sankaran, Majid Razmara, and Anoop Sarkar. 2012. Kriya – an end-to-end hierarchical phrase-based MT system. The Prague Bulletin of Mathematical Linguistics, 97(97), April.
Robert E. Schapire. 1990. The strength of weak learnability. Mach. Learn., 5(2):197–227, July.
Linfeng Song, Haitao Mi, Yajuan Lü, and Qun Liu. 2011. Bagging-based system combination for domain adaption. In Proceedings of the 13th Machine Translation Summit (MT Summit XIII), pages 293–299. International Association for Machine Translation, September.
Andreas Stolcke. 2002. SRILM – an extensible language modeling toolkit. In Proceedings International Conference on Spoken Language Processing, pages 257–286.
Mihai Surdeanu and Christopher D. Manning. 2010. Ensemble models for dependency parsing: cheap and good? In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 649–652, Stroudsburg, PA, USA. Association for Computational Linguistics.
Nadi Tomeh, Alexandre Allauzen, Guillaume Wisniewski, and François Yvon. 2010. Refining word alignment with discriminative training. In Proceedings of The Ninth Conference of the Association for Machine Translation in the Americas (AMTA 2010).
David H. Wolpert. 1992. Stacked generalization. Neural Networks, 5:241–259.
Tong Xiao, Jingbo Zhu, Muhua Zhu, and Huizhen Wang. 2010. Boosting-based system combination for machine translation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 739–748, Stroudsburg, PA, USA. Association for Computational Linguistics.
Tong Xiao, Jingbo Zhu, and Tongran Liu. 2013. Bagging and boosting statistical machine translation systems. Artificial Intelligence, 195:496–527, February.