
A Convex Alternative to IBM Model 2 (EMNLP 2013)

Source: pdf

Authors: Andrei Simion, Michael Collins, Cliff Stein

Abstract: The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multiple local optima. In this paper we introduce a convex relaxation of IBM Model 2, and describe an optimization algorithm for the relaxation based on a subgradient method combined with exponentiated-gradient updates. Our approach gives the same level of alignment accuracy as IBM Model 2.
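The abstract names exponentiated-gradient (EG) updates as one ingredient of the optimizer. As a hedged illustration of that generic update only, in the style of Kivinen and Warmuth (1997), the Python sketch below maximizes a concave objective made of a sum of logs of linear functions over the probability simplex. IBM Model 1's log-likelihood has this same shape in its translation parameters, which is why Model 1 is the convex exception. This is not the paper's relaxation or algorithm: the function names (`log_likelihood`, `gradient`, `eg_step`, `maximize_eg`), the fixed step size `eta`, and the toy data are assumptions. An ordinary gradient also stands in for the subgradient the abstract mentions.

```python
import math
from typing import List, Sequence


def log_likelihood(x: Sequence[float], A: Sequence[Sequence[float]]) -> float:
    """Average over rows k of log(sum_i x[i] * A[k][i]); concave in x."""
    return sum(math.log(sum(xi * a for xi, a in zip(x, row))) for row in A) / len(A)


def gradient(x: Sequence[float], A: Sequence[Sequence[float]]) -> List[float]:
    """Partial derivatives of log_likelihood with respect to each x[i]."""
    g = [0.0] * len(x)
    for row in A:
        denom = sum(xi * a for xi, a in zip(x, row))
        for i, a in enumerate(row):
            g[i] += a / denom
    return [gi / len(A) for gi in g]


def eg_step(x: Sequence[float], g: Sequence[float], eta: float) -> List[float]:
    """One EG ascent step: scale each x[i] by exp(eta * g[i]), then renormalize."""
    m = max(g)  # shifting by max(g) avoids overflow and cancels after normalization
    w = [xi * math.exp(eta * (gi - m)) for xi, gi in zip(x, g)]
    z = sum(w)
    return [wi / z for wi in w]


def maximize_eg(A: Sequence[Sequence[float]], eta: float = 0.5, iters: int = 200) -> List[float]:
    """Hypothetical helper: fixed-step EG ascent from the uniform distribution."""
    n = len(A[0])
    x = [1.0 / n] * n  # strictly positive start; multiplicative steps keep entries positive
    for _ in range(iters):
        x = eg_step(x, gradient(x, A), eta)
    return x


# Toy data (assumption): one-hot rows, so the objective is
# 0.5*log x0 + 0.3*log x1 + 0.2*log x2, maximized at (0.5, 0.3, 0.2).
A = [[1.0, 0.0, 0.0]] * 5 + [[0.0, 1.0, 0.0]] * 3 + [[0.0, 0.0, 1.0]] * 2
x = maximize_eg(A)
print([round(v, 4) for v in x])  # prints [0.5, 0.3, 0.2]
```

Each EG step multiplies and renormalizes, so the iterates stay on the simplex without an explicit projection. This is the mirror-descent view with an entropy regularizer discussed by Beck and Teboulle (2003). The one-hot toy case is used only because its optimum, the empirical frequencies, is known in closed form.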

References

Peter L. Bartlett, Ben Taskar, Michael Collins and David McAllester. 2004. Exponentiated Gradient Algorithms for Large-Margin Structured Classification. In Proceedings of NIPS.
Amir Beck and Marc Teboulle. 2003. Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization. Operations Research Letters, 31:167-175.
Dimitris Bertsimas and John N. Tsitsiklis. 1997. Introduction to Linear Programming. Athena Scientific.
Dimitris Bertsimas. 2005. Optimization Over Integers. Dynamic Ideas.
Dimitri P. Bertsekas. 1999. Nonlinear Optimization. Athena Press.
Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19:263-311.
David Chiang. 2005. A Hierarchical Phrase-Based Model for Statistical Machine Translation. In Proceedings of the ACL.
Michael Collins, Amir Globerson, Terry Koo, Xavier Carreras and Peter L. Bartlett. 2008. Exponentiated Gradient Algorithms for Conditional Random Fields and Max-Margin Markov Networks. Journal of Machine Learning Research, 9(Aug):1775-1822.
A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum Likelihood From Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38.
Alexander Fraser and Daniel Marcu. 2007. Measuring Word Alignment Quality for Statistical Machine Translation. Computational Linguistics, 33(3):293-303.
Kuzman Ganchev, Joao V. Graca, Jennifer Gillenwater and Ben Taskar. 2010. Posterior Regularization for Structured Latent Variable Models. Journal of Machine Learning Research, 11(July):2001-2049.
Joao V. Graca, Kuzman Ganchev and Ben Taskar. 2007. Expectation Maximization and Posterior Constraints. In Proceedings of NIPS.
Aria Haghighi, John Blitzer, John DeNero and Dan Klein. 2009. Better Word Alignments with Supervised ITG Models. In Proceedings of the ACL.
Darcey Riley and Daniel Gildea. 2012. Improving the IBM Alignment Models Using Variational Bayes. In Proceedings of the ACL.
Yuhong Guo and Dale Schuurmans. 2007. Convex Relaxations of Latent Variable Training. In Proceedings of NIPS.
Simon Lacoste-Julien, Ben Taskar, Dan Klein, and Michael Jordan. 2008. Word Alignment via Quadratic Assignment. In Proceedings of the HLT-NAACL.
Philipp Koehn. 2008. Statistical Machine Translation. Cambridge University Press.
Jyrki Kivinen and Manfred K. Warmuth. 1997. Exponentiated Gradient Versus Gradient Descent for Linear Predictors. Information and Computation, 132:1-63.
Percy Liang, Ben Taskar and Dan Klein. 2006. Alignment by Agreement. In Proceedings of NAACL.
Daniel Marcu, Wei Wang, Abdessamad Echihabi, and Kevin Knight. 2006. SPMT: Statistical Machine Translation with Syntactified Target Language Phrases. In Proceedings of the EMNLP.
Andre F. T. Martins, Noah A. Smith and Eric P. Xing. 2010. Turbo Parsers: Dependency Parsing by Approximate Variational Inference. In Proceedings of the EMNLP.
Rada Mihalcea and Ted Pedersen. 2003. An Evaluation Exercise in Word Alignment. In HLT-NAACL 2003 Workshop: Building and Using Parallel Texts: Data Driven Machine Translation and Beyond.
Robert C. Moore. 2004. Improving IBM Word-Alignment Model 1. In Proceedings of the ACL.
Stephan Vogel, Hermann Ney and Christoph Tillmann. 1996. HMM-Based Word Alignment in Statistical Translation. In Proceedings of COLING.
Franz Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19-52.
Libin Shen, Jinxi Xu and Ralph Weischedel. 2008. A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model. In Proceedings of the ACL-HLT.
Ben Taskar, Simon Lacoste-Julien and Dan Klein. 2005. A Discriminative Matching Approach to Word Alignment. In Proceedings of the EMNLP.
Kristina Toutanova and Michel Galley. 2011. Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity. In Proceedings of the ACL.
Kenji Yamada and Kevin Knight. 2001. A Syntax-Based Statistical Translation Model. In Proceedings of the ACL.
Kenji Yamada and Kevin Knight. 2002. A Decoder for Syntax-Based Statistical Machine Translation. In Proceedings of the ACL.
Ashish Vaswani, Liang Huang and David Chiang. 2012. Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the L0-norm. In Proceedings of the ACL.