emnlp emnlp2011 emnlp2011-100 emnlp2011-100-reference knowledge-graph by maker-knowledge-mining

100 emnlp-2011-Optimal Search for Minimum Error Rate Training

Source: pdf

Author: Michel Galley ; Chris Quirk

Abstract: Minimum error rate training is a crucial component to many state-of-the-art NLP applications, such as machine translation and speech recognition. However, common evaluation functions such as BLEU or word error rate are generally highly non-convex and thus prone to search errors. In this paper, we present LP-MERT, an exact search algorithm for minimum error rate training that reaches the global optimum using a series of reductions to linear programming. Given a set of N-best lists produced from S input sentences, this algorithm finds a linear model that is globally optimal with respect to this set. We find that this algorithm is polynomial in N and in the size of the model, but exponential in S. We present extensions of this work that let us scale to reasonably large tuning sets (e.g., one thousand sentences), by either searching only promising regions of the parameter space, or by using a variant of LP-MERT that relies on a beam-search approximation. Experimental results show improvements over the standard Och algorithm.

reference text

C. Bradford Barber, David P. Dobkin, and Hannu Huhdanpaa. 1996. The QuickHull algorithm for convex hulls. ACM Trans. Math. Softw., 22:469–483. C. Buchta, J. Muller, and R. F. Tichy. 1985. Stochastical approximation of convex bodies. Math. Ann., 271:225– 235. A. Bykat. 1978. Convex hull of a finite set of points in two dimensions. Inf. Process. Lett., 7(6):296–298. Daniel Cer, Dan Jurafsky, and Christopher D. Manning. 2008. Regularization and search for minimum error rate training. In Proceedings of the Third Workshop on Statistical Machine Translation, pages 26–34. Samidh Chatterjee and Nicola Cancedda. 2010. Minimum error rate training by sampling the translation lattice. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 606–615. Association for Computational Linguistics. David Chiang, Yuval Marton, and Philip Resnik. 2008. Online large-margin training of syntactic and structural translation features. In EMNLP. David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228. W. Chou, C. H. Lee, and B. H. Juang. 1993. Minimum error rate training based on N-best string models. In Proc. IEEE Int’l Conf. Acoustics, Speech, and Signal Processing (ICASSP ’93), pages 652–655, Vol. 2. Kevin Duh and Katrin Kirchhoff. 2008. Beyond loglinear models: boosted minimum error rate training for programming N-best re-ranking. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 37–40, Stroudsburg, PA, USA. William F. Eddy. 1977. A new convex hull algorithm for planar sets. ACM Trans. Math. Softw., 3:398–403. Liang Huang and David Chiang. 2005. Better k-best parsing. In Proceedings of the Ninth International Workshop on Parsing Technology, pages 53–64, Stroudsburg, PA, USA. Biing-Hwang Juang, Wu Hou, and Chin-Hui Lee. 1997. Minimum classification error rate methods for speech recognition. Speech andAudio Processing, IEEE Transactions on, 5(3):257–265. N. Karmarkar. 1984. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–395. Victor Klee. 1966. Convex polytopes and linear programming. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. Philipp Koehn, Hieu Hoang, Alexandra Birch Mayne, Christopher Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open 48 source toolkit for statistical machine translation. In Proc. of ACL, Demonstration Session. Shankar Kumar, Wolfgang Macherey, Chris Dyer, and Franz Och. 2009. Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, pages 163–171. Gregor Leusch, Evgeny Matusov, and Hermann Ney. 2008. Complexity of finding the BLEU-optimal hy- pothesis in a confusion network. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 839–847, Stroudsburg, PA, USA. Zhifei Li, Chris Callison-Burch, Chris Dyer, Juri Ganitkevitch, Sanjeev Khudanpur, Lane Schwartz, Wren N. G. Thornton, Jonathan Weese, and Omar F. Zaidan. 2009. Joshua: an open source toolkit for parsing-based MT. In Proc. of WMT. P. Liang, A. Bouchard-Coˆte´, D. Klein, and B. Taskar. 2006. An end-to-end discriminative approach to machine translation. In International Conference on Computational Linguistics and Association for Computational Linguistics (COLING/ACL). Chin-Yew Lin and Franz Josef Och. 2004. ORANGE: a method for evaluating automatic evaluation metrics for machine translation. In Proceedings of the 20th international conference on Computational Linguistics, Stroudsburg, PA, USA. Jonas Lo¨o¨f, Ralf Schlu¨ter, and Hermann Ney. 2010. Discriminative adaptation for log-linear acoustic models. In INTERSPEECH, pages 1648–1651. Wolfgang Macherey, Franz Och, Ignacio Thayer, and Jakob Uszkoreit. 2008. Lattice-based minimum error rate training for statistical machine translation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 725–734. David McAllester, Tamir Hazan, and Joseph Keshet. 2010. Direct loss minimization for structured prediction. Advances in Neural Information Processing In Systems 23, pages 1594–1602. Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 91–98. Ryan McDonald. 2006. Discriminative sentence compression with soft syntactic constraints. In Proceedings of EACL, pages 297–304. Robert C. Moore and Chris Quirk. 2008. Random restarts in minimum error rate training for statistical machine translation. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, pages 585–592. J. A. Nelder and R. Mead. 1965. A simplex method for function minimization. Computer Journal, 7:308–3 13. Franz Josef Och and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics, pages 295–302. Franz Josef Och. 2003. Minimum error rate training for statistical machine translation. In Proc. of ACL. Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2001. BLEU: a method for automatic evaluation of machine translation. In Proc. of ACL. M.J.D. Powell. 1964. An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J., 7(2): 155–162. William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. 2007. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, 3rd edition. Chris Quirk, Arul Menezes, and Colin Cherry. 2005. Dependency treelet translation: syntactically informed phrasal SMT. In Proc. of ACL, pages 271–279. David A. Smith and Jason Eisner. 2006. Minimum risk annealing for training log-linear models. In Proceedings of the COLING/ACL on Main conference poster sessions, pages 787–794, Stroudsburg, PA, USA. Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proc. of AMTA, pages 223–23 1. Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 21–24. Andreas Stolcke, Yochai Knig, and Mitchel Weintraub. 1997. Explicit word error minimization in N-best list rescoring. In In Proc. Eurospeech, pages 163–166. Taro Watanabe, Jun Suzuki, Hajime Tsukada, and Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In EMNLP-CoNLL. Omar F. Zaidan and Chris Callison-Burch. 2009. Feasibility of human-in-the-loop minimum error rate training. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1Volume 1, pages 52–61 . Richard Zens, Sasa Hasan, and Hermann Ney. 2007. A systematic comparison of training criteria for statistical machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 524–532, Prague, Czech Republic. Bing Zhao and Shengyuan Chen. 2009. A simplex Armijo downhill algorithm for optimizing statistical machine translation decoding parameters. In Proceedings of 49