
Training Phrase Translation Models with Leaving-One-Out (ACL 2010)


Author: Joern Wuebker ; Arne Mauser ; Hermann Ney

Abstract: Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leaving-one-out approach to prevent overfitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work, where phrase models were trained separately from other models used in translation, we include all components, such as single word lexica and reordering models, in training. Using this consistent training of phrase models, we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.
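The abstract names leaving-one-out but does not spell out the estimator. The following is a minimal sketch of the general idea only, not the authors' implementation: when scoring one training sentence, the phrase counts contributed by that sentence are subtracted before taking relative frequencies. The function name, the data layout (a list of phrase-pair lists, one per sentence), and the `floor` value for phrase pairs seen nowhere else are all assumptions made for illustration.

```python
from collections import Counter


def leave_one_out_probs(sentence_phrase_pairs, held_out_index, floor=1e-6):
    """Estimate p(tgt | src) for the phrase pairs of one sentence,
    using counts from all sentences except that one.

    `floor` is a hypothetical fallback for pairs that occur only in the
    held-out sentence; it is an illustrative choice, not a value from the paper.
    """
    # Global counts over the whole corpus.
    joint = Counter()
    marginal = Counter()
    for pairs in sentence_phrase_pairs:
        for src, tgt in pairs:
            joint[(src, tgt)] += 1
            marginal[src] += 1

    # Counts contributed by the held-out sentence alone.
    held_out = sentence_phrase_pairs[held_out_index]
    local_joint = Counter(held_out)
    local_marginal = Counter(src for src, _ in held_out)

    probs = {}
    for src, tgt in local_joint:
        num = joint[(src, tgt)] - local_joint[(src, tgt)]
        den = marginal[src] - local_marginal[src]
        # A pair seen only in this sentence gets the small floor probability.
        probs[(src, tgt)] = num / den if num > 0 and den > 0 else floor
    return probs


# Toy corpus: each sentence is a list of (source phrase, target phrase) pairs.
corpus = [
    [("das haus", "the house"), ("ist", "is")],
    [("das haus", "the house"), ("ist", "was")],
    [("das haus", "the building"), ("klein", "small")],
]
print(leave_one_out_probs(corpus, 0))
```

In this toy corpus, a pair that appears only in the held-out sentence falls back to the floor, which is how the sketch penalizes phrases that a plain count-based estimate would memorize from a single sentence.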

reference text

Alexandra Birch, Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Constraining the phrase-based, joint probability statistical translation model. In smt2006, pages 154–157, Jun.
Phil Blunsom, Trevor Cohn, and Miles Osborne. 2008. A discriminative latent variable model for statistical machine translation. In Proceedings of ACL-08: HLT, pages 200–208, Columbus, Ohio, June. Association for Computational Linguistics.
P. F. Brown, V. J. Della Pietra, S. A. Della Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263–312, June.
John DeNero and Dan Klein. 2008. The complexity of phrase alignment problems. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, pages 25–28, Morristown, NJ, USA. Association for Computational Linguistics.
John DeNero, Dan Gillick, James Zhang, and Dan Klein. 2006. Why Generative Phrase Models Underperform Surface Heuristics. In Proceedings of the Workshop on Statistical Machine Translation, pages 31–38, New York City, June.
John DeNero, Alexandre Bouchard-Côté, and Dan Klein. 2008. Sampling Alignment Structure under a Bayesian Translation Model. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 314–323, Honolulu, October.
Nicola Ehling, Richard Zens, and Hermann Ney. 2007. Minimum bayes risk decoding for bleu. In ACL ’07: Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, pages 101–104, Morristown, NJ, USA. Association for Computational Linguistics.
Jesús-Andrés Ferrer and Alfons Juan. 2009. A phrase-based hidden semi-markov approach to machine translation. In Proceedings of European Association for Machine Translation (EAMT), Barcelona, Spain, May. European Association for Machine Translation.
Reinhard Kneser and Hermann Ney. 1995. Improved Backing-Off for M-gram Language Modelling. In IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 181–184, Detroit, MI, May.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pages 48–54, Morristown, NJ, USA. Association for Computational Linguistics.
Percy Liang, Alexandre Bouchard-Côté, Dan Klein, and Ben Taskar. 2006. An End-to-End Discriminative Approach to Machine Translation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 761–768, Sydney, Australia.
Daniel Marcu and William Wong. 2002. A phrase-based, joint probability model for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2002), July.
J.A. Nelder and R. Mead. 1965. A Simplex Method for Function Minimization. The Computer Journal, 7:308–313.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51, March.
Franz Josef Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30(4):417–449, December.
F.J. Och, C. Tillmann, and H. Ney. 1999. Improved alignment models for statistical machine translation. In Proc. of the Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP99), pages 20–28, University of Maryland, College Park, MD, USA, June.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318, Morristown, NJ, USA. Association for Computational Linguistics.
Wade Shen, Brian Delaney, Tim Anderson, and Ray Slyh. 2008. The MIT-LL/AFRL IWSLT-2008 MT System. In Proceedings of IWSLT 2008, pages 69–76, Hawaii, U.S.A., October.
Matthew Snover, Bonnie Dorr, Rich Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proc. of AMTA, pages 223–231, Aug.
Roy Tromble, Shankar Kumar, Franz Och, and Wolfgang Macherey. 2008. Lattice Minimum Bayes-Risk decoding for statistical machine translation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 620–629, Honolulu, Hawaii, October. Association for Computational Linguistics.
N. Ueffing, F.J. Och, and H. Ney. 2002. Generation of word graphs in statistical machine translation. In Proc. of the Conference on Empirical Methods for Natural Language Processing, pages 156–163, Philadelphia, PA, USA, July.