acl acl2011 acl2011-57 acl2011-57-reference knowledge-graph by maker-knowledge-mining

57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation


Source: pdf

Author: Coskun Mermer ; Murat Saraclar

Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs. .t r


reference text

Phil Blunsom, Trevor Cohn, Chris Dyer, and Miles Osborne. 2009. A Gibbs sampler for phrasal synchronous grammar induction. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 782–790, Suntec, Singapore, August. Tugba Bodrumlu, Kevin Knight, and Sujith Ravi. 2009. A new objective function for word alignment. In Proceedings of the NAACL HLT Workshop on Integer Linear Programming for Natural Language Processing, pages 28–35, Boulder, Colorado, June. Association for Computational Linguistics. Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263–3 11. David Chiang, Jonathan Graehl, Kevin Knight, Adam Pauls, and Sujith Ravi. 2010. Bayesian inference for finite-state transducers. In Human Language Tech- nologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 447–455, Los Angeles, California, June. David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228. Tagyoung Chung and Daniel Gildea. 2009. Unsupervised tokenization for machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 718–726, Singapore, August. A.P. Dempster, N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1): 1–38. John DeNero, Alexandre Bouchard-C oˆt´ e, and Dan Klein. 2008. Sampling alignment structure under a Bayesian translation model. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 3 14–323, Honolulu, Hawaii, October. 186 Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st International Conference on Computa- tional Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 961– 968, Sydney, Australia, July. Stuart Geman and Donald Geman. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions On Pattern Analysis And Machine Intelligence, 6(6):721–741, November. Sharon Goldwater and Tom Griffiths. 2007. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings ofthe 45th Annual Meeting ofthe Association of Computational Linguistics, pages 744– 751, Prague, Czech Republic, June. Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Bayesian inference for PCFGs via Markov chain Monte Carlo. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics, pages 139–146, Rochester, New York, April. Genichiro Kikui, Seiichi Yamamoto, Toshiyuki Takezawa, and Eiichiro Sumita. 2006. Comparative study on corpora for speech translation. IEEE Transactions on Audio, Speech and Language Processing, 14(5): 1674–1682. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL 2003, Main Papers, pages 48–54, Edmonton, May-June. Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177– 180, Prague, Czech Republic, June. Percy Liang, Ben Taskar, and Dan Klein. 2006. Alignment by agreement. In Proceedings of the Human Language Technology Conference ofthe NAACL, Main Conference, pages 104–1 11, New York City, USA, June. Robert C. Moore. 2004. Improving IBM word alignment Model 1. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL’04), Main Volume, pages 518–525, Barcelona, Spain, July. ThuyLinh Nguyen, Stephan Vogel, and Noah A. Smith. 2010. Nonparametric word segmentation for machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 815–823, Beijing, China, August. Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1): 19–51. Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA, July. Philip Resnik and Eric Hardisty. 2010. Gibbs sampling for the uninitiated. University of Maryland Computer Science Department; CS-TR-4956, June. Andreas Stolcke. 2002. SRILM an extensible language modeling toolkit. In Seventh International Conference on Spoken Language Processing, volume 3. Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. HMM-based word alignment in statistical translation. In COLING, pages 836–841. Jia Xu, Jianfeng Gao, Kristina Toutanova, and Hermann Ney. 2008. Bayesian semi-supervised Chinese word segmentation for statistical machine translation. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 1017–10124, Manchester, UK, August. Omar F. Zaidan. 2009. Z-MERT: A fully configurable open source tool for minimum error rate training of machine translation systems. The Prague Bulletin of Mathematical Linguistics, 91(1):79–88. Shaojun Zhao and Daniel Gildea. 2010. A fast fertility hidden Markov model for word alignment using MCMC. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 596–605, Cambridge, MA, October. Bing Zhao and Eric P. Xing. 2006. BiTAM: Bilingual topic admixture models for word alignment. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 969–976, Sydney, Australia, July. Association for Computational Linguistics. – 187