acl acl2013 acl2013-46 acl2013-46-reference knowledge-graph by maker-knowledge-mining

46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

Source: pdf

Author: Trevor Cohn ; Gholamreza Haffari

Abstract: Modern phrase-based machine translation systems make extensive use of wordbased translation models for inducing alignments from parallel corpora. This is problematic, as the systems are incapable of accurately modelling many translation phenomena that do not decompose into word-for-word translation. This paper presents a novel method for inducing phrase-based translation units directly from parallel data, which we frame as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior. Overall this leads to a model which learns translations of entire sentences, while also learning their decomposition into smaller units (phrase-pairs) recursively, terminating at word translations. Our experiments on Arabic, Urdu and Farsi to English demonstrate improvements over competitive baseline systems.

reference text

Phil Blunsom and Trevor Cohn. 2010. Inducing synchronous grammars with slice sampling. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 238–241, Los Angeles, California, June. Association for Computational Linguistics. Phil Blunsom, Trevor Cohn, Chris Dyer, and Miles Osborne. 2009a. A Gibbs sampler for phrasal synchronous grammar induction. In ACL2009, Singapore, August. Phil Blunsom, Trevor Cohn, and Miles Osborne. 2009b. Bayesian synchronous grammar induction. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 161–168. MIT Press. P. F. Brown, S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–3 11. Colin Cherry and Dekang Lin. 2006. Soft syntactic constraints for word alignment through discriminative training. In Proceedings of COLING/ACL. Association for Computational Linguistics. Colin Cherry and Dekany Lin. 2007. Inversion transduction grammar for joint phrasal translation modeling. In Proc. of the HLT-NAACL Workshop on Syntax and Structure in Statistical Translation (SSST 2007), Rochester, USA. David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228. Trevor Cohn, Phil Blunsom, and Sharon Goldwater. 2010. Inducing tree-substitution grammars. Journal of Machine Learning Research, pages 3053–3096. John DeNero and Dan Klein. 2010. Discriminative modeling of extraction sets for machine translation. In The 48th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL). Aria Haghighi, John Blitzer, and Dan Klein. 2009. Better word alignments with supervised itg models. In Proceedings of the Joint Conference of the 47th Annual Meeting oftheACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore. Association for Computational Linguistics. Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007a. Adaptor grammars: A framework for specifying compositional nonparametric bayesian models. In B. Sch o¨lkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 641–648. MIT Press, Cambridge, MA. Mark Johnson, Thomas L Griffiths, and Sharon Goldwater. 2007b. Bayesian inference for PCFGs via Markov chain Monte Carlo. In Proc. ofthe 7th International Conference on Human Language Technology Research and 8th Annual Meeting of the NAACL (HLT-NAACL 2007), pages 139–146. Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proc. of the 3rd International Conference on Human Language Technology Research and 4th Annual Meeting of the NAACL (HLT-NAACL 2003), pages 81–88, Edmonton, Canada, May. Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proc. of the 45th Annual Meeting of the ACL (ACL2007), Prague. Abby Levenberg, Chris Dyer, and Phil Blunsom. 2012. A Bayesian model for learning SCFGs with discontiguous rules. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 223–232, Jeju Island, Korea, July. Association for Computational Linguistics. Daniel Marcu and William Wong. 2002. A phrasebased, joint probability model for statistical machine translation. In Proc. of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP-2002), pages 133–139, Philadelphia, July. Association for Computational Linguistics. Daniel Marcu, Wei Wang, Abdessamad Echihabi, and Kevin Knight. 2006. SPMT: Statistical machine translation with syntactified target language phrases. In Proc. of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP2006), pages 44–52, Sydney, Australia, July. Graham Neubig, Taro Watanabe, Eiichiro Sumita, Shinsuke Mori, and Tatsuya Kawahara. 2011. An unsupervised model for joint phrase alignment and extraction. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pages 632–641, Portland, Oregon, USA, 6. Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proc. of the 41st Annual Meeting of the ACL (ACL-2003), pages 160– 167, Sapporo, Japan. Adam Pauls, Dan Klein, David Chiang, and Kevin Knight. 2010. Unsupervised syntactic alignment with inversion transduction grammars. In Proceedings of the North American Conference of the Association for Computational Linguistics (NAACL). Association for Computational Linguistics. 789 Slav Petrov, Aria Haghighi, and Dan Klein. 2008. Coarse-to-fine syntactic machine translation using language projections. In Proceedings of EMNLP. Association for Computational Linguistics. M. T. Pilevar and H. Faili. 2011. Tep: Tehran englishpersian parallel corpus. In Proc. International Conference on Intelligent Text Processing and Computational Linguistics (CICLing). Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. 2006. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581. Y. W. Teh. 2006. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 985–992. Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–403. Hao Zhang and Daniel Gildea. 2005. Stochastic lexicalized inversion transduction grammar for alignment. In Proceedings of the 43rd Annual Conference of the Association for Computational Linguistics (ACL). Association for Computational Linguistics. Hao Zhang, Chris Quirk, Robert C. Moore, and Daniel Gildea. 2008. Bayesian learning of noncompositional phrases with synchronous parsing. In Proc. of the 46th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (ACL-08:HLT), pages 97–105, Columbus, Ohio, June. 790