acl acl2011 acl2011-171 acl2011-171-reference knowledge-graph by maker-knowledge-mining

171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

Source: pdf

Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu

Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machine translation through incremental syntactic parsing. Bottom-up and topdown parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporat- ing syntax into phrase-based translation. We give a formal definition of one such lineartime syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.

reference text

Anne Abeill´ e, Yves Schabes, and Aravind K. Joshi. 1990. Using lexicalized tree adjoining grammars for machine translation. In Proceedings of the 13th International Conference on Computational Linguistics. Anthony E. Ades and Mark Steedman. 1982. On the order of words. Linguistics and Philosophy, 4:5 17– 558. Kathy Baker, Steven Bethard, Michael Bloodgood, Ralf Brown, Chris Callison-Burch, Glen Coppersmith, Bonnie Dorr, Wes Filardo, Kendall Giles, Anni Irvine, Mike Kayser, Lori Levin, Justin Martineau, Jim Mayfield, Scott Miller, Aaron Phillips, Andrew Philpot, Christine Piatko, Lane Schwartz, and David Zajic. 2009. Semantically informed machine translation (SIMT). SCALE summer workshop final report, Human Language Technology Center Of Excellence. Alexandra Birch, Miles Osborne, and Philipp Koehn. 2007. CCG supertags in factored statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, pages 9–16. Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Peter Brown, John Cocke, Stephen Della Pietra, Vincent Della Pietra, Frederick Jelinek, John Lafferty, Robert Mercer, and Paul Roossin. 1990. A statistical approach to machine translation. Computational Linguistics, 16(2):79–85. Eugene Charniak, Kevin Knight, and Kenji Yamada. 2003. Syntax-based language models for statistical machine translation. In Proceedings of the Ninth Ma- chine Translation Summit of the International Association for Machine Translation. Ciprian Chelba and Frederick Jelinek. 1998. Exploiting syntactic structure for language modeling. In Proceedings ofthe 36thAnnual Meeting ofthe Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 225– 23 1. Ciprian Chelba and Frederick Jelinek. 2000. Structured language modeling. Computer Speech and Language, 14(4):283–332. Stanley F. Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical report, Harvard University. Colin Cherry. 2008. Cohesive phrase-based decoding for statistical machine translation. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 72–80. 629 David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 263–270. David Chiang. 2010. Learning to translate with source and target syntax. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1443–1452. John Cocke and Jacob Schwartz. 1970. Programming languages and their compilers. Technical report, Courant Institute of Mathematical Sciences, New York University. Michael Collins, Brian Roark, and Murat Saraclar. 2005. Discriminative syntactic language modeling for speech recognition. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 507–514. Brooke Cowan, Ivona Ku˘ cerov a´, and Michael Collins. 2006. A discriminative model for tree-to-tree translation. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 232–241 . Steve DeNeefe and Kevin Knight. 2009. Synchronous tree adjoining machine translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 727–736. Steve DeNeefe, Kevin Knight, Wei Wang, and Daniel Marcu. 2007. What can syntax-based MT learn from phrase-based MT? In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 755–763. Yuan Ding and Martha Palmer. 2005. Machine trans- lation using probabilistic synchronous dependency insertion grammars. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 541–548. Jay Earley. 1968. An efficient context-free parsing algorithm. Ph.D. thesis, Department of Computer Science, Carnegie Mellon University. Jason Eisner. 2003. Learning non-isomorphic tree mappings for machine translation. In The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics, pages 205–208. Michel Galley and Christopher D. Manning. 2009. Quadratic-time dependency parsing for machine translation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 773–781. Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What’s in a translation rule? In Daniel Marcu Susan Dumais and Salim Roukos, editors, Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 273– 280. Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 961– 968. Niyu Ge. 2010. A direct syntax-driven reordering model for phrase-based machine translation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 849–857. Daniel Gildea. 2003. Loosely tree-based alignment for machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 80–87. Jonathan Graehl and Kevin Knight. 2004. Training tree transducers. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 105–1 12. Hany Hassan, Khalil Sima’an, and Andy Way. 2007. Supertagged phrase-based statistical machine translation. In Proceedings ofthe 45thAnnualMeeting oftheAssociation of Computational Linguistics, pages 288–295. James Henderson. 2004. Lookahead in deterministic left-corner parsing. In Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together, pages 26–33. Liang Huang and Haitao Mi. 2010. Efficient incremental decoding for tree-to-string translation. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 273–283. Liang Huang and Kenji Sagae. 2010. Dynamic programming for linear-time incremental parsing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1077–1086. Liang Huang, Kevin Knight, and Aravind Joshi. 2006. Statistical syntax-directed translation with extended domain of locality. In Proceedings of the 7th Biennial conference of the Association for Machine Translation in the Americas. Kenji Imamura, Hideo Okuma, Taro Watanabe, and Eiichiro Sumita. 2004. Example-based machine translation based on syntactic transfer with statistical models. In Proceedings of the 20th International Conference on Computational Linguistics, pages 99–105. 630 Frederick Jelinek. 1969. Fast sequential decoding algorithm using a stack. IBM Journal of Research and Development, pages 675–685. T. Kasami. 1965. An efficient recognition and syntax analysis algorithm for context free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Laboratory. Philipp Koehn, Franz Joseph Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 127–133. Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 177–180. Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press. Yang Liu, Qun Liu, and Shouxun Lin. 2006. Tree-tostring alignment template for statistical machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 609–616. Yang Liu, Yun Huang, Qun Liu, and Shouxun Lin. 2007. Forest-to-string statistical translation rules. In Proceedings ofthe 45thAnnual Meeting ofthe Association of Computational Linguistics, pages 704–71 1. Yang Liu, Yajuan L u¨, and Qun Liu. 2009. Improving tree-to-tree translation with packed forests. In Pro- ceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 558–566. Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330. I. Dan Melamed. 2004. Statistical machine translation by parsing. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics, pages 653–660. Haitao Mi and Liang Huang. 2008. Forest-based translation rule extraction. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 206–214. Haitao Mi, Liang Huang, and Qun Liu. 2008. Forestbased translation. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 192–199. Kevin P. Murphy and Mark A. Paskin. 2001. Linear time inference in hierarchical HMMs. In Proceedings of Neural Information Processing Systems, pages 833– 840. Rebecca Nesson, Stuart Shieber, and Alexander Rush. 2006. Induction of probabilistic synchronous treeinsertion grammars for machine translation. In Proceedings of the 7th Biennial conference of the Associ- ation for Machine Translation in the Americas, pages 128–137. Franz Josef Och and Hermann Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 295–302. Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. 2004. A smorgasbord of features for statistical machine translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, pages 161–168. Matt Post and Daniel Gildea. 2008. Parsers as language models for statistical machine translation. In Proceedings of the Eighth Conference of the Association for Machine Translation in the Americas, pages 172–181 . Matt Post and Daniel Gildea. 2009. Language modeling with tree substitution grammars. In NIPS workshop on Grammar Induction, Representation of Language, and Language Learning. Arjen Poutsma. 1998. Data-oriented translation. In Ninth Conference of Computational Linguistics in the Netherlands. Chris Quirk, Arul Menezes, and Colin Cherry. 2005. De- pendency treelet translation: Syntactically informed phrasal SMT. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pages 271–279. Lawrence R. Rabiner. 1990. A tutorial on hidden Markov models and selected applications in speech recognition. Readings in speech recognition, 53(3):267–296. Brian Roark. 2001. Probabilistic top-down parsing and language modeling. Computational Linguistics, 27(2):249–276. William Schuler, Samir AbdelRahman, Tim Miller, and Lane Schwartz. 2010. Broad-coverage incremental parsing using human-like memory constraints. Computational Linguistics, 36(1): 1–30. William Schuler. 2009. Positive results for parsing with a bounded stack using a model-based right-corner trans631 form. In Proceedings of Human Language Technologies: The 2009Annual Conference ofthe NorthAmerican Chapter of the Association for Computational Linguistics, pages 344–352. Libin Shen, Jinxi Xu, and Ralph Weischedel. 2008. A new string-to-dependency machine translation algorithm with a target dependency language model. In Proceedings of the 46th Annual Meeting of the Asso- ciation for Computational Linguistics: Human Language Technologies, pages 577–585. Stuart M. Shieber and Yves Schabes. 1990. Synchronous tree adjoining grammars. In Proceedings of the 13th International Conference on Computational Linguistics. Stuart M. Shieber. 2004. Synchronous grammars as tree transducers. In Proceedings of the Seventh International Workshop on Tree Adjoining Grammar and Related Formalisms. Mark Steedman. 2000. The syntactic process. MIT Press/Bradford Books, Cambridge, MA. Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–403. Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In Proceedings of 39th Annual Meeting of the Association for Computational Linguistics, pages 523–530. D.H. Younger. 1967. Recognition and parsing of context-free languages in time n cubed. Information and Control, 10(2): 189–208. Min Zhang, Hongfei Jiang, Ai Ti Aw, Jun Sun, Seng Li, and Chew Lim Tan. 2007. A tree-to-tree alignmentbased model for statistical machine translation. In Proceedings of the 11th Machine Translation Summit of the International Association for Machine Transla- tion, pages 535–542. Andreas Zollmann and Ashish Venugopal. 2006. Syntax augmented machine translation via chart parsing. In Proceedings of the Workshop on Statistical Machine Translation, pages 138–141 .