Learning to Translate with Source and Target Syntax (ACL 2010, acl2010-169)


Source: pdf

Author: David Chiang

Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.


References

Vamshi Ambati and Alon Lavie. 2008. Improving syntax driven translation models by re-structuring divergent and non-isomorphic parse tree structures. In Proc. AMTA-2008 Student Research Workshop, pages 235–244.

Yehoshua Bar-Hillel. 1953. A quasi-arithmetical notation for syntactic description. Language, 29(1):47–58.

Stanley F. Chen and Joshua Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University Center for Research in Computing Technology.

David Chiang, Wei Wang, and Kevin Knight. 2009. 11,001 new features for statistical machine translation. In Proc. NAACL HLT 2009, pages 218–226.

David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. ACL 2005, pages 263–270.

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.

Michael Collins. 1997. Three generative lexicalised models for statistical parsing. In Proc. ACL-EACL, pages 16–23.

Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3:951–991.

Jason Eisner. 2003. Learning non-isomorphic tree mappings for machine translation. In Proc. ACL 2003 Companion Volume, pages 205–208.

Victoria Fossum, Kevin Knight, and Steven Abney. 2008. Using syntax to improve word alignment for syntax-based statistical machine translation. In Proc. Third Workshop on Statistical Machine Translation, pages 44–52.

Alexander Fraser and Daniel Marcu. 2007. Getting the structure right for word alignment: LEAF. In Proc. EMNLP 2007, pages 51–60.

Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What's in a translation rule? In Proc. HLT-NAACL 2004, pages 273–280.

Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proc. COLING-ACL 2006, pages 961–968.

Mary Hearne and Andy Way. 2003. Seeing the wood for the trees: Data-Oriented Translation. In Proc. MT Summit IX, pages 165–172.

Liang Huang, Kevin Knight, and Aravind Joshi. 2006. Statistical syntax-directed translation with extended domain of locality. In Proc. AMTA 2006, pages 65–73.

Alon Lavie, Alok Parlikar, and Vamshi Ambati. 2008. Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora. In Proc. SSST-2, pages 87–95.

Yang Liu, Qun Liu, and Shouxun Lin. 2006. Tree-to-string alignment template for statistical machine translation. In Proc. COLING-ACL 2006, pages 609–616.

Yang Liu, Yajuan Lü, and Qun Liu. 2009. Improving tree-to-tree translation with packed forests. In Proc. ACL 2009, pages 558–566.

I. Dan Melamed, Giorgio Satta, and Ben Wellington. 2004. Generalized multitext grammars. In Proc. ACL 2004, pages 661–668.

Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proc. COLING-ACL 2006, pages 433–440.

Arjen Poutsma. 2000. Data-Oriented Translation. In Proc. COLING 2000, pages 635–641.

David Talbot and Thorsten Brants. 2008. Randomized language models via perfect hash functions. In Proc. ACL-08: HLT, pages 505–513.

Ashish Venugopal, Andreas Zollmann, Noah A. Smith, and Stephan Vogel. 2009. Preference grammars: Softening syntactic constraints to improve statistical machine translation. In Proc. NAACL HLT 2009, pages 236–244.

David J. Weir. 1988. Characterizing Mildly Context-Sensitive Grammar Formalisms. Ph.D. thesis, University of Pennsylvania.

Benjamin Wellington, Sonjia Waxmonsky, and I. Dan Melamed. 2006. Empirical lower bounds on the complexity of translational equivalence. In Proc. COLING-ACL 2006, pages 977–984.

Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li. 2008. A tree sequence alignment-based tree-to-tree translation model. In Proc. ACL-08: HLT, pages 559–567.

Andreas Zollmann and Ashish Venugopal. 2006. Syntax augmented machine translation via chart parsing. In Proc. Workshop on Statistical Machine Translation, pages 138–141.