
127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence


Source: pdf

Author: David Burkett ; Dan Klein

Abstract: We describe a transformation-based learning method for learning a sequence of monolingual tree transformations that improve the agreement between constituent trees and word alignments in bilingual corpora. Using the manually annotated English Chinese Translation Treebank, we show how our method automatically discovers transformations that accommodate differences in English and Chinese syntax. Furthermore, when transformations are learned on automatically generated trees and alignments from the same domain as the training data for a syntactic MT system, the transformed trees achieve a 0.9 BLEU improvement over baseline trees.


reference text

Ann Bies, Martha Palmer, Justin Mott, and Colin Warner. 2007. English Chinese translation treebank v 1.0. Web download. LDC2007T02.
William J. Black and Argyrios Vasilakopoulos. 2002. Language independent named entity classification by modified transformation-based learning and by decision tree induction. In COLING.
Eric Brill and Philip Resnik. 1994. A transformation-based approach to prepositional phrase attachment disambiguation. In COLING.
Eric Brill. 1992. A simple rule-based part of speech tagger. In Proceedings of the workshop on Speech and Natural Language.
Eric Brill. 1993. Automatic grammar induction and parsing free text: A transformation-based approach. In ACL.
Eric Brill. 1995. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics, 21(4):543–565.
David Burkett, John Blitzer, and Dan Klein. 2010. Joint parsing and alignment with weakly synchronized grammars. In NAACL:HLT.
David Chiang. 2010. Learning to translate with source and target syntax. In ACL.
Jonathan H. Clark, Chris Dyer, Alon Lavie, and Noah A. Smith. 2011. Better hypothesis testing for statistical machine translation: controlling for optimizer instability. In ACL:HLT.
John DeNero and Dan Klein. 2007. Tailoring word alignments to syntactic machine translation. In ACL.
Bradley Efron and R. J. Tibshirani. 1994. An Introduction to the Bootstrap (Chapman & Hall/CRC Monographs on Statistics & Applied Probability). Chapman and Hall/CRC.
Jason Eisner. 2003. Learning non-isomorphic tree mappings for machine translation. In ACL.
Victoria Fossum, Kevin Knight, and Steven Abney. 2008. Using syntax to improve word alignment for syntax-based statistical machine translation. In ACL MT Workshop.
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What’s in a translation rule? In HLT-NAACL.
Liang Huang, Kevin Knight, and Aravind Joshi. 2006. Statistical syntax-directed translation with extended domain of locality. In HLT-NAACL.
Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno, and Hideto Kazawa. 2011. Training a parser for machine translation reordering. In EMNLP.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In ACL.
Percy Liang, Ben Taskar, and Dan Klein. 2006. Alignment by agreement. In HLT-NAACL.
Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
Yuval Marton and Philip Resnik. 2008. Soft syntactic constraints for hierarchical phrase-based translation. In ACL:HLT.
Haitao Mi and Liang Huang. 2008. Forest-based translation rule extraction. In EMNLP.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In ACL.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2001. Bleu: a method for automatic evaluation of machine translation. Research report, IBM. RC22176.
Slav Petrov and Dan Klein. 2007. Improved inference for unlexicalized parsing. In HLT-NAACL.
Lance A. Ramshaw and Mitchell P. Marcus. 1995. Text chunking using transformation-based learning. In ACL Workshop on Very Large Corpora.
Ken Samuel, Sandra Carberry, and K. Vijay-Shanker. 1998. Dialogue act tagging with transformation-based learning. In COLING.
Wei Wang, Kevin Knight, and Daniel Marcu. 2007. Binarizing syntax trees to improve syntax-based machine translation accuracy. In EMNLP.
Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model. In ACL.
Hui Zhang, Min Zhang, Haizhou Li, Aiti Aw, and Chew Lim Tan. 2009. Forest-based tree sequence to string translation model. In ACL-IJCNLP.
Bing Zhao, Young-Suk Lee, Xiaoqiang Luo, and Liu Li. 2011. Learning to transform and select elementary trees for improved syntax-based machine translations. In ACL:HLT.
Andreas Zollmann, Ashish Venugopal, Stephan Vogel, and Alex Waibel. 2006. The CMU-UKA syntax augmented machine translation system for IWSLT-06. In IWSLT.