Binarized Forest to String Translation (ACL 2011)

Authors: Hao Zhang, Licheng Fang, Peng Xu, Xiaoyun Wu

Abstract: Tree-to-string translation is syntax-aware and efficient but sensitive to parsing errors. Forest-to-string translation approaches mitigate the risk of propagating parser errors into translation errors by considering a forest of alternative trees, as generated by a source language parser. We propose an alternative approach to generating forests that is based on combining sub-trees within the first-best parse through binarization. Provably, our binarization forest can cover any non-constituent phrase in a sentence while maintaining the desirable property that for each span there is at most one nonterminal, so that the grammar constant for decoding is relatively small. To reduce search errors, we apply the synchronous binarization technique to forest-to-string decoding. Combining the two techniques, we show that using a fast shift-reduce parser we can achieve significant quality gains in the NIST 2008 English-to-Chinese track (1.3 BLEU points over a phrase-based system, 0.8 BLEU points over a hierarchical phrase-based system). Consistent and significant gains are also shown in WMT 2010 in the English to German, French, Spanish and Czech tracks.
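
The "at most one nonterminal per span" property can be illustrated with a small sketch. It assumes that each span is labelled by joining the root labels of the shortest sequence of first-best sub-trees that covers it exactly, and that every split point of a span yields one binary hyperedge. The labelling scheme, the `+` separator, the tuple encoding of trees, and the names `collect_spans` and `binarization_forest` are all illustrative assumptions, not the authors' implementation.

```python
def collect_spans(tree, start, spans):
    """Record (start, end) -> label for every node of the 1-best tree.

    A tree is (label, children); a preterminal is (tag, "word").
    Returns the end index of the node's span.
    """
    label, rest = tree
    if isinstance(rest, str):        # preterminal covering one word
        end = start + 1
    else:
        end = start
        for child in rest:
            end = collect_spans(child, end, spans)
    # Parents are recorded after children, so on a unary chain
    # the topmost label wins (an arbitrary choice for this sketch).
    spans[(start, end)] = label
    return end


def binarization_forest(tree):
    """Return (labels, edges) over all spans of the sentence.

    labels[(i, j)] is the single virtual nonterminal for span (i, j);
    edges[(i, j)] lists the split points k of its binary hyperedges.
    """
    spans = {}
    n = collect_spans(tree, 0, spans)
    cover = {}   # (i, j) -> shortest list of sub-tree labels covering it
    edges = {}
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if (i, j) in spans:      # a real constituent keeps its label
                cover[(i, j)] = [spans[(i, j)]]
            else:                    # otherwise glue two shorter covers
                cover[(i, j)] = min(
                    (cover[(i, k)] + cover[(k, j)] for k in range(i + 1, j)),
                    key=len,
                )
            edges[(i, j)] = list(range(i + 1, j))
    labels = {span: "+".join(parts) for span, parts in cover.items()}
    return labels, edges


# Hypothetical 1-best parse of "the cat saw a dog".
tree = ("S", [
    ("NP", [("DT", "the"), ("NN", "cat")]),
    ("VP", [("VBD", "saw"),
            ("NP", [("DT", "a"), ("NN", "dog")])]),
])
labels, edges = binarization_forest(tree)
print(labels[(0, 3)])   # label of the non-constituent "the cat saw"
```

In this sketch every one of the n(n+1)/2 spans receives exactly one label, so the number of nonterminals per span stays at one even though non-constituent phrases such as "the cat saw" are covered.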

reference text

Chris Callison-Burch, Philipp Koehn, Christof Monz, Kay Peterson, Mark Przybocki, and Omar Zaidan. 2010. Findings of the 2010 joint workshop on statistical machine translation and metrics for machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pages 17–53, Uppsala, Sweden, July. Association for Computational Linguistics. Revised August 2010.

David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Conference of the Association for Computational Linguistics (ACL-05), pages 263–270, Ann Arbor, MI.

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.

Steve DeNeefe, Kevin Knight, Wei Wang, and Daniel Marcu. 2007. What can syntax-based MT learn from phrase-based MT? In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 755–763, Prague, Czech Republic, June. Association for Computational Linguistics.

Jason Eisner. 2003. Learning non-isomorphic tree mappings for machine translation. In Proceedings of the 41st Meeting of the Association for Computational Linguistics, companion volume, pages 205–208, Sapporo, Japan.

Jenny Rose Finkel, Alex Kleeman, and Christopher D. Manning. 2008. Efficient, feature-based, conditional random field parsing. In Proceedings of ACL-08: HLT, pages 959–967, Columbus, Ohio, June. Association for Computational Linguistics.

Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What’s in a translation rule? In Proceedings of the 2004 Meeting of the North American chapter of the Association for Computational Linguistics (NAACL-04), pages 273–280.

Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the International Conference on Computational Linguistics/Association for Computational Linguistics (COLING/ACL-06), pages 961–968, July.

Joshua Goodman. 1999. Semiring parsing. Computational Linguistics, 25(4):573–605.

Jonathan Graehl and Kevin Knight. 2004. Training tree transducers. In Proceedings of the 2004 Meeting of the North American chapter of the Association for Computational Linguistics (NAACL-04).

Liang Huang, Kevin Knight, and Aravind Joshi. 2006. Statistical syntax-directed translation with extended domain of locality. In Proceedings of the 7th Biennial Conference of the Association for Machine Translation in the Americas (AMTA), Boston, MA.

Liang Huang. 2007. Binarization, synchronous binarization, and target-side binarization. In Proceedings of the NAACL/AMTA Workshop on Syntax and Structure in Statistical Translation (SSST), pages 33–40, Rochester, NY.

Liang Huang. 2008. Forest reranking: Discriminative parsing with non-local features. In Proceedings of the 46th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (ACL-08:HLT), Columbus, OH. ACL.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Meeting of the North American chapter of the Association for Computational Linguistics (NAACL-03), Edmonton, Alberta.

Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 388–395, Barcelona, Spain, July.

Shankar Kumar, Wolfgang Macherey, Chris Dyer, and Franz Och. 2009. Efficient minimum error rate training and minimum bayes-risk decoding for translation hypergraphs and lattices. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 163–171, Suntec, Singapore, August. Association for Computational Linguistics.

Dekang Lin. 2004. A path-based transfer model for machine translation. In Proceedings of the 20th International Conference on Computational Linguistics (COLING-04), pages 625–630, Geneva, Switzerland.

Yang Liu, Qun Liu, and Shouxun Lin. 2006. Tree-to-string alignment template for statistical machine translation. In Proceedings of the International Conference on Computational Linguistics/Association for Computational Linguistics (COLING/ACL-06), Sydney, Australia, July.

Yang Liu, Yun Huang, Qun Liu, and Shouxun Lin. 2007. Forest-to-string statistical translation rules. In Proceedings of the 45th Annual Conference of the Association for Computational Linguistics (ACL-07), Prague.

Haitao Mi and Liang Huang. 2008. Forest-based translation rule extraction. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 206–214, Honolulu, Hawaii, October. Association for Computational Linguistics.

Haitao Mi, Liang Huang, and Qun Liu. 2008. Forest-based translation. In Proceedings of the 46th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (ACL-08:HLT), pages 192–199.

Joakim Nivre and Mario Scholz. 2004. Deterministic dependency parsing of English text. In Proceedings of Coling 2004, pages 64–70, Geneva, Switzerland, Aug 23–Aug 27. COLING.

Franz Josef Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30(4):417–449.

Franz Josef Och. 2003. Minimum error rate training for statistical machine translation. In Proceedings of the 41st Annual Conference of the Association for Computational Linguistics (ACL-03).

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Conference of the Association for Computational Linguistics (ACL-02).

Arjen Poutsma. 2000. Data-oriented translation. In Proceedings of the 18th International Conference on Computational Linguistics (COLING-00).

Chris Quirk, Arul Menezes, and Colin Cherry. 2005. Dependency treelet translation: Syntactically informed phrasal SMT. In Proceedings of the 43rd Annual Conference of the Association for Computational Linguistics (ACL-05), pages 271–279, Ann Arbor, Michigan.

Libin Shen, Jinxi Xu, and Ralph Weischedel. 2008. A new string-to-dependency machine translation algorithm with a target dependency language model. In Proceedings of the 46th Annual Conference of the Association for Computational Linguistics: Human Language Technologies (ACL-08:HLT), Columbus, OH. ACL.

Wei Wang, Kevin Knight, and Daniel Marcu. 2007. Binarizing syntax trees to improve syntax-based machine translation accuracy. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 746–754, Prague, Czech Republic, June. Association for Computational Linguistics.

Richard Zens and Hermann Ney. 2006. Discriminative reordering models for statistical machine translation. In Proceedings on the Workshop on Statistical Machine Translation, pages 55–63, New York City, June. Association for Computational Linguistics.

Hao Zhang, Liang Huang, Daniel Gildea, and Kevin Knight. 2006. Synchronous binarization for machine translation. In Proceedings of the 2006 Meeting of the North American chapter of the Association for Computational Linguistics (NAACL-06), pages 256–263, New York, NY.

Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li. 2008. A tree sequence alignment-based tree-to-tree translation model. In Proceedings of ACL-08: HLT, pages 559–567, Columbus, Ohio, June. Association for Computational Linguistics.

Hui Zhang, Min Zhang, Haizhou Li, Aiti Aw, and Chew Lim Tan. 2009. Forest-based tree sequence to string translation model. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 172–180, Suntec, Singapore, August. Association for Computational Linguistics.

Andreas Zollmann and Ashish Venugopal. 2006. Syntax augmented machine translation via chart parsing. In Proceedings on the Workshop on Statistical Machine Translation, pages 138–141, New York City, June. Association for Computational Linguistics.