ACL 2011: Word Alignment Combination over Multiple Word Segmentation

Source: pdf

Authors: Ning Xi, Guangchao Tang, Boyuan Li, Yinggong Zhao

Abstract: In this paper, we present a new word alignment combination approach for language pairs in which one language has no explicit word boundaries. Instead of combining word alignments from different models (Xiang et al., 2010), we combine word alignments obtained over multiple monolingually motivated word segmentations. Our approach is based on a link confidence score defined over multiple segmentations, which makes the combined alignment more robust to inappropriate word segmentation. The combination algorithm is simple, efficient, and easy to implement. In the Chinese-English experiment, our approach improved both word alignment quality and translation performance on all segmentations simultaneously, showing that word alignment can benefit from the complementary knowledge provided by diverse, monolingually motivated segmentations.

References

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263-311.
Pi-Chuan Chang, Michel Galley, and Christopher D. Manning. 2008. Optimizing Chinese word segmentation for machine translation performance. In Proceedings of the Third Workshop on SMT, pages 224-232.
Tagyoung Chung and Daniel Gildea. 2009. Unsupervised tokenization for machine translation. In Proceedings of EMNLP, pages 718-726.
Christopher Dyer, Smaranda Muresan, and Philip Resnik. 2008. Generalizing word lattice translation. In Proceedings of ACL, pages 1012-1020.
Christopher Dyer. 2009. Using a maximum entropy model to build segmentation lattices for MT. In Proceedings of NAACL, pages 406-414.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of ACL, pages 440-447.
Aria Haghighi, John Blitzer, John DeNero, and Dan Klein. 2009. Better word alignments with supervised ITG models. In Proceedings of ACL, pages 923-931.
Fei Huang. 2009. Confidence measure for word alignment. In Proceedings of ACL, pages 932-940.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL, pages 48-54.
Yang Liu, Qun Liu, and Shouxun Lin. 2010. Discriminative word alignment by linear modeling. Computational Linguistics, 36(3):303-339.
Yanjun Ma, Nicolas Stroppa, and Andy Way. 2007. Bootstrapping word alignment via word packing. In Proceedings of ACL, pages 304-311.
Yanjun Ma and Andy Way. 2009. Bilingually motivated domain-adapted word segmentation for statistical machine translation. In Proceedings of EACL, pages 549-557.
Bing Xiang, Yonggang Deng, and Bowen Zhou. 2010. Diversify and combine: improving word alignment for machine translation on low-resource languages. In Proceedings of ACL, pages 932-940.
Xinyan Xiao, Yang Liu, Young-Sook Hwang, Qun Liu, and Shouxun Lin. 2010. Joint tokenization and translation. In Proceedings of COLING, pages 1200-1208.
Jia Xu, Richard Zens, and Hermann Ney. 2004. Do we need Chinese word segmentation for statistical machine translation? In Proceedings of the ACL SIGHAN Workshop, pages 122-128.
Jia Xu, Evgeny Matusov, Richard Zens, and Hermann Ney. 2005. Integrated Chinese word segmentation in statistical machine translation. In Proceedings of IWSLT.
Ruiqiang Zhang, Keiji Yasuda, and Eiichiro Sumita. 2008. Improved statistical machine translation by multiple Chinese word segmentation. In Proceedings of the Third Workshop on SMT, pages 216-223.