
310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach



Author: Preslav Nakov; Hwee Tou Ng

Abstract: We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level. An important advantage of this framework is that it can cope with derivational morphology, which has so far remained largely beyond the capabilities of statistical machine translation systems. Our experiments translating from Malay, whose morphology is mostly derivational, into English show significant improvements over rivaling approaches based on five automatic evaluation measures (for 320,000 sentence pairs; 9.5 million English word tokens).


reference text

Mirna Adriani, Jelita Asian, Bobby Nazief, S. M. M. Tahaghoghi, and Hugh E. Williams. 2007. Stemming Indonesian: A confix-stripping approach. ACM Transactions on Asian Language Information Processing, 6:1–33.
Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, Franz-Josef Och, David Purdy, Noah A. Smith, and David Yarowsky. 1999. Statistical machine translation. Technical report, JHU Summer Workshop.
Timothy Baldwin and Su’ad Awab. 2006. Open source corpus analysis tools for Malay. In Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC ’06, pages 2212–2215.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263–311.
Chris Callison-Burch, Philipp Koehn, and Miles Osborne. 2006. Improved statistical machine translation using paraphrases. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL ’06, pages 17–24.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, ACL ’05, pages 263–270.
Michael Collins, Philipp Koehn, and Ivona Kučerová. 2005. Clause restructuring for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, ACL ’05, pages 531–540.
Christopher Dyer, Smaranda Muresan, and Philip Resnik. 2008. Generalizing word lattice translation. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, ACL ’08, pages 1012–1020.
Chris Dyer, Adam Lopez, Juri Ganitkevitch, Jonathan Weese, Ferhan Ture, Phil Blunsom, Hendra Setiawan, Vladimir Eidelman, and Philip Resnik. 2010. cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. In Proceedings of the ACL 2010 System Demonstrations, ACL ’10, pages 7–12.
Christopher Dyer. 2007. The ’noisier channel’: translation from morphologically complex languages. In Proceedings of the Second Workshop on Statistical Machine Translation, WMT ’07, pages 207–211.
Chris Dyer. 2009. Using a maximum entropy model to build segmentation lattices for MT. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL ’09, pages 406–414.
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What’s in a translation rule? In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL ’04, pages 273–280.
Sharon Goldwater and David McClosky. 2005. Improving statistical MT through morphological analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT-EMNLP ’05, pages 676–683.
Nizar Habash and Fatiha Sadat. 2006. Arabic preprocessing schemes for statistical machine translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, HLT-NAACL ’06, pages 49–52.
Philipp Koehn and Hieu Hoang. 2007. Factored translation models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL ’07, pages 868–876.
Philipp Koehn and Kevin Knight. 2003. Empirical methods for compound splitting. In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’03, pages 187–193.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, NAACL ’03, pages 48–54.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume on Demo and Poster Sessions, ACL ’07, pages 177–180.
Alon Lavie and Michael J. Denkowski. 2009. The meteor metric for automatic evaluation of machine translation. Machine Translation, 23:105–115.
Young-Suk Lee. 2004. Morphological analysis for statistical machine translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL ’04, pages 57–60.
Chang Liu, Daniel Dahlmeier, and Hwee Tou Ng. 2010. TESLA: Translation evaluation of sentences with linear-programming-based analysis. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, WMT ’10, pages 354–359.
Preslav Nakov and Hwee Tou Ng. 2009. Improved statistical machine translation for resource-poor languages using related resource-rich languages. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP ’09, pages 1358–1367.
Preslav Nakov. 2008. Improved statistical machine translation using monolingual paraphrases. In Proceedings of the 18th European Conference on Artificial Intelligence, ECAI ’08, pages 338–342.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, ACL ’03, pages 160–167.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL ’02, pages 311–318.
Chris Quirk, Arul Menezes, and Colin Cherry. 2005. Dependency treelet translation: Syntactically informed phrasal SMT. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, ACL ’05, pages 271–279.
Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the Association for Machine Translation in the Americas, AMTA ’06, pages 223–231.
David Talbot and Miles Osborne. 2006. Modelling lexical redundancy for machine translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, COLING-ACL ’06, pages 969–976.
Hua Wu and Haifeng Wang. 2007. Pivot language approach for phrase-based statistical machine translation. Machine Translation, 21(3):165–181.
Mei Yang and Katrin Kirchhoff. 2006. Phrase-based backoff models for machine translation of highly inflected languages. In Proceedings of the European Chapter of the Association for Computational Linguistics, EACL ’06, pages 41–48.