emnlp emnlp2012 emnlp2012-118 emnlp2012-118-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pidong Wang ; Preslav Nakov ; Hwee Tou Ng
Abstract: We propose a novel, language-independent approach for improving machine translation from a resource-poor language to X by adapting a large bi-text for a related resource-rich language and X (the same target language). We assume a small bi-text for the resource-poor language to X pair, which we use to learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language; we then adapt the former to get closer to the latter. Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bi-text yields 6.7 BLEU points of improvement over the unadapted one and 2.6 BLEU points over the original small bi-text. Moreover, combining the small bi-text with the adapted bi-text outperforms the corresponding combinations with the unadapted bi-text by 1.5–3 BLEU points. We also demonstrate applicability to other languages and domains.
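The word-level adaptation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a paraphrase table has already been learned, and it greedily picks the most probable replacement for each word, with no phrase-level paraphrases, morphological variants, or language-model rescoring. The table entries, their probabilities, the choice of Malay as the resource-rich side, and the names `PARAPHRASES`, `adapt_sentence`, and `adapt_bitext` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy paraphrase table:
# resource-rich word -> [(resource-poor word, probability)].
# Entries and probabilities are invented for illustration only.
PARAPHRASES: Dict[str, List[Tuple[str, float]]] = {
    "kerana": [("karena", 0.9), ("kerana", 0.1)],
    "wang": [("uang", 0.8), ("wang", 0.2)],
}


def adapt_sentence(
    tokens: List[str], table: Dict[str, List[Tuple[str, float]]]
) -> List[str]:
    """Replace each token with its most probable paraphrase, if it has one."""
    adapted = []
    for tok in tokens:
        options = table.get(tok)
        if options:
            # Greedy choice: take the highest-probability option.
            best, _ = max(options, key=lambda pair: pair[1])
            adapted.append(best)
        else:
            # Words without an entry are copied through unchanged.
            adapted.append(tok)
    return adapted


def adapt_bitext(
    bitext: List[Tuple[str, str]], table: Dict[str, List[Tuple[str, float]]]
) -> List[Tuple[str, str]]:
    """Adapt only the source side of each pair; the target side (X) is kept."""
    return [
        (" ".join(adapt_sentence(src.split(), table)), tgt)
        for src, tgt in bitext
    ]


if __name__ == "__main__":
    toy_bitext = [("dia tidak ada wang", "he has no money")]
    print(adapt_bitext(toy_bitext, PARAPHRASES))
```

Only the source side is rewritten, so the adapted pairs can be used as additional training data for the resource-poor language to X system, alone or combined with the small original bi-text.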
Kemal Altintas and Ilyas Cicekli. 2002. A machine translation system between a pair of closely related languages. In Proceedings of the 17th International Symposium on Computer and Information Sciences, ISCIS ’02, pages 192–196.
AiTi Aw, Min Zhang, Juan Xiao, and Jian Su. 2006. A phrase-based statistical model for SMS text normalization. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, ACL-COLING ’06.
Hitham Abo Bakr, Khaled Shaalan, and Ibrahim Ziedan. 2008. A hybrid approach for converting written Egyptian colloquial dialect into diacritized Arabic. In Proceedings of the 6th International Conference on Informatics and Systems, INFOS ’08.
Timothy Baldwin and Su’ad Awab. 2006. Open source corpus analysis tools for Malay. In Proceedings of the 5th International Conference on Language Resources and Evaluation, LREC ’06, pages 2212–2215.
Alexandra Birch, Miles Osborne, and Philipp Koehn. 2007. CCG supertags in factored statistical machine translation. In Proceedings of the Second Workshop on Statistical Machine Translation, WMT ’07, pages 9–16.
Chris Callison-Burch, Philipp Koehn, and Miles Osborne. 2006. Improved statistical machine translation using paraphrases. In Proceedings of the Human Language Technology Conference of NAACL, HLT-NAACL ’06, pages 17–24.
Trevor Cohn and Mirella Lapata. 2007. Machine translation by triangulation: Making effective use of multiparallel corpora. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, ACL ’07, pages 728–735.
Michael Collins, Philipp Koehn, and Ivona Kučerová. 2005. Clause restructuring for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, ACL ’05, pages 531–540.
Jan Hajič, Jan Hric, and Vladislav Kuboň. 2000. Machine translation of very close languages. In Proceedings of the Sixth Conference on Applied Natural Language Processing, ANLP ’00, pages 7–12.
Bo Han and Timothy Baldwin. 2011. Lexical normalisation of short text messages: Makn sens a #twitter. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT ’11, pages 368–378.
Kenneth Heafield and Alon Lavie. 2010. Combining machine translation output with open source: The Carnegie Mellon multi-engine machine translation scheme. The Prague Bulletin of Mathematical Linguistics, 93(1):27–36.
Luís Marujo, Nuno Grazina, Tiago Luís, Wang Ling, Luísa Coheur, and Isabel Trancoso. 2011. BP2EP adaptation of Brazilian Portuguese texts to European Portuguese. In Proceedings of the 15th Conference of the European Association for Machine Translation, EAMT ’11, pages 129–136.
Preslav Nakov and Hwee Tou Ng. 2009. Improved statistical machine translation for resource-poor languages using related resource-rich languages. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’09, pages 1358–1367.
Preslav Nakov and Hwee Tou Ng. 2011. Translating from morphologically complex languages: A paraphrase-based approach. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT ’11, pages 1298–1307.
Preslav Nakov and Hwee Tou Ng. 2012. Improving statistical machine translation for a resource-poor language using related resource-rich languages. Journal of Artificial Intelligence Research, 44:179–222.
Preslav Nakov and Jörg Tiedemann. 2012. Combining word-level and character-level models for machine translation between closely-related languages. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, ACL-Short ’12.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, ACL ’02, pages 311–318.
Eric Ristad and Peter Yianilos. 1998. Learning string-edit distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(5):522–532.
Wael Salloum and Nizar Habash. 2011. Dialectal to Standard Arabic paraphrasing to improve Arabic-English statistical machine translation. In Proc. of the Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties, pages 10–21.
Hassan Sawaf. 2010. Arabic dialect handling in hybrid machine translation. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas, AMTA ’09.
Kevin P. Scannell. 2006. Machine translation for closely related language pairs. In Proceedings of the LREC 2006 Workshop on Strategies for Developing Machine Translation for Minority Languages.
Jörg Tiedemann. 2009. News from OPUS - a collection of multilingual parallel corpora with tools and interfaces. In Recent Advances in Natural Language Processing, volume V, pages 237–248.
Masao Utiyama and Hitoshi Isahara. 2007. A comparison of pivot methods for phrase-based statistical machine translation. In Proceedings of the Human Language Technology Conference of NAACL, HLT-NAACL ’07, pages 484–491.
Hua Wu and Haifeng Wang. 2009. Revisiting pivot language approach for machine translation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL, ACL ’09, pages 154–162.
Xiaoheng Zhang. 1998. Dialect MT: a case study between Cantonese and Mandarin. In Proceedings of the 17th International Conference on Computational Linguistics, COLING ’98, pages 1460–1464.