ACL 2011, paper acl2011-193: references
Authors: Klaus Macherey, Andrew Dai, David Talbot, Ashok Popat, Franz Och
Abstract: Translating compounds is an important problem in machine translation. Since many compounds have not been observed during training, they pose a challenge for translation systems. Previous decompounding methods have often been restricted to a small set of languages because they cannot deal with more complex compound-forming processes. We present a novel, unsupervised method that learns the compound parts and the morphological operations needed to split compounds into their parts. The method uses a bilingual corpus to learn the morphological operations required to split a compound, and monolingual corpora to learn and filter the set of compound part candidates. We evaluate our method on a machine translation task and show significant improvements for several languages, demonstrating the versatility of the approach.
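The abstract only summarizes the approach, so the following is not the authors' method. It is a minimal Python sketch of what splitting a compound "with morphological operations" can look like. It assumes a hand-written part lexicon with toy frequencies (standing in for counts gathered from a monolingual corpus) and a fixed, hypothetical list of linking morphemes that may be dropped from the end of a non-final part. The paper learns its operations from bilingual data, which this sketch does not attempt. Candidate splits are scored with the geometric mean of part frequencies, the criterion of Koehn and Knight (2003).

```python
from math import prod

# Hypothetical toy lexicon: part -> frequency (stand-in for monolingual corpus counts).
PART_FREQ = {
    "verkehr": 500,
    "zeichen": 300,
    "verkehrszeichen": 20,
    "haus": 800,
    "tür": 200,
}

# Hypothetical morphological operations: linking morphemes that may be deleted
# from the end of a non-final part ("" means no operation).
LINKING_SUFFIXES = ("", "s", "es", "n")

MIN_PART_LEN = 3  # assumed minimum length of a compound part


def candidate_parts(prefix):
    """Yield lexicon parts obtainable from prefix by deleting a linking suffix."""
    for suffix in LINKING_SUFFIXES:
        if not prefix.endswith(suffix):
            continue
        stem = prefix[: len(prefix) - len(suffix)]
        if len(stem) >= MIN_PART_LEN and stem in PART_FREQ:
            yield stem


def splits(word):
    """Enumerate every segmentation of word into lexicon parts (the unsplit word included)."""
    if word in PART_FREQ:
        yield [word]
    for i in range(MIN_PART_LEN, len(word) - MIN_PART_LEN + 1):
        for part in candidate_parts(word[:i]):
            for rest in splits(word[i:]):
                yield [part] + rest


def geometric_mean_freq(parts):
    """Koehn-and-Knight-style score: geometric mean of the part frequencies."""
    return prod(PART_FREQ[p] for p in parts) ** (1.0 / len(parts))


def best_split(word):
    """Return the highest-scoring segmentation, or the word itself if none exists."""
    word = word.lower()
    candidates = list(splits(word))
    if not candidates:
        return [word]
    return max(candidates, key=geometric_mean_freq)
```

With these toy counts, `best_split("Verkehrszeichen")` returns `["verkehr", "zeichen"]`: deleting the linking "s" yields two known parts whose geometric mean (about 387) beats the unsplit frequency of 20. Words with no admissible segmentation, such as `"Apfel"`, are returned unsplit.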
Enrique Alfonseca, Slaven Bilac, and Stefan Pharies. 2008a. Decompounding query keywords from compounding languages. In Proc. of the 46th Annual Meeting of the Association for Computational Linguistics (ACL): Human Language Technologies (HLT), pages 253--256, Columbus, Ohio, USA, June.
Enrique Alfonseca, Slaven Bilac, and Stefan Pharies. 2008b. German decompounding in a difficult corpus. In A. Gelbukh, editor, Lecture Notes in Computer Science (LNCS): Proc. of the 9th Int. Conf. on Intelligent Text Processing and Computational Linguistics (CICLing), volume 4919, pages 128--139. Springer Verlag, February.
Ralf D. Brown. 2002. Corpus-Driven Splitting of Compound Words. In Proc. of the 9th Int. Conf. on Theoretical and Methodological Issues in Machine Translation (TMI), pages 12--21, Keihanna, Japan, March.
Chris Dyer. 2009. Using a maximum entropy model to build segmentation lattices for MT. In Proc. of Human Language Technologies (HLT): The Annual Conf. of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 406--414, Boulder, Colorado, June.
Nikesh Garera and David Yarowsky. 2008. Translating Compounds by Learning Component Gloss Translation Models via Multiple Languages. In Proc. of the 3rd International Joint Conference on Natural Language Processing (IJCNLP), pages 403--410, Hyderabad, India, January.
Philipp Koehn and Kevin Knight. 2003. Empirical methods for compound splitting. In Proc. of the 10th Conf. of the European Chapter of the Association for Computational Linguistics (EACL), volume 1, pages 187--193, Budapest, Hungary, April.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proc. of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), volume 1, pages 177--180, Prague, Czech Republic, June.
Eric W. Noreen. 1989. Computer-Intensive Methods for Testing Hypotheses. John Wiley & Sons, Canada.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 311--318, Philadelphia, Pennsylvania, July.