
94 acl-2011-Deciphering Foreign Language

Source: pdf

Author: Sujith Ravi ; Kevin Knight

Abstract: In this work, we tackle the task of machine translation (MT) without parallel training data. We frame the MT problem as a decipherment task, treating the foreign text as a cipher for English, and present novel methods for training translation models from nonparallel text.

reference text

Friedrich L. Bauer. 2006. Decrypted Secrets: Methods and Maxims of Cryptology. Springer-Verlag.
Phil Blunsom, Trevor Cohn, Chris Dyer, and Miles Osborne. 2009. A Gibbs sampler for phrasal synchronous grammar induction. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP), pages 782–790.
Peter Brown, Vincent Della Pietra, Stephen Della Pietra, and Robert Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
David Chiang, Jonathan Graehl, Kevin Knight, Adam Pauls, and Sujith Ravi. 2010. Bayesian inference for finite-state transducers. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL/HLT), pages 447–455.
Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.
Jenny Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 363–370.
Pascal Fung and Kathleen McKeown. 1997. Finding terminology translations from non-parallel corpora. In Proceedings of the Fifth Annual Workshop on Very Large Corpora, pages 192–202.
Stuart Geman and Donald Geman. 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741.
Ulrich Germann, Michael Jahr, Kevin Knight, Daniel Marcu, and Kenji Yamada. 2001. Fast decoding and optimal decoding for machine translation. In Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, pages 228–235.
Sharon Goldwater and Thomas Griffiths. 2007. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 744–751.
Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick, and Dan Klein. 2008. Learning bilingual lexicons from monolingual corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics - Human Language Technologies (ACL/HLT), pages 771–779.
Kevin Knight and Yaser Al-Onaizan. 1998. Translation with finite-state devices. In David Farwell, Laurie Gerber, and Eduard Hovy, editors, Machine Translation and the Information Soup, volume 1529 of Lecture Notes in Computer Science, pages 421–437. Springer Berlin / Heidelberg.
Kevin Knight, Anish Nair, Nishit Rathod, and Kenji Yamada. 2006. Unsupervised analysis for decipherment problems. In Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics, pages 499–506.
Philipp Koehn and Kevin Knight. 2000. Estimating word translation probabilities from unrelated monolingual corpora using the EM algorithm. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, pages 711–715.
Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions.
Philipp Koehn. 2009. Statistical Machine Translation. Cambridge University Press.
David McClosky, Eugene Charniak, and Mark Johnson. 2006. Effective self-training for parsing. In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 152–159.
Gonzalo Navarro. 2001. A guided tour to approximate string matching. ACM Computing Surveys, 33:31–88, March.
David Newman, Arthur Asuncion, Padhraic Smyth, and Max Welling. 2009. Distributed algorithms for topic models. Journal of Machine Learning Research, 10:1801–1828.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 311–318.
Reinhard Rapp. 1995. Identifying word translations in non-parallel texts. In Proceedings of the Conference of the Association for Computational Linguistics, pages 320–322.
Benjamin Snyder, Regina Barzilay, and Kevin Knight. 2010. A statistical model for lost language decipherment. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1048–1057.
Jörg Tiedemann. 2009. News from OPUS - A collection of multilingual parallel corpora with tools and interfaces. In N. Nicolov, K. Bontcheva, G. Angelova, and R. Mitkov, editors, Recent Advances in Natural Language Processing, volume V, pages 237–248. John Benjamins, Amsterdam/Philadelphia.
Warren Weaver. 1955. Translation (1949). Reproduced in W.N. Locke and A.D. Booth, editors, Machine Translation of Languages, pages 15–23. MIT Press.