
Regularized Interlingual Projections: Evaluation on Multilingual Transliteration (EMNLP 2012, paper 111)


Source: pdf

Authors: Jagadeesh Jagarlamudi; Hal Daumé III

Abstract: In this paper, we address the problem of building a multilingual transliteration system using an interlingual representation. Our approach uses the International Phonetic Alphabet (IPA) to learn the interlingual representation, so any word paired with its IPA representation can serve as a training example. As a result, our approach requires only monolingual resources: a phoneme dictionary that lists words and their IPA representations. By adding a phoneme dictionary for a new language, we can readily build a transliteration system from that language into any of the existing languages, without the expense of all-pairs data or computation. We also propose a regularization framework for learning the interlingual representation that accounts for language-specific phonemic variability and can therefore find better mappings between languages. Experimental results on the name transliteration task in five diverse languages show accuracy improvements of up to 29% (17% on average) over a state-of-the-art baseline system.
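To make the pivot idea concrete, here is a toy sketch in Python. It is not the paper's method: it swaps the learned, regularized interlingual projection for raw IPA strings compared by Levenshtein distance, and the lexicon entries and the names `PHONEME_DICTS`, `transliterate`, `add_language`, and `resources_needed` are hypothetical. It illustrates the two properties the abstract claims for an IPA pivot: each language needs only its own phoneme dictionary, and registering a new dictionary makes that language reachable from all existing ones. For n languages, all-pairs training needs n(n-1)/2 bilingual datasets, versus n dictionaries with the pivot; for the five languages evaluated, that is 10 versus 5.

```python
# Hypothetical toy phoneme dictionaries (word -> IPA string). The entries are
# illustrative only; a real system would load a full pronunciation lexicon.
PHONEME_DICTS: dict[str, dict[str, str]] = {
    "english": {"anna": "ænə", "robert": "rɒbərt"},
    "hindi": {"अन्ना": "ənnaː", "रॉबर्ट": "rɒbərʈ"},
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two IPA strings, keeping one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def transliterate(word: str, src: str, tgt: str) -> str:
    """Look up the IPA of `word` (the shared pivot), then return the target-language
    word whose IPA is closest. Assumption: raw IPA + edit distance stands in for
    the learned interlingual projection described in the paper."""
    ipa = PHONEME_DICTS[src][word]
    candidates = PHONEME_DICTS[tgt]
    return min(candidates, key=lambda w: edit_distance(ipa, candidates[w]))


def add_language(name: str, lexicon: dict[str, str]) -> None:
    """Register one monolingual phoneme dictionary; no bilingual data is needed."""
    PHONEME_DICTS[name] = lexicon


def resources_needed(n: int) -> tuple[int, int]:
    """Return (dictionaries needed with an IPA pivot, bilingual datasets for all pairs)."""
    return n, n * (n - 1) // 2


if __name__ == "__main__":
    print(transliterate("anna", "english", "hindi"))
    print(resources_needed(5))
```

On these toy entries, `transliterate("anna", "english", "hindi")` picks `"अन्ना"` because its IPA is the nearer string. A realistic system would also need a grapheme-to-IPA model, since name transliteration mostly targets names absent from any dictionary; the lookup here only covers listed words.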


reference text

Nasreen AbdulJaleel and Leah S. Larkey. 2003. Statistical transliteration for English-Arabic cross language information retrieval. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, CIKM '03, pages 139–146, New York, NY, USA. ACM.
Yaser Al-Onaizan and Kevin Knight. 2002. Machine transliteration of names in Arabic text. In Proceedings of the ACL-02 Workshop on Computational Approaches to Semitic Languages, SEMITIC '02, pages 1–13, Stroudsburg, PA, USA. ACL.
Wei Gao, Kam-Fai Wong, and Wai Lam. 2004. Phoneme-based transliteration of foreign names for OOV problem. In Proceedings of the 1st International Joint Conference on Natural Language Processing (IJCNLP), pages 374–381.
Li Haizhou, Zhang Min, and Su Jian. 2004. A joint source-channel model for machine transliteration. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA. ACL.
Martin Haspelmath, Matthew Dryer, David Gil, and Bernard Comrie, editors. 2005. The World Atlas of Language Structures. Oxford University Press.
Ulf Hermjakob, Kevin Knight, and Hal Daumé III. 2008. Name translation in statistical machine translation: Learning when to transliterate. In Proceedings of ACL-08: HLT, pages 389–397, Columbus, Ohio, June. ACL.
Harold Hotelling. 1936. Relation between two sets of variables. Biometrika, 28:322–377.
Sung Young Jung, SungLim Hong, and Eunok Paek. 2000. An English to Korean transliteration model of extended Markov window. In Proceedings of the 18th Conference on Computational Linguistics - Volume 1, COLING '00, pages 383–389, Stroudsburg, PA, USA. ACL.
Byung-Ju Kang and Key-Sun Choi. 2000. Two approaches for the resolution of word mismatch problem caused by English words and foreign words in Korean information retrieval. In Proceedings of the 5th International Workshop on Information Retrieval with Asian Languages, IRAL '00, pages 133–140, New York, NY, USA. ACM.
Mitesh M. Khapra, Raghavendra Udupa, A. Kumaran, and Pushpak Bhattacharyya. 2010. PR + RQ ≈ PQ: Transliteration mining using bridge language. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July. AAAI Press.
Alexandre Klementiev and Dan Roth. 2006. Weakly supervised named entity transliteration and discovery from multilingual comparable corpora. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, ACL-44, pages 817–824, Stroudsburg, PA, USA. ACL.
Kevin Knight and Jonathan Graehl. 1998. Machine transliteration. Computational Linguistics, 24(4):599–612.
Haizhou Li, A. Kumaran, Vladimir Pervouchine, and Min Zhang. 2009. Report of NEWS 2009 machine transliteration shared task. In Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration, NEWS '09, pages 1–18, Stroudsburg, PA, USA. ACL.
Thomas Mandl and Christa Womser-Hacker. 2005. The effect of named entities on effectiveness in cross-language information retrieval evaluation. In Proceedings of the 2005 ACM Symposium on Applied Computing, SAC '05, pages 1059–1064, New York, NY, USA. ACM.
Gideon S. Mann and David Yarowsky. 2001. Multipath translation lexicon induction via bridge languages. In Proceedings of the 2nd Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, NAACL '01, pages 1–8, Stroudsburg, PA, USA. ACL.
M. K. Odell and R. C. Russell. 1918. U.S. patent numbers 1,261,167 (1918) and 1,435,663 (1922).
Michael Paul and Eiichiro Sumita. 2011. Translation quality indicators for pivot-based statistical MT. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 811–818, Chiang Mai, Thailand, November. AFNLP.
Sujith Ravi and Kevin Knight. 2009. Learning phoneme mappings for transliteration without parallel data. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 37–45, Boulder, Colorado, June. ACL.
Xabier Saralegi, Iker Manterola, and Iñaki San Vicente. 2011. Analyzing methods for improving precision of pivot based bilingual dictionaries. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 846–856, Edinburgh, Scotland, UK, July. ACL.
Richard Sproat, Tao Tao, and ChengXiang Zhai. 2006. Named entity transliteration with comparable corpora. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, ACL-44, pages 73–80, Stroudsburg, PA, USA. ACL.
Tao Tao, Su-Youn Yoon, Andrew Fister, Richard Sproat, and ChengXiang Zhai. 2006. Unsupervised named entity transliteration using temporal and phonetic correlation. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, EMNLP '06, pages 250–257, Stroudsburg, PA, USA. ACL.
Raghavendra Udupa and Mitesh M. Khapra. 2010. Transliteration equivalence using canonical correlation analysis. In ECIR '10, pages 75–86.
Raghavendra Udupa, Saravanan K, Anton Bakalov, and Abhijit Bhole. 2009.