
197 acl-2011-Latent Class Transliteration based on Source Language Origin



Author: Masato Hagiwara ; Satoshi Sekine

Abstract: Transliteration, a rich source of proper noun spelling variations, is usually recognized by phonetic- or spelling-based models. However, a single model cannot deal with different words from different language origins, e.g., “get” in “piaget” and “target.” Li et al. (2007) propose a method which explicitly models and classifies the source language origins and switches transliteration models accordingly. This model, however, requires an explicitly tagged training set with language origins. We propose a novel method which models language origins as latent classes. The parameters are learned from a set of transliterated word pairs via the EM algorithm. The experimental results of the transliteration task of Western names to Japanese show that the proposed model can achieve higher accuracy compared to the conventional models without latent classes.
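The latent-class idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each transliterated word pair has already been segmented into aligned (source substring, target substring) units, treats the language origin as a hidden class z, and fits P(z) and P(unit | z) with EM. The function names, the toy segmentations, and the smoothing constant are all hypothetical choices made for illustration.

```python
import math
import random


def class_posterior(units, prior, emit, floor=1e-12):
    """E-step for one word pair: P(z | units) for every latent class z."""
    # Work in log space so long unit sequences do not underflow.
    log_joint = [
        math.log(prior[k]) + sum(math.log(emit[k].get(u, floor)) for u in units)
        for k in range(len(prior))
    ]
    top = max(log_joint)
    weights = [math.exp(x - top) for x in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


def train_latent_class_em(pairs, num_classes=2, iterations=20, seed=0, smoothing=1e-3):
    """Fit class priors and per-class unit distributions with EM.

    pairs: list of word pairs, each a list of (source, target) unit tuples.
    The additive smoothing is an assumption of this sketch.
    """
    rng = random.Random(seed)
    vocab = sorted({u for units in pairs for u in units})

    # Uniform priors, randomly perturbed emissions to break symmetry.
    prior = [1.0 / num_classes] * num_classes
    emit = []
    for _ in range(num_classes):
        raw = {u: rng.random() + 0.5 for u in vocab}
        norm = sum(raw.values())
        emit.append({u: w / norm for u, w in raw.items()})

    for _ in range(iterations):
        # E-step: soft class assignment for every training pair.
        posts = [class_posterior(units, prior, emit) for units in pairs]

        # M-step: re-estimate priors and emissions from the soft counts.
        denom = len(pairs) + num_classes * smoothing
        prior = [
            (sum(p[k] for p in posts) + smoothing) / denom
            for k in range(num_classes)
        ]
        for k in range(num_classes):
            counts = {u: smoothing for u in vocab}
            for units, p in zip(pairs, posts):
                for u in units:
                    counts[u] += p[k]
            norm = sum(counts.values())
            emit[k] = {u: c / norm for u, c in counts.items()}
    return prior, emit


# Toy data with hand-made, hypothetical segmentations (no origin labels).
toy_pairs = [
    [("tar", "ター"), ("get", "ゲット")],   # target
    [("pia", "ピア"), ("get", "ジェ")],     # piaget
    [("mar", "マー"), ("ket", "ケット")],   # market
    [("bal", "バ"), ("let", "レエ")],       # ballet
]

prior, emit = train_latent_class_em(toy_pairs, num_classes=2, iterations=10)
print(class_posterior(toy_pairs[0], prior, emit))
```

No origin labels are supplied anywhere in this sketch: the classes are only soft clusters that EM discovers, so with data this small they need not line up with real language origins.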


reference text

Farooq Ahmad and Grzegorz Kondrak. 2005. Learning a spelling error model from search query logs. In Proc. of EMNLP-2005, pages 955–962.

Eric Brill and Robert C. Moore. 2000. An improved error model for noisy channel spelling correction. In Proc. of ACL-2000, pages 286–293.

Eric Brill, Gary Kacmarcik, and Chris Brockett. 2001. Automatically harvesting Katakana-English term pairs from search engine query logs. In Proc. of NLPRS-2001, pages 393–399.

Masato Hagiwara and Hisami Suzuki. 2009. Japanese query alteration based on semantic similarity. In Proc. of NAACL-2009, page 191.

Kevin Knight and Jonathan Graehl. 1998. Machine transliteration. Computational Linguistics, 24:599–612.