ACL 2012 (paper acl2012-160)
Authors: Ai Ti Aw; Lian Hau Lee
Abstract: This paper describes personalized normalization in a multilingual chat system that supports chatting in user-defined short-forms or abbreviations. One of the major challenges for multilingual chat realized through machine translation technology is normalizing the non-standard, self-created short-forms in a chat message to standard words before translation. Because training data are scarce and short-forms vary across social communities, it is hard to normalize and translate chat messages when users use vocabulary outside the training data and create short-forms freely. We develop a personalized chat normalizer for English and integrate it with a multilingual chat system, allowing users to create and use personalized short-forms in multilingual chat.
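The abstract does not say how personalized short-forms are stored or applied. As a rough illustration only, the sketch assumes a simple token-level dictionary lookup in which a user's own short-forms take precedence over a shared dictionary, and any unknown token is passed through unchanged. The class `PersonalizedNormalizer`, its methods, and the dictionary entries are hypothetical. A plain dictionary lookup is a deliberately simpler stand-in for whatever normalization model the authors actually use.

```python
import re
from typing import Dict, Optional

# Hypothetical shared dictionary of common short-forms (illustrative entries only).
DEFAULT_SHORT_FORMS: Dict[str, str] = {
    "u": "you",
    "r": "are",
    "2moro": "tomorrow",
    "pls": "please",
}


class PersonalizedNormalizer:
    """Illustrative dictionary-based normalizer: per-user entries override shared ones."""

    def __init__(self, shared: Optional[Dict[str, str]] = None) -> None:
        self.shared = dict(shared or DEFAULT_SHORT_FORMS)
        # Maps user id -> that user's own short-form dictionary.
        self.personal: Dict[str, Dict[str, str]] = {}

    def add_short_form(self, user: str, short: str, standard: str) -> None:
        # Register a user-defined short-form; keys are stored lower-cased.
        self.personal.setdefault(user, {})[short.lower()] = standard

    def normalize(self, user: str, message: str) -> str:
        personal = self.personal.get(user, {})

        def replace(match: re.Match) -> str:
            token = match.group(0)
            key = token.lower()
            # Personal dictionary first, then the shared one, else keep the token unchanged.
            return personal.get(key, self.shared.get(key, token))

        # Replace word-like tokens only; spaces and punctuation are left in place.
        return re.sub(r"[A-Za-z0-9']+", replace, message)


if __name__ == "__main__":
    normalizer = PersonalizedNormalizer()
    normalizer.add_short_form("alice", "cu", "see you")
    print(normalizer.normalize("alice", "cu 2moro pls!"))  # see you tomorrow please!
    print(normalizer.normalize("bob", "cu 2moro pls!"))    # cu tomorrow please!
```

Because "cu" is only in Alice's personal dictionary, the same message is normalized differently for another user, which is the kind of per-user behavior the abstract describes. The sketch ignores context and capitalization; a real normalizer would need to handle ambiguous short-forms before passing text to translation.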