
55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

Source: pdf

Author: Danish Contractor ; Govind Kothari ; Tanveer Faruquie ; L V Subramaniam ; Sumit Negi

Abstract: Recent times have seen tremendous growth in mobile-based data services that people can access through the Short Message Service (SMS). In a multilingual society, it is essential that data services developed for a specific language also be made accessible through other local languages. In this paper, we present a service that allows a user to query a Frequently-Asked-Questions (FAQ) database built in a local language (Hindi) using noisy English SMS queries. The inherent noise in the SMS queries, along with the language mismatch, makes this a challenging problem. We handle these two problems by formulating the query similarity over FAQ questions as a combinatorial search problem in which the search space consists of combinations of dictionary variations of the noisy query and its top-N translations. We demonstrate the effectiveness of our approach on a real-life dataset.
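The search space described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or scoring function: the vocabulary, the romanized Hindi translation table, its weights, the FAQ strings, and every function name below are hypothetical, and string similarity from Python's `difflib` stands in for whatever variant model the paper uses. Each noisy token is expanded into similar dictionary words, each dictionary word into its top-N translations, and an FAQ question is scored by the best-weighted candidate it contains for each token.

```python
from difflib import SequenceMatcher

# Hypothetical toy data (not from the paper): an English vocabulary and a
# small English -> romanized-Hindi translation table with made-up weights.
VOCAB = ["train", "ticket", "booking", "time"]
TRANSLATIONS = {
    "train": [("rail", 0.6), ("gaadi", 0.4)],
    "ticket": [("ticket", 1.0)],
    "booking": [("arakshan", 0.7), ("booking", 0.3)],
    "time": [("samay", 0.8), ("vaqt", 0.2)],
}
FAQS = [
    "rail ticket arakshan kaise karen",
    "gaadi ka samay kya hai",
]


def variants(token, min_sim=0.5):
    """Return dictionary words that look similar to a noisy SMS token."""
    scored = [(w, SequenceMatcher(None, token, w).ratio()) for w in VOCAB]
    return [(w, s) for w, s in scored if s >= min_sim]


def candidates(token, top_n=2):
    """Combine each dictionary variant with its top-N translations."""
    out = []
    for word, sim in variants(token):
        for target, prob in TRANSLATIONS.get(word, [])[:top_n]:
            out.append((target, sim * prob))
    return out


def score(query, faq):
    """Sum, over query tokens, the best-weighted candidate found in the FAQ."""
    faq_terms = set(faq.split())
    total = 0.0
    for token in query.split():
        weights = [w for t, w in candidates(token) if t in faq_terms]
        total += max(weights, default=0.0)
    return total


def best_faq(query):
    """Return the FAQ question with the highest score for the noisy query."""
    return max(FAQS, key=lambda faq: score(query, faq))
```

Under these toy assumptions, calling `best_faq("trn tkt bkng")` selects the first FAQ question, because all three noisy tokens have a variant whose translation appears in it. An exhaustive scan over every FAQ question is only workable at this scale; a larger database would need some form of pruned search.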

reference text

Sreangsu Acharyya, Sumit Negi, L Venkata Subramaniam, Shourya Roy. 2009. Language independent unsupervised learning of short message service dialect. International Journal on Document Analysis and Recognition, pp. 175-184.
Aiti Aw, Min Zhang, Juan Xiao, and Jian Su. 2006. A phrase-based statistical model for SMS text normalization. In Proceedings of COLING-ACL, pp. 33-40.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, pp. 263-311.
Jeunghyun Byun, Seung-Wook Lee, Young-In Song, Hae-Chang Rim. 2008. Two Phase Model for SMS Text Messages Refinement. AAAI Workshop on Enhanced Messaging.
Monojit Choudhury, Rahul Saraf, Vijit Jain, Animesh Mukherjee, Sudeshna Sarkar, Anupam Basu. 2007. Investigation and modeling of the structure of texting language. International Journal on Document Analysis and Recognition, pp. 157-174.
Philipp Cimiano, Antje Schultz, Sergej Sizov, Philipp Sorg, Steffen Staab. 2009. Explicit versus latent concept models for cross-language information retrieval. In Proceedings of IJCAI, pp. 1513-1518.
Danish Contractor, Tanveer A. Faruquie, L. Venkata Subramaniam. 2010. Unsupervised cleansing of noisy text. In Proceedings of COLING 2010: Posters, pp. 189-196.
R. Fagin, A. Lotem, and M. Naor. 2001. Optimal aggregation algorithms for middleware. In Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 102-113.
Yijue How and Min-Yen Kan. 2005. Optimizing predictive text entry for short message service on mobile phones. In M. J. Smith and G. Salvendy (Eds.), Proc. of Human Computer Interfaces International, Lawrence Erlbaum Associates.
Valentin Jijkoun and Maarten de Rijke. 2005. Retrieving answers from frequently asked questions pages on the web. In Proceedings of the Tenth ACM Conference on Information and Knowledge Management, CIKM, pp. 76-83.
Catherine Kobus, Francois Yvon and Géraldine Damnati. 2008. Normalizing SMS: Are two metaphors better than one? In Proceedings of COLING, pp. 441-448.
Philipp Koehn, Hieu Hoang, Alexandra Birch Mayne, Christopher Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. Annual Meeting of the Association for Computational Linguistics (ACL), Demonstration Session.
Sunil Kumar Kopparapu, Akhilesh Srivastava and Arun Pande. 2007. SMS based Natural Language Interface to Yellow Pages Directory. In Proceedings of the 4th International Conference on Mobile Technology, Applications, and Systems and the 1st International Symposium on Computer Human Interaction in Mobile Technology, pp. 558-563.
Govind Kothari, Sumit Negi, Tanveer Faruquie, Venkat Chakravarthy and L V Subramaniam. 2009. SMS based Interface for FAQ Retrieval. Annual Meeting of the Association for Computational Linguistics (ACL).
I. D. Melamed. 1999. Bitext maps and alignment via pattern recognition. Computational Linguistics, pp. 107-130.
Guimier de Neef, Emilie, Arnaud Debeurme, and Jungyeul Park. 2007. TILT correcteur de SMS : Evaluation et bilan quantitatif. In Actes de TALN, pp. 123-132.
Douglas W. Oard, Funda Ertunc. 2002. Translation-Based Indexing for Cross-Language Retrieval. In Proceedings of the ECIR, pp. 324-333.
A. Pirkola. 1998. The Effects of Query Structure and Dictionary Setups in Dictionary-Based Cross-Language Information Retrieval. SIGIR '98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 55-63.
E. Prochasson, C. Viard-Gaudin, and E. Morin. 2007. Language models for handwritten short message services. In Proceedings of the 9th International Conference on Document Analysis and Recognition, pp. 83-87.
Rudy Schusteritsch, Shailendra Rao, Kerry Rodden. 2005. Mobile Search with Text Messages: Designing the User Experience for Google SMS. In Proceedings of ACM SIGCHI, pp. 1777-1780.
Satoshi Sekine, Ralph Grishman. 2003. Hindi-English cross-lingual question-answering system. ACM Transactions on Asian Language Information Processing, pp. 181-192.
E. Sneiders. 1999. Automated FAQ Answering: Continued Experience with Shallow Language Understanding. Question Answering Systems: Papers from the 1999 AAAI Fall Symposium. Technical Report FS-99-02, AAAI Press, pp. 97-107.
W. Song, M. Feng, N. Gu, and L. Wenyin. 2007. Question similarity calculation for FAQ answering. In Proceedings of SKG 07, pp. 298-301.
X. Xue, J. Jeon, and W. B. Croft. 2008. Retrieval Models for Question and Answer Archives. In Proceedings of SIGIR, pp. 475-482.