177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages

Source: pdf

Author: Manoj Kumar Chinnakotla ; Karthik Raman ; Pushpak Bhattacharyya

Abstract: In our previous work (Chinnakotla et al., 2010), we introduced a novel framework for Pseudo-Relevance Feedback (PRF) called MultiPRF. Given a query in one language, called the source, we used English as the assisting language to improve PRF performance for the source language. MultiPRF showed remarkable improvement over plain Model Based Feedback (MBF) for all four source languages tested, viz., French, German, Hungarian and Finnish, with English as the assisting language. This result motivated us to study how the choice of source-assistant pair affects MultiPRF performance, with pairs drawn from a set of languages with widely different characteristics, viz., Dutch, English, Finnish, French, German and Spanish. Carrying this further, we looked into the effect on PRF of using two assisting languages together. The present paper reports these investigations, their results, and the conclusions drawn from them. MultiPRF improves performance whatever the source and whatever the assisting language, but the observations are mixed when two assisting languages are used simultaneously. Interestingly, the improvement is more pronounced when the source and assisting languages are closely related, e.g., French and Spanish.

reference text

Giambattista Amati, Claudio Carpineto, and Giovanni Romano. 2004. Query Difficulty, Robustness, and Selective Application of Query Expansion. In ECIR ’04, pages 127–137.
Alexandra Birch, Miles Osborne, and Philipp Koehn. 2008. Predicting Success in Machine Translation. In EMNLP ’08, pages 745–754. ACL.
Martin Braschler and Carol Peters. 2004. Cross-Language Evaluation Forum: Objectives, Results, Achievements. Inf. Retr., 7(1-2):7–31.
Martin Braschler and Peter Schäuble. 1998. Multilingual Information Retrieval based on Document Alignment Techniques. In ECDL ’98, pages 183–197. Springer-Verlag.
Chris Buckley, Gerald Salton, James Allan, and Amit Singhal. 1994. Automatic Query Expansion using SMART: TREC 3. In TREC-3, pages 69–80.
Guihong Cao, Jian-Yun Nie, Jianfeng Gao, and Stephen Robertson. 2008. Selecting Good Expansion Terms for Pseudo-Relevance Feedback. In SIGIR ’08, pages 243–250. ACM.
Manoj K. Chinnakotla, Karthik Raman, and Pushpak Bhattacharyya. 2010. Multilingual PRF: English Lends a Helping Hand. In SIGIR ’10. ACM.
Kevyn Collins-Thompson and Jamie Callan. 2005. Query Expansion Using Random Walk Models. In CIKM ’05, pages 704–711. ACM.
Steve Cronen-Townsend, Yun Zhou, and W. Bruce Croft. 2004. A Framework for Selective Query Expansion. In CIKM ’04, pages 236–237. ACM.
Ido Dagan, Alon Itai, and Ulrike Schwall. 1991. Two Languages Are More Informative Than One. In ACL ’91, pages 130–137. ACL.
A. Dempster, N. Laird, and D. Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, 39:1–38.
Susan T. Dumais, Todd A. Letsche, Michael L. Littman, and Thomas K. Landauer. 1997. Automatic Cross-Language Retrieval Using Latent Semantic Indexing. In AAAI ’97, pages 18–24.
Wei Gao, John Blitzer, and Ming Zhou. 2008. Using English Information in Non-English Web Search. In iNEWS ’08, pages 17–24. ACM.
David Hawking, Paul Thistlewaite, and Donna Harman. 1999. Scaling Up the TREC Collection. Inf. Retr., 1(1-2):115–137.
Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Richard Zens, Rwth Aachen, Alexandra Constantin, Marcello Federico, Nicola Bertoldi, Chris Dyer, Brooke Cowan, Wade Shen, Christine Moran, and Ondřej Bojar. 2007. Moses: Open Source Toolkit for Statistical Machine Translation. In ACL ’07, pages 177–180.
P. Jourlin, S. E. Johnson, K. Spärck Jones, and P. C. Woodland. 1999. Improving Retrieval on Imperfect Speech Transcriptions (Poster Abstract). In SIGIR ’99, pages 283–284. ACM.
John Lafferty and Chengxiang Zhai. 2003. Probabilistic Relevance Models Based on Document and Query Generation. Language Modeling for Information Retrieval, pages 1–10. Kluwer International Series on IR.
K. Spärck Jones, S. Walker, and S. E. Robertson. 2000. A Probabilistic Model of Information Retrieval: Development and Comparative Experiments. Inf. Process. Manage., 36(6):779–808.
John Lafferty and Chengxiang Zhai. 2001. Document Language Models, Query Models, and Risk Minimization for Information Retrieval. In SIGIR ’01, pages 111–119. ACM.
Victor Lavrenko and W. Bruce Croft. 2001. Relevance Based Language Models. In SIGIR ’01, pages 120–127. ACM.
Victor Lavrenko, Martin Choquette, and W. Bruce Croft. 2002. Cross-Lingual Relevance Models. In SIGIR ’02, pages 175–182. ACM.
Edgar Meij, Dolf Trieschnigg, Maarten de Rijke, and Wessel Kraaij. 2009. Conceptual Language Models for Domain-specific Retrieval. Information Processing & Management.
Donald Metzler and W. Bruce Croft. 2007. Latent Concept Expansion Using Markov Random Fields. In SIGIR ’07, pages 311–318. ACM.
Mandar Mitra, Amit Singhal, and Chris Buckley. 1998. Improving Automatic Query Expansion. In SIGIR ’98, pages 206–214. ACM.
Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19–51.
I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and Johnson. 2005. Terrier Information Retrieval Platform. In ECIR ’05, volume 3408 of Lecture Notes in Computer Science, pages 517–519. Springer.
Philipp Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In MT Summit ’05.
Stephen Robertson. 2006. On GMAP: and Other Transformations. In CIKM ’06, pages 78–83. ACM.
Tetsuya Sakai, Toshihiko Manabe, and Makoto Koyama. 2005. Flexible Pseudo-Relevance Feedback Via Selective Sampling. ACM TALIP, 4(2):111–135.
Tao Tao and ChengXiang Zhai. 2006. Regularized Estimation of Mixture Models for Robust Pseudo-Relevance Feedback. In SIGIR ’06, pages 162–169. ACM.
Tuomas Talvensaari, Jorma Laurikkala, Kalervo Järvelin, Martti Juhola, and Heikki Keskustalo. 2007. Creating and Exploiting a Comparable Corpus in Cross-language Information Retrieval. ACM Trans. Inf. Syst., 25(1):4.
Jörg Tiedemann. 2001. The Use of Parallel Corpora in Monolingual Lexicography - How word alignment can identify morphological and semantic relations. In COMPLEX ’01, pages 143–151.
Ellen M. Voorhees. 1994. Query Expansion Using Lexical-Semantic Relations. In SIGIR ’94, pages 61–69. Springer-Verlag.
Ellen Voorhees. 2006. Overview of the TREC 2005 Robust Retrieval Track. In TREC 2005, Gaithersburg, MD. NIST.
Dan Wu, Daqing He, Heng Ji, and Ralph Grishman. 2008. A Study of Using an Out-of-Box Commercial MT System for Query Translation in CLIR. In iNEWS ’08, pages 71–76. ACM.
Jinxi Xu and W. Bruce Croft. 2000. Improving the Effectiveness of Information Retrieval with Local Context Analysis. ACM Trans. Inf. Syst., 18(1):79–112.
Jinxi Xu, Alexander Fraser, and Ralph Weischedel. 2002. Empirical Studies in Strategies for Arabic Retrieval. In SIGIR ’02, pages 269–274. ACM.
Yang Xu, Gareth J.F. Jones, and Bin Wang. 2009. Query Dependent Pseudo-Relevance Feedback Based on Wikipedia. In SIGIR ’09, pages 59–66. ACM.
Chengxiang Zhai and John Lafferty. 2001. Model-based Feedback in the Language Modeling approach to Information Retrieval. In CIKM ’01, pages 403–410. ACM.
Chengxiang Zhai and John Lafferty. 2004. A Study of Smoothing Methods for Language Models applied to Information Retrieval. ACM Transactions on Information Systems, 22(2):179–214.