
273 acl-2013-Paraphrasing Adaptation for Web Search Ranking

Source: pdf

Author: Chenguang Wang ; Nan Duan ; Ming Zhou ; Ming Zhang

Abstract: Mismatch between queries and documents is a key issue for the web search task. To narrow this mismatch, we present an in-depth investigation of adapting a paraphrasing technique to web search from three aspects: a search-oriented paraphrasing model; an NDCG-based parameter optimization algorithm; and an enhanced ranking model that leverages augmented features computed on paraphrases of the original queries. Experiments performed on a large-scale query-document data set show that search performance can be significantly improved, with +3.28% and +1.14% NDCG gains on the dev and test sets respectively.
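Since the abstract reports its gains in NDCG, a minimal sketch of how NDCG@k is commonly computed may help readers interpret the numbers. This is an illustration only: the exponential gain (2^rel - 1), the log2 discount, and the function names `dcg_at_k` and `ndcg_at_k` are assumptions, not details taken from the paper.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results.

    Assumes the common exponential gain 2^rel - 1 and a log2 rank discount;
    the paper's exact formulation is not stated in the abstract.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so discount is log2(position + 1)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents: define NDCG as 0
    return dcg_at_k(relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical relevance labels for one query, listed in ranked order.
    print(ndcg_at_k([2, 3, 0, 1], k=4))
```

A ranking that already lists documents in decreasing order of relevance scores 1.0, so a reported NDCG gain measures how much closer the system's ordering moves toward that ideal.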

reference text

Ricardo A Baeza-Yates. 1992. Introduction to data structures and algorithms related to information retrieval.
Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with bilingual parallel corpora. In Proceedings of ACL, pages 597–604.
Eric Brill and Robert C. Moore. 2000. An improved error model for noisy channel spelling correction. In Proceedings of ACL, pages 286–293.
Jean-Cédric Chappelier and Martin Rajman. 1998. A generalized cyk algorithm for parsing stochastic cfg. In Workshop on Tabulation in Parsing and Deduction, pages 133–137.
David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of ACL, pages 263–270.
Nick Craswell and Martin Szummer. 2007. Random walks on the click graph. In Proceedings of SIGIR, SIGIR ’07, pages 239–246.
Hang Cui, Ji-Rong Wen, Jian-Yun Nie, and Wei-Ying Ma. 2002. Probabilistic query expansion using query logs. In Proceedings of WWW, pages 325–332.
Van Dang and Bruce W. Croft. 2010. Query reformulation using anchor text. In Proceedings of WSDM, pages 41–50.
Jonathan L. Elsas, Jaime Arguello, Jamie Callan, and Jaime G. Carbonell. 2008. Retrieval and feedback models for blog feed search. In Proceedings of SIGIR, pages 347–354.
Jiafeng Guo, Gu Xu, Hang Li, and Xueqi Cheng. 2008. A unified and discriminative model for query refinement. In Proceedings of SIGIR, SIGIR ’08, pages 379–386.
Yufeng Jing and W. Bruce Croft. 1994. An association thesaurus for information retrieval. In RIAO 94 Conference Proceedings, pages 146–160.
Thorsten Joachims. 2006. Training linear svms in linear time. In Proceedings of KDD, pages 217–226.
Rosie Jones, Benjamin Rey, Omid Madani, and Wiley Greiner. 2006. Generating query substitutions. In Proceedings of WWW, pages 387–396.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of NAACL, pages 48–54.
Victor Lavrenko and W. Bruce Croft. 2001. Relevance based language models. In Proceedings of SIGIR, pages 120–127.
Dekang Lin and Patrick Pantel. 2001. Discovery of inference rules for question-answering. Natural Language Engineering, pages 343–360.
Tie-Yan Liu, Jun Xu, Tao Qin, Wenying Xiong, and Hang Li. 2007. Letor: Benchmark dataset for research on learning to rank for information retrieval. In Proceedings of SIGIR workshop, pages 3–10.
George A Miller. 1995. Wordnet: a lexical database for english. Communications of the ACM, pages 39–41.
Franz Josef Och and Hermann Ney. 2000. Improved statistical alignment models. In Proceedings of ACL, pages 440–447.
Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of ACL, pages 160–167.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of ACL, pages 311–318.
Chris Quirk, Chris Brockett, and William Dolan. 2004. Monolingual machine translation for paraphrase generation. In Proceedings of EMNLP, pages 142–149.
Mark D Smucker, James Allan, and Ben Carterette. 2007. A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of CIKM, pages 623–632.
Xuanhui Wang and ChengXiang Zhai. 2008. Mining term association patterns from search logs for effective query reformulation. In Proceedings of the 17th ACM conference on Information and knowledge management, Proceedings of CIKM, pages 479–488.
Yang Xu, Gareth J.F. Jones, and Bin Wang. 2009. Query dependent pseudo-relevance feedback based on wikipedia. In Proceedings of SIGIR, pages 59–66.
Shipeng Yu, Deng Cai, Ji-Rong Wen, and Wei-Ying Ma. 2003. Improving pseudo-relevance feedback in web information retrieval using web page segmentation. In Proceedings of WWW, pages 11–18.
Wei Zhang and Clement Yu. 2006. Uic at trec 2006 blog track. In Proceedings of TREC.
Shiqi Zhao, Ming Zhou, and Ting Liu. 2007. Learning question paraphrases for qa from encarta logs. In Proceedings of IJCAI, pages 1795–1800.
Shiqi Zhao, Xiang Lan, Ting Liu, and Sheng Li. 2009. Application-driven statistical paraphrase generation. In Proceedings of ACL, pages 834–842.