
182 acl-2011-Joint Annotation of Search Queries

Source: pdf

Author: Michael Bendersky ; W. Bruce Croft (Dept. of Computer Science, University of Massachusetts, Amherst, MA, croft@cs.umass.edu) ; David A. Smith (Dept. of Computer Science, University of Massachusetts, Amherst, MA, dasmith@cs.umass.edu)

Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries and verbose natural language queries.

reference text

Niranjan Balasubramanian and James Allan. 2009. Syntactic query models for restatement retrieval. In Proc. of SPIRE, pages 143–155.
Cory Barr, Rosie Jones, and Moira Regelson. 2008. The linguistic structure of English web-search queries. In Proc. of EMNLP, pages 1021–1030.
Michael Bendersky and W. Bruce Croft. 2009. Analysis of long queries in a large scale search log. In Proc. of Workshop on Web Search Click Data, pages 8–14.
Michael Bendersky, David Smith, and W. Bruce Croft. 2009. Two-stage query segmentation for information retrieval. In Proc. of SIGIR, pages 810–811.
Michael Bendersky, W. Bruce Croft, and David A. Smith. 2010. Structural annotation of search queries using pseudo-relevance feedback. In Proc. of CIKM, pages 1537–1540.
Shane Bergsma and Qin I. Wang. 2007. Learning noun phrase query segmentation. In Proc. of EMNLP, pages 819–826.
Thorsten Brants and Alex Franz. 2006. Web 1T 5-gram Version 1.
Chris Buckley. 1995. Automatic query expansion using SMART. In Proc. of TREC-3, pages 69–80.
Jenny R. Finkel and Christopher D. Manning. 2009. Joint parsing and named entity recognition. In Proc. of NAACL, pages 326–334.
Jiafeng Guo, Gu Xu, Hang Li, and Xueqi Cheng. 2008. A unified and discriminative model for query refinement. In Proc. of SIGIR, pages 379–386.
Jiafeng Guo, Gu Xu, Xueqi Cheng, and Hang Li. 2009. Named entity recognition in query. In Proc. of SIGIR, pages 267–274.
Matthias Hagen, Martin Potthast, Benno Stein, and Christof Braeutigam. 2010. The power of naive query segmentation. In Proc. of SIGIR, pages 797–798.
Matthias Hagen, Martin Potthast, Benno Stein, and Christof Bräutigam. 2011. Query segmentation revisited. In Proc. of WWW, pages 97–106.
Rosie Jones and Daniel C. Fain. 2003. Query word deletion prediction. In Proc. of SIGIR, pages 435–436.
Rosie Jones, Benjamin Rey, Omid Madani, and Wiley Greiner. 2006. Generating query substitutions. In Proc. of WWW, pages 387–396.
Giridhar Kumaran and James Allan. 2007. A case for shorter queries, and helping user create them. In Proc. of NAACL, pages 220–227.
Giridhar Kumaran and Vitor R. Carvalho. 2009. Reducing long queries using query quality predictors. In Proc. of SIGIR, pages 564–571.
John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. of ICML, pages 282–289.
Victor Lavrenko and W. Bruce Croft. 2001. Relevance based language models. In Proc. of SIGIR, pages 120–127.
Matthew Lease. 2007. Natural language processing for information retrieval: the time is ripe (again). In Proceedings of PIKM.
Xiao Li. 2010. Understanding the semantic structure of noun phrase queries. In Proc. of ACL, pages 1337–1345, Morristown, NJ, USA.
Rachel T. Lo, Ben He, and Iadh Ounis. 2005. Automatically building a stopword list for an information retrieval system. In Proc. of DIR.
Yumao Lu, Fuchun Peng, Gilad Mishne, Xing Wei, and Benoit Dumoulin. 2009. Improving Web search relevance with semantic features. In Proc. of EMNLP, pages 648–657.
Mehdi Manshadi and Xiao Li. 2009. Semantic Tagging of Web Search Queries. In Proc. of ACL, pages 861–869.
André F. T. Martins, Dipanjan Das, Noah A. Smith, and Eric P. Xing. 2008. Stacking dependency parsers. In Proc. of EMNLP, pages 157–166.
Joakim Nivre and Ryan McDonald. 2008. Integrating graph-based and transition-based dependency parsers. In Proc. of ACL, pages 950–958.
Marius Paşca. 2007. Weakly-supervised discovery of named entities using web search queries. In Proc. of CIKM, pages 683–690.
Dou Shen, Toby Walker, Zijian Zheng, Qiang Yang, and Ying Li. 2008. Personal name classification in web queries. In Proc. of WSDM, pages 149–158.
Bin Tan and Fuchun Peng. 2008. Unsupervised query segmentation using generative language models and Wikipedia. In Proc. of WWW, pages 347–356.
Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. 2008. A global joint model for semantic role labeling. Computational Linguistics, 34:161–191, June.
Xing Wei, Fuchun Peng, and Benoit Dumoulin. 2008. Analyzing web text association to disambiguate abbreviation in queries. In Proc. of SIGIR, pages 751–752.