
181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

Source: pdf

Author: Patrick Pantel ; Ariel Fuxman

Abstract: We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query-entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision.
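
As a rough illustration of the kind of smoothing the abstract refers to, the sketch below estimates P(entity | query) from click counts by maximum likelihood and then linearly interpolates it with a background entity distribution, in the style of Jelinek and Mercer (1980). This is a generic technique, not the paper's models: the function names, the toy click data, the choice of a global background distribution (rather than a query synonymy model), and the weight `lam=0.8` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def mle_association(clicks):
    """Maximum-likelihood P(entity | query) from (query, entity, count) triples."""
    per_query = defaultdict(Counter)
    for query, entity, count in clicks:
        per_query[query][entity] += count
    return {
        q: {e: c / sum(ents.values()) for e, c in ents.items()}
        for q, ents in per_query.items()
    }

def background_distribution(clicks):
    """Query-independent P(entity), used here as the smoothing distribution."""
    counts = Counter()
    for _, entity, count in clicks:
        counts[entity] += count
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()}

def interpolate(p_query, p_background, lam=0.8):
    """Linear interpolation: lam * P_mle(e|q) + (1 - lam) * P_bg(e)."""
    entities = set(p_query) | set(p_background)
    return {
        e: lam * p_query.get(e, 0.0) + (1 - lam) * p_background.get(e, 0.0)
        for e in entities
    }

# Hypothetical toy click graph: (query, entity, click count).
clicks = [
    ("canon eos", "camera_A", 8),
    ("canon eos", "camera_B", 2),
    ("dslr", "camera_B", 5),
    ("dslr", "lens_C", 5),
]

p_mle = mle_association(clicks)
p_bg = background_distribution(clicks)
smoothed = interpolate(p_mle["canon eos"], p_bg, lam=0.8)
```

In this toy example, `lens_C` was never clicked for the query "canon eos", so its maximum-likelihood estimate is zero; after interpolation it receives a small non-zero probability while the smoothed distribution still sums to one.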

reference text

[Agichtein et al. 2006] Eugene Agichtein, Eric Brill, and Susan T. Dumais. 2006. Improving web search ranking by incorporating user behavior information. In SIGIR, pages 19–26.

[Agirre et al. 2009] Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Paşca, and Aitor Soroa. 2009. A study on similarity and relatedness using distributional and WordNet-based approaches. In NAACL, pages 19–27.

[Baeza-Yates et al. 2004] Ricardo Baeza-Yates, Carlos Hurtado, and Marcelo Mendoza. 2004. Query recommendation using query logs in search engines. In Wolfgang Lindner, Marco Mesiti, Can Türker, Yannis Tzitzikas, and Athena Vakali, editors, EDBT Workshops, volume 3268 of Lecture Notes in Computer Science, pages 588–596. Springer.

[Baeza-Yates 2004] Ricardo Baeza-Yates. 2004. Web usage mining in search engines. In Web Mining: Applications and Techniques, Anthony Scime, editor. Idea Group, pages 307–321.

[Bell et al. 2007] R. Bell, Y. Koren, and C. Volinsky. 2007. Modeling relationships at multiple scales to improve accuracy of large recommender systems. In KDD, pages 95–104.

[Boldi et al. 2009] Paolo Boldi, Francesco Bonchi, Carlos Castillo, Debora Donato, and Sebastiano Vigna. 2009. Query suggestions using query-flow graphs. In WSCD '09: Proceedings of the 2009 Workshop on Web Search Click Data, pages 56–63. ACM.

[Cohen 1960] Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46, April.

[Craswell and Szummer 2007] Nick Craswell and Martin Szummer. 2007. Random walks on the click graph. In SIGIR, pages 239–246.

[Fuxman et al. 2008] A. Fuxman, P. Tsaparas, K. Achan, and R. Agrawal. 2008. Using the wisdom of the crowds for keyword generation. In WWW, pages 61–70.

[Gao et al. 2009] Jianfeng Gao, Wei Yuan, Xiao Li, Kefeng Deng, and Jian-Yun Nie. 2009. Smoothing clickthrough data for web search ranking. In SIGIR, pages 355–362.

[Good 1953] Irving John Good. 1953. The population frequencies of species and the estimation of population parameters. Biometrika, 40(3 and 4):237–264.

[Jagabathula et al. 2011] S. Jagabathula, N. Mishra, and S. Gollapudi. 2011. Shopping for products you don't know you need. To appear at WSDM.

[Jain and Pantel 2009] Alpa Jain and Patrick Pantel. 2009. Identifying comparable entities on the web. In CIKM, pages 1661–1664.

[Jelinek and Mercer 1980] Frederick Jelinek and Robert L. Mercer. 1980. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of the Workshop on Pattern Recognition in Practice, pages 381–397.

[Katz 1987] Slava M. Katz. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Transactions on Acoustics, Speech and Signal Processing, pages 400–401.

[Kneser and Ney 1995] Reinhard Kneser and Hermann Ney. 1995. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 181–184.

[Kurland and Lee 2004] O. Kurland and L. Lee. 2004. Corpus structure, language models, and ad-hoc information retrieval. In SIGIR, pages 194–201.

[Lidstone 1920] George James Lidstone. 1920. Note on the general case of the Bayes-Laplace formula for inductive or a posteriori probabilities. Transactions of the Faculty of Actuaries, 8:182–192.

[Linden et al. 2003] G. Linden, B. Smith, and J. York. 2003. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80.

[Liu and Croft 2004] X. Liu and W. Croft. 2004. Cluster-based retrieval using language models. In SIGIR, pages 186–193.

[Mei et al. 2008a] Q. Mei, D. Zhang, and C. Zhai. 2008a. A general optimization framework for smoothing language models on graph structures. In SIGIR, pages 611–618.

[Mei et al. 2008b] Q. Mei, D. Zhou, and K. Church. 2008b. Query suggestion using hitting time. In CIKM, pages 469–478.

[Nie et al. 2007] Z. Nie, J. Wen, and W. Ma. 2007. Object-level vertical search. In Conference on Innovative Data Systems Research (CIDR), pages 235–246.

[Pantel and Lin 2002] Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In SIGKDD, pages 613–619, Edmonton, Canada.

[Pantel et al. 2004] Patrick Pantel, Deepak Ravichandran, and Eduard Hovy. 2004. Towards terascale knowledge acquisition. In COLING, pages 771–777.

[Pantel et al. 2009] Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, and Vishnu Vyas. 2009. Web-scale distributional similarity and entity set expansion. In EMNLP, pages 938–947.

[Paşca and Durme 2008] Marius Paşca and Benjamin Van Durme. 2008. Weakly-supervised acquisition of open-domain classes and class attributes from web documents and query logs. In ACL, pages 19–27.

[Ponte and Croft 1998] J. Ponte and B. Croft. 1998. A language modeling approach to information retrieval. In SIGIR, pages 275–281.

[Sarwar et al. 2001] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. 2001. Item-based collaborative filtering recommendation system. In WWW, pages 285–295.

[Tao et al. 2006] T. Tao, X. Wang, Q. Mei, and C. Zhai. 2006. Language model information retrieval with document expansion. In HLT/NAACL, pages 407–414.

[Wen et al. 2001] Ji-Rong Wen, Jian-Yun Nie, and HongJiang Zhang. 2001. Clustering user queries of a search engine. In WWW, pages 162–168.

[Witten and Bell 1991] I.H. Witten and T.C. Bell. 1991. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4).

[Zhai and Lafferty 2001] C. Zhai and J. Lafferty. 2001. A study of smoothing methods for language models applied to ad hoc information retrieval. In SIGIR, pages 334–342.

[Zhang and Nasraoui 2006] Z. Zhang and O. Nasraoui. 2006. Mining search engine query logs for query recommendation. In WWW, pages 1039–1040.