
73 emnlp-2010-Learning Recurrent Event Queries for Web Search


Author: Ruiqiang Zhang ; Yuki Konda ; Anlei Dong ; Pranam Kolari ; Yi Chang ; Zhaohui Zheng

Abstract: Recurrent event queries (REQs) constitute a special class of search queries that occur at regular, predictable time intervals. The freshness of documents ranked for such queries is generally of critical importance. REQs form a significant volume, as much as 6% of the query traffic received by search engines. In this work, we develop an improved REQ classifier that could provide significant improvements in addressing this problem. We analyze REQs, develop novel features from multiple sources, and evaluate them using machine learning techniques. From historical query logs, we develop features utilizing query frequency, click information, and user intent dynamics within a search session. We also develop temporal features through time series analysis of query frequency. Other generated features include word matching with recurrent event seed words and the time sensitivity of the search result set. We use Naive Bayes, SVM, and a decision-tree-based logistic regression model to train the REQ classifier. Results on test data show that our models significantly outperform the baseline approach. Experiments on a commercial Web search engine also show significant gains in overall relevance, and thus in overall user experience.
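
To make two of the feature families named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say which time series method was used, so sample autocorrelation at a candidate period stands in for it, and the function names, the `period` parameter, and the seed-word set are all hypothetical.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive lag.

    Assumption: a high value at lag = period suggests that the query
    frequency repeats with that period. The paper may use a different
    time series analysis.
    """
    n = len(series)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series carries no periodic signal
    num = sum((series[t] - mu) * (series[t + lag] - mu) for t in range(n - lag))
    return num / denom


def seed_word_match(query, seed_words):
    """Fraction of whitespace-separated query tokens found in `seed_words`."""
    tokens = query.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in seed_words for tok in tokens) / len(tokens)


def req_features(query, counts, seed_words, period):
    """Bundle the two illustrative features into one dict for a classifier."""
    return {
        "periodicity": autocorrelation(counts, period),
        "seed_match": seed_word_match(query, seed_words),
    }
```

A feature dictionary like this could then be passed to any of the classifier families the abstract lists. The choice of `period` (for example, the number of time buckets in a year) depends on how the query log is aggregated, which the abstract does not specify.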

reference text

R. Baeza-Yates, F. Saint-Jean, and C. Castillo. 2002. Web dynamics, age and page quality. String Processing and Information Retrieval, pages 453–461.
Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, David Grossman, David D. Lewis, Abdur Chowdhury, and Aleksandr Kolcz. 2005. Automatic web query classification using labeled and unlabeled training data. In SIGIR ’05, pages 581–582.
K. Berberich, M. Vazirgiannis, and G. Weikum. 2005. Time-aware authority rankings. Internet Math, 2(3):301–332.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. 1984. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA.
S. Brin and L. Page. 1998. The anatomy of a large-scale hypertextual web search engine. Proceedings of International Conference on World Wide Web.
Andrei Z. Broder, Marcus Fontoura, Evgeniy Gabrilovich, Amruta Joshi, Vanja Josifovski, and Tong Zhang. 2007. Robust classification of rare queries using web knowledge. In SIGIR ’07, pages 231–238.
Chih-Chung Chang and Chih-Jen Lin. 2001. LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
J. Cho, S. Roy, and R. Adams. 2005. Page quality: In search of an unbiased web ranking. Proc. of ACM SIGMOD Conference.
F. Diaz. 2009. Integration of news content into web results. Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM), pages 182–191.
Anlei Dong, Yi Chang, Zhaohui Zheng, Gilad Mishne, Jing Bai, Ruiqiang Zhang, Karolina Buchner, Ciya Liao, and Fernando Diaz. 2010a. Towards recency ranking in web search. Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM), pages 11–20.
Anlei Dong, Ruiqiang Zhang, Pranam Kolari, Jing Bai, Fernando Diaz, Yi Chang, Zhaohui Zheng, and Hongyuan Zha. 2010b. Time is of the essence: improving recency ranking using twitter data. 19th International World Wide Web Conference (WWW), pages 331–340.
Jonathan L. Elsas and Susan T. Dumais. 2010. Leveraging temporal dynamics of document content in relevance ranking. In WSDM, pages 1–10.
J. H. Friedman. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5):1189–1232.
Kalervo Jarvelin and Jaana Kekalainen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20:2002.
K. Sparck Jones, S. Walker, and S. E. Robertson. 2000. A probabilistic model of information retrieval: development and comparative experiments. Inf. Process. Manage., 36(6):779–808.
A. C. König, M. Gamon, and Q. Wu. 2009. Click-through prediction for news queries. Proc. of SIGIR, pages 347–354.
Ying Li, Zijian Zheng, and Honghua (Kathy) Dai. 2005. KDD Cup-2005 report: facing a great challenge. SIGKDD Explor. Newsl., 7(2):91–99.
Xiao Li, Ye-Yi Wang, and Alex Acero. 2008. Learning query intent from regularized click graphs. In SIGIR 2008, pages 339–346. ACM.
Donald Metzler, Rosie Jones, Fuchun Peng, and Ruiqiang Zhang. 2009. Improving search relevance for implicitly temporal queries. In SIGIR ’09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, pages 700–701.
S. Nunes. 2007. Exploring temporal evidence in web information retrieval. BCS IRSG Symposium: Future Directions in Information Access.
S. Pandey, S. Roy, C. Olston, J. Cho, and S. Chakrabarti. 2005. Shuffling a stacked deck: The case for partially randomized ranking of search engine results. VLDB.
G. Salton and M. J. McGill. 1983. Introduction to modern information retrieval. McGraw-Hill, NY.
Dou Shen, Rong Pan, Jian-Tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin, and Qiang Yang. 2005. Q2c@ust: our winning solution to query classification in kddcup 2005. SIGKDD Explor. Newsl., 7(2):100–110.
Dou Shen, Rong Pan, Jian-Tao Sun, Jeffrey Junfeng Pan, Kangheng Wu, Jie Yin, and Qiang Yang. 2006. Query enrichment for web-query classification. ACM Trans. Inf. Syst., 24(3):320–352.
Ruiqiang Zhang, Yi Chang, Zhaohui Zheng, Donald Metzler, and Jian-yun Nie. 2009. Search result re-ranking by feedback control adjustment for time-sensitive query. In HLT-NAACL ’09, pages 165–168.