
Linear Submodular Bandits and their Application to Diversified Retrieval (NIPS 2011)



Author: Yisong Yue, Carlos Guestrin

Abstract: Diversified retrieval and online learning are two core research areas in the design of modern information retrieval systems. In this paper, we propose the linear submodular bandits problem, an online learning setting for optimizing a general class of feature-rich submodular utility models for diversified retrieval. We present an algorithm, called LSBGreedy, and prove that it efficiently converges to a near-optimal model. As a case study, we applied our approach to personalized news recommendation, where the system must recommend small sets of news articles selected from tens of thousands of available articles each day. In a live user study, we found that LSBGreedy significantly outperforms existing online learning approaches.
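The utility class the abstract refers to is linear in a submodular feature map: the value of a set A of articles is F(A) = w · φ(A), where each coordinate of φ is submodular in A. A minimal sketch of greedy selection under such a model is below; the probabilistic-coverage feature map and the toy topic probabilities are illustrative assumptions, not the paper's exact instantiation, and the exploration (UCB) term that makes LSBGreedy a bandit algorithm is omitted.

```python
import numpy as np

def coverage_features(selected, topic_probs):
    # Probabilistic-coverage feature map: topic d is "covered" by set A with
    # probability 1 - prod_{a in A} (1 - P(d | a)), which is submodular in A.
    if not selected:
        return np.zeros(topic_probs.shape[1])
    return 1.0 - np.prod(1.0 - topic_probs[selected], axis=0)

def greedy_select(topic_probs, w, k):
    """Greedily pick k articles maximizing the linear submodular
    utility F(A) = w . phi(A) under the coverage feature map."""
    selected = []
    for _ in range(k):
        base = coverage_features(selected, topic_probs)
        best, best_gain = None, -np.inf
        for a in range(topic_probs.shape[0]):
            if a in selected:
                continue
            # Marginal gain of adding article a to the current set.
            gain = w @ (coverage_features(selected + [a], topic_probs) - base)
            if gain > best_gain:
                best, best_gain = a, gain
        selected.append(best)
    return selected

# Toy example: 4 articles, 2 topics; rows are P(topic | article).
probs = np.array([[0.9, 0.0], [0.8, 0.1], [0.0, 0.9], [0.1, 0.8]])
w = np.array([1.0, 1.0])
print(greedy_select(probs, w, 2))  # picks one article per topic: [0, 2]
```

Because each coordinate of φ is monotone submodular and w is nonnegative, this greedy rule inherits the classical (1 − 1/e) approximation guarantee of Nemhauser et al. [17]; LSBGreedy additionally maintains confidence intervals on w and adds an exploration bonus to each marginal gain.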


References

[1] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Online least squares estimation with self-normalized processes: An application to bandit problems. arXiv:1102.2670, 2011. http://arxiv.org/abs/1102.2670.

[2] J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In Conference on Learning Theory (COLT), 2008.

[3] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In ACM Conference on Web Search and Data Mining (WSDM), 2009.

[4] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research (JMLR), 3:993–1022, 2003.

[5] J. Carbonell and J. Goldstein. The use of MMR, diversity-based re-ranking for reordering documents and producing summaries. In ACM Conference on Information Retrieval (SIGIR), 1998.

[6] H. Chen and D. Karger. Less is more. In ACM Conference on Information Retrieval (SIGIR), 2006.

[7] W. Chu, L. Li, L. Reyzin, and R. Schapire. Contextual bandits with linear payoff functions. In Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

[8] V. Dani, T. Hayes, and S. Kakade. Stochastic linear optimization under bandit feedback. In Conference on Learning Theory (COLT), 2008.

[9] K. El-Arini, G. Veda, D. Shahaf, and C. Guestrin. Turning down the noise in the blogosphere. In ACM Conference on Knowledge Discovery and Data Mining (KDD), 2009.

[10] U. Feige. A threshold of ln n for approximating set cover. Journal of the ACM (JACM), 45(4):634–652, 1998.

[11] Y. Hu, J. Boyd-Graber, and B. Satinoff. Interactive topic modeling. In Annual Meeting of the Association for Computational Linguistics (ACL), 2011.

[12] S. Kale, L. Reyzin, and R. Schapire. Non-stochastic bandit slate problems. In Neural Information Processing Systems (NIPS), 2010.

[13] J. Langford and T. Zhang. The epoch-greedy algorithm for contextual multi-armed bandits. In Neural Information Processing Systems (NIPS), 2007.

[14] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost-effective outbreak detection in networks. In ACM Conference on Knowledge Discovery and Data Mining (KDD), 2007.

[15] L. Li, W. Chu, J. Langford, and R. Schapire. A contextual-bandit approach to personalized news article recommendation. In World Wide Web Conference (WWW), 2010.

[16] L. Li, D. Wang, T. Li, D. Knox, and B. Padmanabhan. Scene: A scalable two-stage personalized news recommendation system. In ACM Conference on Information Retrieval (SIGIR), 2011.

[17] G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14:265–294, 1978.

[18] F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In International Conference on Machine Learning (ICML), 2008.

[19] F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data reflect retrieval quality? In ACM Conference on Information and Knowledge Management (CIKM), 2008.

[20] P. Rusmevichientong and J. Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395–411, 2010.

[21] A. Slivkins, F. Radlinski, and S. Gollapudi. Learning optimally diverse rankings over large document collections. In International Conference on Machine Learning (ICML), 2010.

[22] M. Streeter and D. Golovin. An online algorithm for maximizing submodular functions. In Neural Information Processing Systems (NIPS), 2008.

[23] M. Streeter, D. Golovin, and A. Krause. Online learning of assignments. In Neural Information Processing Systems (NIPS), 2009.

[24] A. Swaminathan, C. Mathew, and D. Kirovski. Essential pages. In The IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2009.

[25] Y. Yue and T. Joachims. Predicting diverse subsets using structural SVMs. In International Conference on Machine Learning (ICML), 2008.

[26] C. Zhai, W. W. Cohen, and J. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In ACM Conference on Information Retrieval (SIGIR), 2003.

[27] X. Zhu, A. Goldberg, J. V. Gael, and D. Andrzejewski. Improving diversity in ranking using absorbing random walks. In NAACL Conference on Human Language Technologies (HLT), 2007.