Author: Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella
Abstract: Multi-armed bandit problems formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, content may be served to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global recommendation strategy which allocates a bandit algorithm to each network node (user) and allows it to “share” signals (contexts and payoffs) with the neighboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a consistent increase in prediction performance obtained by exploiting the network structure.
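To make the abstract's core mechanism concrete, here is a minimal Python sketch, not the paper's actual algorithm: each network node runs a LinUCB-style linear bandit, and every observed (context, payoff) pair also updates the node's graph neighbors, approximating the "signal sharing" idea. The class name `NodeBandit`, the exploration parameter `alpha`, and the toy path graph are all illustrative assumptions, not names from the paper.

```python
import numpy as np

class NodeBandit:
    """LinUCB-style linear bandit kept at a single network node (sketch)."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)       # regularized covariance of observed contexts
        self.b = np.zeros(d)     # payoff-weighted sum of observed contexts
        self.alpha = alpha       # exploration width (illustrative value)

    def ucb(self, x):
        # Upper confidence bound: estimated payoff plus exploration bonus.
        theta = np.linalg.solve(self.A, self.b)
        return x @ theta + self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def serve(bandits, graph, node, contexts, reward_fn):
    """Serve one round to `node`, then share the signal over its edges."""
    k = max(range(len(contexts)), key=lambda i: bandits[node].ucb(contexts[i]))
    r = reward_fn(node, contexts[k])
    for v in [node] + graph[node]:   # update the node and its neighbors
        bandits[v].update(contexts[k], r)
    return k, r

# Toy run: three users on a path graph, sharing a hidden preference vector.
rng = np.random.default_rng(0)
d, graph = 5, {0: [1], 1: [0, 2], 2: [1]}
bandits = {v: NodeBandit(d) for v in graph}
w = rng.normal(size=d)
reward = lambda v, x: float(x @ w) + 0.1 * rng.normal()
for t in range(300):
    user = t % len(graph)
    serve(bandits, graph, user, [rng.normal(size=d) for _ in range(4)], reward)
```

Sharing updates across edges lets a node benefit from its neighbors' feedback before it has observed many contexts of its own, which is the intuition behind the reported performance gains; the more scalable clustering variants mentioned in the abstract would amount to coarsening the graph so that fewer bandit instances need to be maintained.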
[1] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, 2011.
[2] K. Amin, M. Kearns, and U. Syed. Graphical models for bandit problems. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, 2011.
[3] D. Asanov. Algorithms and methods in recommender systems. Berlin Institute of Technology, Berlin, Germany, 2011.
[4] P. Auer. Using confidence bounds for exploration-exploitation trade-offs. Journal of Machine Learning Research, 3:397–422, 2002.
[5] T. Bogers. Movie recommendation using random walks over the contextual graph. In CARS’10: Proceedings of the 2nd Workshop on Context-Aware Recommender Systems, 2010.
[6] I. Cantador, P. Brusilovsky, and T. Kuflik. 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011). In Proceedings of the 5th ACM Conference on Recommender Systems, RecSys 2011. ACM, 2011.
[7] S. Caron, B. Kveton, M. Lelarge, and S. Bhagat. Leveraging side observations in stochastic bandits. In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence, pages 142–151, 2012.
[8] G. Cavallanti, N. Cesa-Bianchi, and C. Gentile. Linear algorithms for online multitask classification. Journal of Machine Learning Research, 11:2597–2630, 2010.
[9] W. Chu, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandits with linear payoff functions. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.
[10] K. Crammer and C. Gentile. Multiclass classification with bandit feedback using adaptive regularization. Machine Learning, 90(3):347–383, 2013.
[11] I. S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1944–1957, 2007.
[12] T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’04), pages 109–117. ACM, 2004.
[13] I. Guy, N. Zwerdling, D. Carmel, I. Ronen, E. Uziel, S. Yogev, and S. Ofek-Koifman. Personalized recommendation of social software items based on social relations. In Proceedings of the Third ACM Conference on Recommender Systems, pages 53–60. ACM, 2009.
[14] S. Kar, H. V. Poor, and S. Cui. Bandit problems in networks: Asymptotically efficient distributed allocation rules. In Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), pages 1771–1778. IEEE, 2011.
[15] L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
[16] S. Mannor and O. Shamir. From bandits to experts: On the value of side-observations. In Advances in Neural Information Processing Systems, pages 684–692, 2011.
[17] C. A. Micchelli and M. Pontil. Kernels for multi-task learning. In Advances in Neural Information Processing Systems, pages 921–928, 2004.
[18] A. Said, E. W. De Luca, and S. Albayrak. How social relationships affect user similarities. In Proceedings of the 2010 Workshop on Social Recommender Systems, pages 1–4, 2010.
[19] A. Slivkins. Contextual bandits with similarity information. Journal of Machine Learning Research – Proceedings Track, 19:679–702, 2011.
[20] B. Swapna, A. Eryilmaz, and N. B. Shroff. Multi-armed bandits in the presence of side observations in social networks. In Proceedings of the 52nd IEEE Conference on Decision and Control (CDC), 2013.
[21] B. Szörényi, R. Busa-Fekete, I. Hegedűs, R. Ormándi, M. Jelasity, and B. Kégl. Gossip-based distributed stochastic bandit algorithms. In Proceedings of the 30th International Conference on Machine Learning, 2013.
[22] K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proceedings of the 26th International Conference on Machine Learning, pages 1113–1120. Omnipress, 2009.