
95 nips-2013-Distributed Exploration in Multi-Armed Bandits



Author: Eshcar Hillel, Zohar S. Karnin, Tomer Koren, Ronny Lempel, Oren Somekh

Abstract: We study exploration in Multi-Armed Bandits in a setting where k players collaborate in order to identify an ε-optimal arm. Our motivation comes from recent employment of bandit algorithms in computationally intensive, large-scale applications. Our results demonstrate a non-trivial tradeoff between the number of arm pulls required by each of the players, and the amount of communication between them. In particular, our main result shows that by allowing the k players to communicate only once, they are able to learn √k times faster than a single player. That is, distributing learning to k players gives rise to a factor √k parallel speedup. We complement this result with a lower bound showing this is in general the best possible. On the other extreme, we present an algorithm that achieves the ideal factor k speed-up in learning performance, with communication only logarithmic in 1/ε.
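
The one-shot protocol behind the √k speedup lends itself to a short illustration. The Python sketch below is a minimal, hypothetical rendering, not the paper's exact algorithm: each of the k players samples every arm uniformly with its own budget, nominates its empirically best arm, and a single communication round picks the arm nominated most often, breaking ties by average empirical mean. The function names and the uniform-sampling subroutine are illustrative assumptions.

import random
from collections import Counter

# Illustrative sketch of one-round distributed best-arm identification.
# Not the paper's exact algorithm; the names and the uniform-sampling
# subroutine are assumptions made for this example.

def pull(arm_mean):
    """Draw a Bernoulli reward with the given mean."""
    return 1.0 if random.random() < arm_mean else 0.0

def local_explore(arm_means, budget_per_arm):
    """One player: sample every arm uniformly and return
    (index of empirically best arm, its empirical mean)."""
    estimates = [sum(pull(m) for _ in range(budget_per_arm)) / budget_per_arm
                 for m in arm_means]
    best = max(range(len(arm_means)), key=lambda i: estimates[i])
    return best, estimates[best]

def one_round_protocol(arm_means, k, budget_per_arm):
    """k players explore independently, then communicate exactly once:
    each sends its nominated arm and estimate, and the arm nominated
    most often wins (ties broken by average empirical mean)."""
    votes = [local_explore(arm_means, budget_per_arm) for _ in range(k)]
    counts = Counter(arm for arm, _ in votes)
    top = max(counts.values())
    candidates = [arm for arm, c in counts.items() if c == top]
    avg = lambda a: sum(e for arm, e in votes if arm == a) / counts[a]
    return max(candidates, key=avg)

if __name__ == "__main__":
    random.seed(0)
    means = [0.50, 0.55, 0.60, 0.65, 0.70]
    print(one_round_protocol(means, k=9, budget_per_arm=300))

The point of the sketch is only the communication pattern: all learning happens locally, and a single round of messages (one nominated arm and one estimate per player) suffices to aggregate.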


reference text

[1] A. Agarwal and J. C. Duchi. Distributed delayed stochastic optimization. In NIPS, pages 873–881, 2011.

[2] D. Agarwal, B.-C. Chen, P. Elango, N. Motgi, S.-T. Park, R. Ramakrishnan, S. Roy, and J. Zachariah. Online models for content optimization. In NIPS, pages 17–24, December 2008.

[3] J.-Y. Audibert, S. Bubeck, and R. Munos. Best arm identification in multi-armed bandits. In COLT, pages 41–53, 2010.

[4] P. Auer and R. Ortner. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1-2):55–65, 2010.

[5] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235–256, 2002.

[6] M. Balcan, A. Blum, S. Fine, and Y. Mansour. Distributed learning, communication complexity and privacy. arXiv preprint arXiv:1204.3514, 2012.

[7] S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. In Algorithmic Learning Theory, pages 23–37. Springer, 2009.

[8] D. Chakrabarti, R. Kumar, F. Radlinski, and E. Upfal. Mortal multi-armed bandits. In NIPS, pages 273–280, 2008.

[9] H. Daumé III, J. M. Phillips, A. Saha, and S. Venkatasubramanian. Efficient protocols for distributed classification and optimization. In ALT, 2012.

[10] H. Daumé III, J. M. Phillips, A. Saha, and S. Venkatasubramanian. Protocols for learning classifiers on distributed data. In AISTATS, 2012.

[11] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, Jan. 2008.

[12] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao. Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research, 13:165–202, 2012.

[13] J. Duchi, A. Agarwal, and M. J. Wainwright. Distributed dual averaging in networks. In NIPS, 2010.

[14] E. Even-Dar, S. Mannor, and Y. Mansour. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. The Journal of Machine Learning Research, 7:1079–1105, 2006.

[15] V. Gabillon, M. Ghavamzadeh, A. Lazaric, and S. Bubeck. Multi-bandit best arm identification. In NIPS, 2011.

[16] E. Hillel, Z. Karnin, T. Koren, R. Lempel, and O. Somekh. Distributed exploration in multiarmed bandits. arXiv preprint arXiv:1311.0800, 2013.

[17] V. Kanade, Z. Liu, and B. Radunovic. Distributed non-stochastic experts. In Advances in Neural Information Processing Systems 25, pages 260–268, 2012.

[18] Z. Karnin, T. Koren, and O. Somekh. Almost optimal exploration in multi-armed bandits. In Proceedings of the 30th International Conference on Machine Learning, 2013.

[19] K. Liu and Q. Zhao. Distributed learning in multi-armed bandit with multiple players. IEEE Transactions on Signal Processing, 58(11):5667–5681, Nov. 2010.

[20] S. Mannor and J. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. The Journal of Machine Learning Research, 5:623–648, 2004.

[21] O. Maron and A. W. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. In NIPS, 1994.

[22] V. Mnih, C. Szepesvári, and J.-Y. Audibert. Empirical Bernstein stopping. In ICML, pages 672–679. ACM, 2008.

[23] F. Radlinski, M. Kurup, and T. Joachims. How does clickthrough data reflect retrieval quality? In CIKM, pages 43–52, October 2008.

[24] Y. Yue and T. Joachims. Interactively optimizing information retrieval systems as a dueling bandits problem. In ICML, page 151, June 2009.