17 nips-2008-Algorithms for Infinitely Many-Armed Bandits

Source: pdf

Authors: Yizao Wang, Jean-Yves Audibert, Rémi Munos

Abstract: We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments. We make a stochastic assumption on the mean-reward of a newly selected arm which characterizes its probability of being a near-optimal arm. Our assumption is weaker than in previous works. We describe algorithms based on upper-confidence-bounds applied to a restricted set of randomly selected arms and provide upper-bounds on the resulting expected regret. We also derive a lower-bound which matches (up to a logarithmic factor) the upper-bound in some cases.
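The idea of running an upper-confidence-bound strategy on a restricted, randomly selected set of arms can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses the UCB1 index of reference [3], Bernoulli rewards, and a hypothetical callable `draw_arm_mean` that stands in for the unknown distribution of a new arm's mean reward. The number of arms `k_arms` is left as a free parameter; how to choose it as a function of the horizon is the subject of the paper's analysis and is not reproduced here.

```python
import math
import random


def ucb1_on_random_subset(n_rounds, k_arms, draw_arm_mean, rng):
    """Run UCB1 on k_arms arms drawn at random from an infinite reservoir.

    Assumptions (not from the paper): rewards are Bernoulli, and
    draw_arm_mean(rng) is a hypothetical sampler for the mean reward
    of a newly selected arm.
    Returns (total_reward, pull_counts, arm_means).
    """
    # Restrict attention to a random finite subset of the arms.
    means = [draw_arm_mean(rng) for _ in range(k_arms)]
    counts = [0] * k_arms
    sums = [0.0] * k_arms
    total = 0.0

    for t in range(1, n_rounds + 1):
        if t <= k_arms:
            # Initialization: pull each selected arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                range(k_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return total, counts, means


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed reservoir: arm means drawn uniformly on [0, 1].
    total, counts, means = ucb1_on_random_subset(
        2000, 10, lambda r: r.random(), rng
    )
    print("total reward:", total)
    print("pulls per arm:", counts)
```

Drawing more arms raises the chance that the subset contains a near-optimal arm but also raises the cost of exploring them, which is the trade-off the regret bounds in the paper quantify.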

reference text

[1] R. Agrawal. The continuum-armed bandit problem. SIAM J. Control and Optimization, 33:1926–1951, 1995.

[2] J.-Y. Audibert, R. Munos, and C. Szepesvári. Tuning bandit algorithms in stochastic environments. In M. Hutter, R. A. Servedio, and E. Takimoto, editors, ALT, volume 4754 of Lecture Notes in Computer Science, pages 150–165. Springer, 2007.

[3] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2/3):235–256, 2002.

[4] P. Auer, R. Ortner, and C. Szepesvári. Improved rates for the stochastic continuum-armed bandit problem. 20th COLT, San Diego, CA, USA, 2007.

[5] D. A. Berry, R. W. Chen, A. Zame, D. C. Heath, and L. A. Shepp. Bandit problems with infinitely many arms. The Annals of Statistics, 25(5):2103–2116, 1997.

[6] R. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In NIPS-2004, 2004.

[7] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandit problems in metric spaces. In Proceedings of the 40th ACM Symposium on Theory of Computing, 2008.

[8] T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4–22, 1985.

[9] O. Teytaud, S. Gelly, and M. Sebag. Anytime many-armed bandit. Conférence francophone sur l’Apprentissage automatique (CAp), Grenoble, France, 2007.