
NIPS 2011, Paper 272: Stochastic convex optimization with bandit feedback


Source: PDF

Author: Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. We demonstrate a generalization of the ellipsoid algorithm that incurs O(poly(d)√T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
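To make the feedback model concrete: at each round t the learner picks a query point x_t ∈ X, observes only f(x_t) plus zero-mean noise, and the cumulative regret after T rounds is Σ_t (f(x_t) − min_{x∈X} f(x)). The Python sketch below simulates this loop with a plain random-search query strategy. It is a minimal illustration of the feedback model only, not the paper's ellipsoid-based algorithm; the objective f, the noise level sigma, and all names in the sketch are assumptions made for the example.

```python
import numpy as np

# Illustration of the stochastic bandit feedback model from the abstract:
# at each round the learner picks x_t in X (here the unit ball), observes
# only f(x_t) plus zero-mean noise, and accumulates regret
# sum_t f(x_t) - min_{x in X} f(x).  The random-search strategy below is a
# placeholder, NOT the paper's ellipsoid-based method; f, sigma, d, and T
# are assumptions made for this sketch.

rng = np.random.default_rng(0)
d, T, sigma = 5, 1000, 0.1

def f(x):
    # Convex, 1-Lipschitz objective; its minimizer 0.3 * ones(d) lies in
    # the unit ball, so the optimal value over X is 0.
    return np.linalg.norm(x - 0.3)

def noisy_query(x):
    # Bandit feedback: a noisy realization of f(x) is all the learner sees.
    return f(x) + sigma * rng.standard_normal()

def random_point_in_ball(d):
    # Uniform draw from the unit ball: random direction, scaled radius.
    u = rng.standard_normal(d)
    return (rng.uniform() ** (1.0 / d)) * u / np.linalg.norm(u)

f_star = 0.0            # min of f over X for this particular objective
regret = 0.0
best_observed = np.inf
for t in range(T):
    x_t = random_point_in_ball(d)
    y_t = noisy_query(x_t)              # only noisy values are observed
    best_observed = min(best_observed, y_t)
    regret += f(x_t) - f_star           # regret uses the true, noiseless f

print(f"cumulative regret after T={T}: {regret:.1f}; "
      f"best observed noisy value: {best_observed:.3f}")
```

Random search incurs regret growing linearly in T, since each query pays a constant expected gap; the paper's point is that an ellipsoid-style method can exploit convexity to bring this down to O(poly(d)√T).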


References

[1] A. Agarwal, O. Dekel, and L. Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, 2010.

[2] A. Agarwal, D. Foster, D. Hsu, S. Kakade, and A. Rakhlin. Stochastic convex optimization with bandit feedback. URL http://arxiv.org/abs/1107.1744, 2011.

[3] R. Agrawal. The continuum-armed bandit problem. SIAM Journal on Control and Optimization, 33(6):1926–1951, 1995.

[4] P. Auer, R. Ortner, and C. Szepesvári. Improved rates for the stochastic continuum-armed bandit problem. In Learning Theory, pages 454–468, 2007.

[5] K. Ball. An elementary introduction to modern convex geometry. In Flavors of Geometry, number 31 in Publications of the Mathematical Sciences Research Institute, pages 1–55. 1997.

[6] D. Bertsimas and S. Vempala. Solving convex programs by random walks. Journal of the ACM, 51(4):540–556, 2004.

[7] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári. X-armed bandits. Journal of Machine Learning Research, 12:1655–1695, 2011.

[8] E. W. Cope. Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Transactions on Automatic Control, 54(6):1243–1253, 2009.

[9] V. Dani, T. P. Hayes, and S. M. Kakade. Stochastic linear optimization under bandit feedback. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), 2008.

[10] A. D. Flaxman, A. T. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 385–394, 2005.

[11] D. Goldfarb and M. J. Todd. Modifications and implementation of the ellipsoid algorithm for linear programming. Mathematical Programming, 23:1–19, 1982.

[12] R. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In Advances in Neural Information Processing Systems 18, 2005.

[13] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), pages 681–690. ACM, 2008.

[14] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.

[15] Y. Nesterov. Random gradient-free minimization of convex functions. CORE Discussion Paper 2011/1, 2011.

[16] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995, 2009.

[17] J. Y. Yu and S. Mannor. Unimodal bandits. In ICML, 2011.