NIPS 2011, paper 272 (nips2011-272) — reference knowledge graph (maker-knowledge-mining)
Source: pdf
Author: Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin
Abstract: This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f (x) at any query point x ∈ X. We demonstrate a generalization of the ellipsoid algorithm that incurs O(poly(d) √T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
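The feedback model in the abstract can be illustrated with a small sketch: the learner repeatedly queries a point x and observes only a noisy realization of f(x), while regret accrues on the true function. This is a minimal illustration of the model only, not the paper's ellipsoid-based algorithm; the function f, the noise level, and the naive uniform-query strategy below are all illustrative assumptions.

```python
import random

def f(x):
    # An illustrative convex, 1-Lipschitz function on X = [-1, 1],
    # with minimizer x* = 0 and optimal value f(x*) = 0.
    return abs(x)

def noisy_oracle(x, sigma=0.1):
    # Bandit feedback: the learner sees only a noisy realization of f(x).
    return f(x) + random.gauss(0.0, sigma)

def run(T=1000, seed=0):
    # A naive baseline learner that queries uniformly at random.
    # Its regret grows linearly in T; the paper's algorithm instead
    # achieves O(poly(d) * sqrt(T)) in d dimensions.
    random.seed(seed)
    f_star = f(0.0)  # optimal value over X
    regret = 0.0
    for _ in range(T):
        x = random.uniform(-1.0, 1.0)  # query point
        _ = noisy_oracle(x)            # observed, but ignored by this naive learner
        regret += f(x) - f_star        # regret is measured on the true function
    return regret
```

The point of the sketch is the accounting: the oracle returns noisy values, yet regret is defined against the noiseless f, which is what makes sublinear-regret algorithms such as the paper's ellipsoid generalization nontrivial.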
[1] A. Agarwal, O. Dekel, and L. Xiao. Optimal algorithms for online convex optimization with multi-point bandit feedback. In COLT, 2010.
[2] A. Agarwal, D. Foster, D. Hsu, S. Kakade, and A. Rakhlin. Stochastic convex optimization with bandit feedback. URL http://arxiv.org/abs/1107.1744, 2011.
[3] R. Agrawal. The continuum-armed bandit problem. SIAM Journal on Control and Optimization, 33(6):1926–1951, 1995.
[4] P. Auer, R. Ortner, and C. Szepesvári. Improved rates for the stochastic continuum-armed bandit problem. Learning Theory, pages 454–468, 2007.
[5] K. Ball. An elementary introduction to modern convex geometry. In Flavors of Geometry, number 31 in Publications of the Mathematical Sciences Research Institute, pages 1–55. 1997.
[6] D. Bertsimas and S. Vempala. Solving convex programs by random walks. Journal of the ACM, 51(4):540–556, 2004.
[7] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári. X-armed bandits. Journal of Machine Learning Research, 12:1655–1695, 2011.
[8] E. W. Cope. Regret and convergence bounds for a class of continuum-armed bandit problems. IEEE Transactions on Automatic Control, 54(6):1243–1253, 2009.
[9] V. Dani, T.P. Hayes, and S.M. Kakade. Stochastic linear optimization under bandit feedback. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), 2008.
[10] A. D. Flaxman, A. T. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385–394, 2005.
[11] Donald Goldfarb and Michael J. Todd. Modifications and implementation of the ellipsoid algorithm for linear programming. Mathematical Programming, 23:1–19, 1982.
[12] R. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. Advances in Neural Information Processing Systems, 18, 2005.
[13] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing, pages 681–690. ACM, 2008.
[14] A. Nemirovski and D. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, New York, 1983.
[15] Y. Nesterov. Random gradient-free minimization of convex functions. Technical Report 2011/1, CORE DP, 2011.
[16] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. arXiv preprint arXiv:0912.3995, 2009.
[17] J. Y. Yu and S. Mannor. Unimodal bandits. In ICML, 2011.