nips nips2011 nips2011-204 nips2011-204-reference knowledge-graph by maker-knowledge-mining

204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries


Source: pdf

Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari

Abstract: Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. from a fixed distribution, and the adversarial scenario wherein, at every time step, an adversarially chosen instance is revealed to the player. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable. 1


reference text

[1] J. Abernethy, A. Agarwal, P. Bartlett, and A. Rakhlin. A stochastic view of optimal regret through minimax duality. In COLT, 2009.

[2] S. Ben-David, D. Pal, and S. Shalev-Shwartz. Agnostic online learning. In Proceedings of the 22th Annual Conference on Learning Theory, 2009.

[3] J.O. Berger. Statistical decision theory and Bayesian analysis. Springer, 1985.

[4] E. Hazan and S. Kale. Better algorithms for benign bandits. In SODA, 2009.

[5] S.M. Kakade, K. Sridharan, and A. Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. NIPS, 22, 2008.

[6] A. Lazaric and R. Munos. Hybrid Stochastic-Adversarial On-line Learning. In COLT, 2009.

[7] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer-Verlag, New York, 1991.

[8] N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 04 1988.

[9] S. Posner and S. Kulkarni. On-line learning of functions of bounded variation under various sampling schemes. In Proceedings of the sixth annual conference on Computational learning theory, pages 439–445. ACM, 1993.

[10] A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Random averages, combinatorial parameters, and learnability. In NIPS, 2010. Full version available at arXiv:1006.1138.

[11] A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Beyond regret. In COLT, 2011. Full version available at arXiv:1011.3168.

[12] S. Shalev-Shwartz, O. Shamir, N. Srebro, and K. Sridharan. Learnability, stability and uniform convergence. JMLR, 11:2635–2670, Oct 2010.

[13] D. A. Spielman and S. H. Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51(3):385–463, 2004.

[14] A. W. Van Der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes : With Applications to Statistics. Springer Series, March 1996. 9