nips nips2011 nips2011-220 nips2011-220-reference knowledge-graph by maker-knowledge-mining

220 nips-2011-Prediction strategies without loss

Source: pdf

Author: Michael Kapralov, Rina Panigrahy

Abstract: Consider a sequence of bits where we are trying to predict the next bit from the previous bits. Assume we are allowed to say ‘predict 0’ or ‘predict 1’, and our payoff is +1 if the prediction is correct and −1 otherwise. We will say that at each point in time the loss of an algorithm is the number of wrong predictions minus the number of right predictions so far. In this paper we are interested in algorithms that have essentially zero (expected) loss over any string at any point in time and yet have small regret with respect to always predicting 0 or always predicting 1. For a sequence of length T our algorithm has regret 14 T and loss √ 2 2 T e− T in expectation for all strings. We show that the tradeoff between loss and regret is optimal up to constant factors. Our techniques extend to the general setting of N experts, where the related problem of trading off regret to the best expert for regret to the ’special’ expert has been studied by Even-Dar et al. (COLT’07). We obtain essentially zero loss with respect to the special expert and optimal loss/regret tradeoff, improving upon the results of Even-Dar et al and settling the main question left open in their paper. The strong loss bounds of the algorithm have some surprising consequences. First, we obtain a parameter free algorithm for the experts problem that has optimal regret bounds with respect to k-shifting optima, i.e. bounds with respect to the optimum that is allowed to change arms multiple times. Moreover, for any window of size n the regret of our algorithm to any expert never exceeds O( n(log N + log T )), where N is the number of experts and T is the time horizon, while maintaining the essentially zero loss property. 1

reference text

[1] J.-Y. Audibert and S. Bubeck. Minimax policies for adversarial and stochastic bandits. COLT, 2009.

[2] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The nonstochastic multi-armed bandit problem. SIAM J. Comput., 32:48–77, 2002.

[3] A. Blum and Y. Mansour. From external to internal regret. Journal of Machine Learning Research, pages 1307–1324, 2007.

[4] K. Chaudhuri, Y. Freund, and D. Hsu. A parameter free hedging algorithm. NIPS, 2009.

[5] T. Cover. Behaviour of sequential predictors of binary sequences. Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, 1965.

[6] T. Cover. Universal portfolios. Mathematical Finance, 1991.

[7] E. Even-Dar, M. Kearns, Y. Mansour, and J. Wortman. Regret to the best vs. regret to the average. Machine Learning, 72:21–37, 2008.

[8] Y. Freund. Predicting a binary sequence almost as well as the optimal biased coin. COLT, 1996.

[9] Y. Freund, R. E. Schapire, Y. Singer, and M. K. Warmuth. Using and combining predictors that specialize. STOC, pages 334–343, 1997.

[10] E. Hazan and C. Seshadhri. Efﬁcient learning algorithms for changing environments (full version available at http://eccc.hpi-web.de/eccc-reports/2007/tr07-088/index.html). ICML, pages 393–400, 2009.

[11] M. Kapralov and R. Panigrahy. Prediction without loss in multi-armed bandit problems. http://arxiv.org/abs/1008.3672, 2010.

[12] N. Littlestone and M.K. Warmuth. The weighted majority algorithm. FOCS, 1989.

[13] V. Vovk. A game of prediction with expert advice. Journal of Computer and System Sciences, 1998.

[14] V. Vovk. Derandomizing stochastic prediction strategies. Machine Learning, pages 247–282, 1999. 9