nips nips2011 nips2011-25 nips2011-25-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tim V. Erven, Wouter M. Koolen, Steven D. Rooij, Peter Grünwald
Abstract: Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods. 1
[1] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.
[2] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.
[3] V. Vovk. A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2):153–173, 1998.
[4] N. Cesa-Bianchi and G. Lugosi. Prediction, learning, and games. Cambridge University Press, 2006.
[5] Y. Freund and R. E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103, 1999.
[6] E. Hazan and S. Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. In Proceedings of the 21st Annual Conference on Learning Theory (COLT), pages 57–67, 2008.
[7] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.
[8] P. Auer, N. Cesa-Bianchi, and C. Gentile. Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64:48–75, 2002.
[9] V. Vovk. Competitive on-line statistics. International Statistical Review, 69(2):213–248, 2001.
[10] D. Haussler, J. Kivinen, and M. K. Warmuth. Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44(5):1906–1925, 1998.
[11] A. N. Shiryaev. Probability. Springer-Verlag, 1996. 9