nips nips2009 nips2009-14 nips2009-14-reference knowledge-graph by maker-knowledge-mining

14 nips-2009-A Parameter-free Hedging Algorithm

Source: pdf

Author: Kamalika Chaudhuri, Yoav Freund, Daniel J. Hsu

Abstract: We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret; in addition, it also achieves performance close to that of the best bounds achieved by previous algorithms with optimally-tuned parameters, according to previous notions of regret. 1

reference text

[1] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.

[2] N. Littlestone and M. Warmuth. The weighted majority algorithm. Information and Computation, 108:212–261, 1994.

[3] V. Vovk. A game of prediction witih expert advice. Journal of Computer and System Sciences, 56(2):153– 173, 1998.

[4] K. Chaudhuri, Y. Freund, and D. Hsu. arXiv:0903.2862. Tracking using explanation-based modeling, 2009.

[5] Y. Freund and R. E. Schapire. Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29:79–103, 1999.

[6] P. Auer, N. Cesa-Bianchi, and C. Gentile. Adaptive and self-conﬁdent on-line learning algorithms. Journal of Computer and System Sciences, 64(1), 2002.

[7] N. Cesa-Bianchi, Y. Mansour, and G. Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2–3):321–352, 2007.

[8] N. Cesa-Bianchi and G. Lugosi. Potential-based algorithms in on-line prediction and game theory. Machine Learning, 51:239–261, 2003.

[9] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning and Games. Cambridge University Press, 2006.

[10] E. Hazan and S. Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. In COLT, 2008.

[11] C. Gentile. The robustness of p-norm algorithms. Machine Learning, 53(3):265–299, 2003.

[12] A. J. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. Machine Learning, 43(3):173–210, 2001.

[13] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Hembold, R. E. Schapire, and M. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.

[14] R. Yaroshinsky, R. El-Yaniv, , and S. Seiden. How to better use expert advice. Machine Learning, 55(3):271–309, 2004.

[15] M. Hutter and J. Poland. Adaptive online prediction by following the perturbed leader. Journal of Machine Learning Research, 6:639–660, 2005.

[16] J. Hannan. Approximation to bayes risk in repeated play. Contributions to the Theory of Games, 3:97– 139, 1957.

[17] A. Kalai and S. Vempala. Efﬁcient algorithms for the online optimization. Journal of Computer and System Sciences, 71(3):291–307, 2005. 9