nips nips2006 nips2006-146 nips2006-146-reference knowledge-graph by maker-knowledge-mining

146 nips-2006-No-regret Algorithms for Online Convex Programs

Source: pdf

Author: Geoffrey J. Gordon

Abstract: Online convex programming has recently emerged as a powerful primitive for designing machine learning algorithms. For example, OCP can be used for learning a linear classiﬁer, dynamically rebalancing a binary search tree, ﬁnding the shortest path in a graph with unknown edge lengths, solving a structured classiﬁcation problem, or ﬁnding a good strategy in an extensive-form game. Several researchers have designed no-regret algorithms for OCP. But, compared to algorithms for special cases of OCP such as learning from expert advice, these algorithms are not very numerous or ﬂexible. In learning from expert advice, one tool which has proved particularly valuable is the correspondence between no-regret algorithms and convex potential functions: by reasoning about these potential functions, researchers have designed algorithms with a wide variety of useful guarantees such as good performance when the target hypothesis is sparse. Until now, there has been no such recipe for the more general OCP problem, and therefore no ability to tune OCP algorithms to take advantage of properties of the problem or data. In this paper we derive a new class of no-regret learning algorithms for OCP. These Lagrangian Hedging algorithms are based on a general class of potential functions, and are a direct generalization of known learning rules like weighted majority and external-regret matching. In addition to proving regret bounds, we demonstrate our algorithms learning to play one-card poker. 1

reference text

[1] Geoffrey J. Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University, 1999.

[2] James F. Hannan. Approximation to Bayes risk in repeated play. In M. Dresher, A. Tucker, and P. Wolfe, editors, Contributions to the Theory of Games, volume 3, pages 97–139. Princeton University Press, 1957.

[3] Martin Zinkevich. Online convex programming and generalized inﬁnitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning. AAAI Press, 2003.

[4] Adam Kalai and Santosh Vempala. Geometric algorithms for online optimization. Technical Report MIT-LCS-TR-861, Massachusetts Institute of Technology, 2002.

[5] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In EuroCOLT 95, pages 23–37. Springer-Verlag, 1995.

[6] Geoffrey J. Gordon. No-regret algorithms for structured prediction problems. Technical Report CMU-CALD-05-112, Carnegie Mellon University, 2005.

[7] Nicol` Cesa-Bianchi and G´ bor Lugosi. Potential-based algorithms in on-line prediction and o a game theory. Machine Learning, 51:239–261, 2003.

[8] David Blackwell. An analogue of the minimax theorem for vector payoffs. Paciﬁc Journal of Mathematics, 6(1):1–8, 1956.

[9] Shai Shalev-Shwartz and Yoram Singer. Convex repeated games and Fenchel duality. In B. Sch¨ lkopf, J.C. Platt, and T. Hofmann, editors, Advances in Neural Information Processing o Systems, volume 19, Cambridge, MA, 2007. MIT Press.

[10] David P. Helmbold and Robert E. Schapire. Predicting nearly as well as the best pruning of a decision tree. In Proceedings of COLT, pages 61–68, 1995.

[11] Eiji Takimoto and Manfred Warmuth. Path kernels and multiplicative updates. In COLT, 2002.

[12] R. Tyrell Rockafellar. Convex Analysis. Princeton University Press, New Jersey, 1970.

[13] Nick Littlestone and Manfred Warmuth. The weighted majority algorithm. Technical Report UCSC-CRL-91-28, University of California Santa Cruz, 1992.

[14] Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000.

[15] H. Brendan McMahan, Geoffrey J. Gordon, and Avrim Blum. Planning in the presence of cost functions controlled by an adversary. In Proceedings of the Twentieth International Conference on Machine Learning, 2003.

[16] David P. Helmbold and Robert E. Schapire. Predicting nearly as well as the best pruning of a decision tree. In COLT, 1995.

[17] D. Koller, N. Meggido, and B. von Stengel. Efﬁcient computation of equilibria for extensive two-person games. Games and Economic Behaviour, 14(2), 1996.