
293 nips-2012-Relax and Randomize: From Value to Algorithms


Source: pdf

Author: Sasha Rakhlin, Ohad Shamir, Karthik Sridharan

Abstract: We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be non-constructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones, also capturing such “unorthodox” methods as Follow the Perturbed Leader and the R2 forecaster. Understanding the inherent complexity of the learning problem thus leads to the development of algorithms. To illustrate our approach, we present several new algorithms, including a family of randomized methods that use the idea of a “random playout”. New versions of the Follow-the-Perturbed-Leader algorithm are presented, as well as methods based on the Littlestone dimension, efficient methods for matrix completion with the trace norm, and algorithms for the problems of transductive learning and prediction with static experts.
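
The abstract invokes Follow the Perturbed Leader as one of the methods recovered by the minimax analysis. As a minimal illustrative sketch (not the paper's specific derivation), the following Python code shows the classical FTPL idea for the experts setting in the spirit of Kalai and Vempala [10]: on each round, add fresh random perturbations to the cumulative losses and follow the resulting leader. The function name, the exponential perturbations, and the scale eta are assumptions made for illustration.

import random

def ftpl_experts(loss_rounds, eta=1.0, seed=0):
    """Illustrative Follow-the-Perturbed-Leader sketch for the experts setting.

    loss_rounds: iterable of per-round loss vectors (one loss per expert).
    eta: perturbation scale (illustrative; a principled choice depends on the horizon).
    Returns the sequence of chosen expert indices.
    """
    rng = random.Random(seed)
    cumulative = None
    choices = []
    for losses in loss_rounds:
        if cumulative is None:
            cumulative = [0.0] * len(losses)
        n = len(cumulative)
        # Perturb cumulative losses with fresh random noise, then follow the leader.
        perturbed = [cumulative[i] - eta * rng.expovariate(1.0) for i in range(n)]
        choices.append(min(range(n), key=lambda i: perturbed[i]))
        # Observe this round's losses and update the cumulative totals.
        for i in range(n):
            cumulative[i] += losses[i]
    return choices

# Example: three experts over four rounds of 0/1 losses.
print(ftpl_experts([[0, 1, 1], [1, 0, 1], [0, 1, 0], [1, 1, 0]]))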


reference text

[1] J. Abernethy, A. Agarwal, P. L. Bartlett, and A. Rakhlin. A stochastic view of optimal regret through minimax duality. In COLT, 2009.

[2] J. Abernethy, P. L. Bartlett, A. Rakhlin, and A. Tewari. Optimal strategies and minimax lower bounds for online convex games. In COLT, 2008.

[3] J. Abernethy, M.K. Warmuth, and J. Yellin. Optimal strategies from random walks. In COLT, pages 437–445, 2008.

[4] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.

[5] S. Ben-David, D. Pál, and S. Shalev-Shwartz. Agnostic online learning. In COLT, 2009.

[6] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[7] N. Cesa-Bianchi and O. Shamir. Efficient online learning via randomized rounding. In NIPS, 2011.

[8] J. Hannan. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3:97–139, 1957.

[9] E. Hazan, S. Kale, and S. Shalev-Shwartz. Near-optimal algorithms for online matrix prediction. In COLT, 2012.

[10] A. Kalai and S. Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 71(3):291–307, 2005.

[11] N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1988.

[12] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.

[13] J.F. Mertens, S. Sorin, and S. Zamir. Repeated games. Univ. Catholique de Louvain, Center for Operations Research & Econometrics, 1994.

[14] A.S. Nemirovsky and D.B. Yudin. Problem Complexity and Method Efficiency in Optimization. Wiley, 1983.

[15] A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Random averages, combinatorial parameters, and learnability. In NIPS, 2010. Available at http://arxiv.org/abs/1006.1138.

[16] A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Beyond regret. In COLT, 2011. Available at http://arxiv.org/abs/1011.3168.

[17] A. Rakhlin, K. Sridharan, and A. Tewari. Online learning: Stochastic, constrained, and smoothed adversaries. In NIPS, 2011. Available at http://arxiv.org/abs/1104.5070.

[18] O. Shamir and S. Shalev-Shwartz. Collaborative filtering with the trace norm: Learning, bounding, and transducing. In COLT, 2011.

[19] S. Sorin. The operator approach to zero-sum stochastic games. Stochastic Games and Applications, NATO Science Series C, Mathematical and Physical Sciences, 570:417–426, 2003.

[20] K. Sridharan and A. Tewari. Convex games in Banach spaces. In COLT, 2010.

[21] V.G. Vovk. Aggregating strategies. In Proc. Third Workshop on Computational Learning Theory, pages 371–383. Morgan Kaufmann, 1990.

[22] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.