
89 nips-2013-Dimension-Free Exponentiated Gradient


Source: pdf

Author: Francesco Orabona

Abstract: I present a new online learning algorithm that extends the exponentiated gradient framework to infinite dimensional spaces. My analysis shows that the algorithm is implicitly able to estimate the L2 norm of the unknown competitor, U, achieving a regret bound of the order of O(U log(UT + 1)√T), instead of the standard O((U² + 1)√T) achievable without knowing U. For this analysis, I introduce novel tools for algorithms with time-varying regularizers, based on the notion of local smoothness. Through a lower bound, I also show that the algorithm is optimal up to a log(UT) term for linear and Lipschitz losses.
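For readability, the two rates compared in the abstract can be written out in LaTeX. This is only a transcription of the bounds stated above; the notation R_T(u), for the regret after T rounds against a fixed competitor u whose L2 norm is U, is my own shorthand and does not appear in this excerpt:

    % Regret of the proposed algorithm, which adapts to the unknown norm U:
    \[ R_T(u) = O\bigl( U \log(UT + 1) \sqrt{T} \bigr) \]
    % versus the standard bound, also achievable without knowledge of U:
    \[ R_T(u) = O\bigl( (U^2 + 1) \sqrt{T} \bigr) \]

The improvement is from a quadratic to a nearly linear dependence on U, while still not requiring U as an input to the algorithm.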


reference text

[1] J. Abernethy, A. Agarwal, P. L. Bartlett, and A. Rakhlin. A stochastic view of optimal regret through minimax duality. In COLT, 2009.

[2] P. Auer, N. Cesa-Bianchi, and C. Gentile. Adaptive and self-confident on-line learning algorithms. Journal of Computer and System Sciences, 64(1):48–75, 2002.

[3] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[4] J. C. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, 2011.

[5] E. Even-Dar, M. Kearns, Y. Mansour, and J. Wortman. Regret to the best vs. regret to the average. In N. H. Bshouty and C. Gentile, editors, COLT, volume 4539 of Lecture Notes in Computer Science, pages 233–247. Springer, 2007.

[6] Y. Freund and R. E. Schapire. Large margin classification using the Perceptron algorithm. Machine Learning, 37(3):277–296, 1999.

[7] C. Gentile. The robustness of the p-norm algorithms. Machine Learning, 53(3):265–299, 2003.

[8] S. M. Kakade, S. Shalev-Shwartz, and A. Tewari. Regularization techniques for learning with matrices, 2009. arXiv:0910.0610.

[9] J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132(1):1–63, January 1997.

[10] N. Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1988.

[11] Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer, 2003.

[12] F. Orabona and K. Crammer. New adaptive algorithms for online classification. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1840–1848. 2010.

[13] F. Orabona, K. Crammer, and N. Cesa-Bianchi. A generalized online mirror descent with applications to classification and regression, 2013. arXiv:1304.2994.

[14] R. T. Rockafellar. Convex Analysis (Princeton Mathematical Series). Princeton University Press, 1970.

[15] S. Shalev-Shwartz. Online learning: Theory, algorithms, and applications. PhD thesis, The Hebrew University, 2007.

[16] S. Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2012.

[17] S. Shalev-Shwartz and Y. Singer. A primal-dual perspective of online learning algorithms. Machine Learning, 69(2–3):115–142, 2007.

[18] N. Srebro, K. Sridharan, and A. Tewari. Smoothness, low noise and fast rates. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2199–2207. 2010.

[19] M. Streeter and B. McMahan. No-regret algorithms for unconstrained online convex optimization. In P. Bartlett, F. C. N. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2411–2419. 2012.

[20] V. Vovk. On-line regression competitive with reproducing kernel Hilbert spaces. In J.-Y. Cai, S. B. Cooper, and A. Li, editors, Theory and Applications of Models of Computation, volume 3959 of Lecture Notes in Computer Science, pages 452–463. Springer, 2006.

[21] V. G. Vovk. Aggregating strategies. In COLT, pages 371–386, 1990.