nips nips2013 nips2013-325 nips2013-325-reference knowledge-graph by maker-knowledge-mining

325 nips-2013-The Pareto Regret Frontier

Source: pdf

Author: Wouter M. Koolen

Abstract: Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each ﬁnite horizon and asymptotically. 1

reference text

[1] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55:119–139, 1997.

[2] Marcus Hutter and Jan Poland. Adaptive online prediction by following the perturbed leader. Journal of Machine Learning Research, 6:639–660, 2005.

[3] Jacob Abernethy, Manfred K. Warmuth, and Joel Yellin. When random play is optimal against an adversary. In Rocco A. Servedio and Tong Zhang, editors, COLT, pages 437–446. Omnipress, 2008.

[4] Wouter M. Koolen. Combining Strategies Efﬁciently: High-quality Decisions from Conﬂicting Advice. PhD thesis, Institute of Logic, Language and Computation (ILLC), University of Amsterdam, January 2011.

[5] Nicol` Cesa-Bianchi and Ohad Shamir. Efﬁcient online learning via randomized rounding. o In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 343–351, 2011.

[6] Nicol` Cesa-Bianchi, Yoav Freund, David Haussler, David P. Helmbold, Robert E. Schapire, o and Manfred K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.

[7] Jacob Abernethy, John Langford, and Manfred K Warmuth. Continuous experts and the Binning algorithm. In Learning Theory, pages 544–558. Springer, 2006.

[8] Jacob Abernethy and Manfred K. Warmuth. Repeated games against budgeted adversaries. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R.S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1–9, 2010.

[9] Sasha Rakhlin, Ohad Shamir, and Karthik Sridharan. Relax and randomize : From value to algorithms. In P. Bartlett, F.C.N. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 2150–2158, 2012.

[10] Eyal Even-Dar, Michael Kearns, Yishay Mansour, and Jennifer Wortman. Regret to the best vs. regret to the average. Machine Learning, 72(1-2):21–37, 2008.

[11] Michael Kapralov and Rina Panigrahy. Prediction strategies without loss. In J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F.C.N. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 828–836, 2011.

[12] Kamalika Chaudhuri, Yoav Freund, and Daniel Hsu. A parameter-free hedging algorithm. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 297–305, 2009.

[13] Alexey V. Chernov and Vladimir Vovk. Prediction with advice of unknown number of experts. In Peter Gr¨ nwald and Peter Spirtes, editors, UAI, pages 117–125. AUAI Press, 2010. u

[14] Nicol` Cesa-Bianchi and G´ bor Lugosi. Prediction, Learning, and Games. Cambridge Unio a versity Press, 2006.

[15] Peter Auer, Nicol` Cesa-Bianchi, and Claudio Gentile. Adaptive and self-conﬁdent on-line o learning algorithms. Journal of Computer and System Sciences, 64(1):48–75, 2002.

[16] Nicol` Cesa-Bianchi, Yishay Mansour, and Gilles Stoltz. Improved second-order bounds for o prediction with expert advice. Machine Learning, 66(2-3):321–352, 2007.

[17] Elad Hazan and Satyen Kale. Extracting certainty from uncertainty: Regret bounded by variation in costs. Machine learning, 80(2-3):165–188, 2010.

[18] Chao-Kai Chiang, Tianbao Yangand Chia-Jung Leeand Mehrdad Mahdaviand Chi-Jen Luand Rong Jin, and Shenghuo Zhu. Online optimization with gradual variations. In Proceedings of the 25th Annual Conference on Learning Theory, number 23 in JMLR W&CP;, pages 6.1 – 6.20, June 2012.

[19] Steven de Rooij, Tim van Erven, Peter D. Gr¨ nwald, and Wouter M. Koolen. Follow the leader u if you can, Hedge if you must. ArXiv, 1301.0534, January 2013. 9