jmlr jmlr2012 jmlr2012-69 jmlr2012-69-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tim van Erven, Mark D. Reid, Robert C. Williamson
Abstract: Mixability of a loss characterizes fast rates in the online learning setting of prediction with expert advice. The determination of the mixability constant for binary losses is straightforward but opaque. In the binary case we make this transparent and simpler by characterising mixability in terms of the second derivative of the Bayes risk of proper losses. We then extend this result to multiclass proper losses where there are few existing results. We show that mixability is governed by the maximum eigenvalue of the Hessian of the Bayes risk, relative to the Hessian of the Bayes risk for log loss. We conclude by comparing our result to other work that bounds prediction performance in terms of the geometry of the Bayes risk. Although all calculations are for proper losses, we also show how to carry the results across to improper losses. Keywords: mixability, multiclass, prediction with expert advice, proper loss, learning rates
Jacob Abernethy, Alekh Agarwal, Peter L. Bartlett, and Alexander Rakhlin. A stochastic view of optimal regret through minimax duality. In Proceedings of the 22nd Annual Conference on Learning Theory, 2009. Dominique Az´ and Jean-Paul Penot. Uniformly convex and uniformly smooth convex functions. e Annales de la facult´ des sciences de Toulouse, 4(4):705–730, 1995. e Nicol` Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University o Press, 2006. Alexey Chernov and Vladimir Vovk. Prediction with advice of unknown number of experts. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI 2010), 2010. Alexey Chernov, Yuri Kalnishkan, Fedor Zhdanov, and Vladimir Vovk. Supermartingales in prediction with expert advice. Theoretical Computer Science, 411:2647–2669, 2010. Paul K. Fackler. Notes on matrix calculus. North Carolina State University, 2005. Wendell H. Fleming. Functions of Several Variables. Springer, 1977. Peter D. Gr¨ nwald. The Minimum Description Length Principle. MIT Press, Cambridge, MA, u 2007. David Haussler, Jyrki Kivinen, and Manfred K. Warmuth. Sequential prediction of individual sequences under general loss functions. IEEE Transactions on Information Theory, 44(5):1906– 1925, 1998. Jean-Baptiste Hiriart-Urruty and Claude Lemar´ chal. Convex Analysis and Minimization Algoe rithms: Part I: Fundamentals. Springer, Berlin, 1993. Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985. Sham M. Kakade, Shai Shalev-Shwartz, and Ambuj Tewari. Regularization techniques for learning with matrices. arXiv:0910.0610v2, October 2010. 1662 M IXABILITY IS BAYES R ISK C URVATURE R ELATIVE TO L OG L OSS Yuri Kalnishkan and Michael V. Vyugin. The weak aggregating algorithm and weak mixability. Journal of Computer and System Sciences, 74:1228–1244, 2008. Yuri Kalnishkan, Volodya Vovk, and Michael V. Vyugin. Loss functions, complexities, and the Legendre transformation. Theoretical Computer Science, 313:195–207, 2004. Jan R. Magnus and Heinz Neudecker. Matrix Differential Calculus with Applications in Statistics and Econometrics (revised edition). John Wiley & Sons, Ltd., 1999. Mark D. Reid and Robert C. Williamson. Composite binary losses. Journal of Machine Learning Research, 11:2387–2422, 2010. Mark D. Reid and Robert C. Williamson. Information, divergence and risk for binary experiments. Journal of Machine Learning Research, 12:731–817, March 2011. John A. Thorpe. Elementary Topics in Differential Geometry. Springer, 1979. Elodie Vernet, Robert C. Williamson, and Mark D. Reid. Composite multiclass losses. Journal of Machine Learning Research, March 2012. To be submitted. Volodya Vovk. Aggregating strategies. In Proceedings of the Third Annual Workshop on Computational Learning Theory (COLT), pages 371–383, 1990. Volodya Vovk. A game of prediction with expert advice. In Proceedings of the Eighth Annual Conference on Computational Learning Theory, pages 51–60. ACM, 1995. Volodya Vovk. Competitive on-line statistics. International Statistical Review, 69(2):213–248, 2001. Volodya Vovk and Fedor Zhdanov. Prediction with expert advice for the Brier game. Journal of Machine Learning Research, 10:2445–2471, 2009. Constantin Zˇ linescu. On uniformly convex functions. Journal of Mathematical Analysis and a Applications, 95:344–374, 1983. 1663