
49 jmlr-2009-Learning Permutations with Exponential Weights


Source: pdf

Author: David P. Helmbold, Manfred K. Warmuth

Abstract: We give an algorithm for the on-line learning of permutations. The algorithm maintains its uncertainty about the target permutation as a doubly stochastic weight matrix, and makes predictions using an efficient method for decomposing the weight matrix into a convex combination of permutations. The weight matrix is updated by multiplying the current matrix entries by exponential factors, and an iterative procedure is needed to restore double stochasticity. Even though the result of this procedure does not have a closed form, a new analysis approach allows us to prove an optimal (up to small constant factors) bound on the regret of our algorithm. This regret bound is significantly better than that of either Kalai and Vempala’s more efficient Follow the Perturbed Leader algorithm or the computationally expensive method of explicitly representing each permutation as an expert. Keywords: permutation, ranking, on-line learning, Hedge algorithm, doubly stochastic matrix, relative entropy projection, Sinkhorn balancing
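The update described in the abstract has two steps: multiply each entry of the doubly stochastic weight matrix by an exponential factor of its loss, then restore double stochasticity by iterative row/column normalization (Sinkhorn balancing). A minimal NumPy sketch of those two steps follows; the function names, the learning rate `eta`, and the convergence tolerances are illustrative choices, not the authors' implementation:

```python
import numpy as np

def sinkhorn_balance(W, iters=1000, tol=1e-9):
    """Alternately normalize rows and columns of a positive matrix
    until it is (approximately) doubly stochastic."""
    W = W.copy()
    for _ in range(iters):
        W /= W.sum(axis=1, keepdims=True)   # rows sum to 1
        W /= W.sum(axis=0, keepdims=True)   # columns sum to 1
        # After column normalization, stop once rows are also balanced.
        if np.allclose(W.sum(axis=1), 1.0, atol=tol):
            break
    return W

def exponential_update(W, L, eta=0.5):
    """Multiply entries by exponential loss factors, then rebalance.
    L[i, j] is the per-entry loss; eta is an illustrative learning rate."""
    return sinkhorn_balance(W * np.exp(-eta * L))
```

As the abstract notes, the balanced matrix has no closed form; the loop only approximates double stochasticity to the chosen tolerance, which is why the paper's analysis cannot rely on an explicit formula for the updated weights.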


References

P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.

K. Azoury and M. K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3):211–246, June 2001. Special issue on Theoretical Advances in On-line Learning, Game Theory and Boosting, edited by Yoram Singer.

H. Balakrishnan, I. Hwang, and C. Tomlin. Polynomial approximation algorithms for belief matrix maintenance in identity management. In 43rd IEEE Conference on Decision and Control, pages 4874–4879, December 2004.

M. S. Bazaraa, J. J. Jarvis, and H. D. Sherali. Linear Programming and Network Flows. Wiley, second edition, 1977.

R. Bhatia. Matrix Analysis. Springer-Verlag, Berlin, 1997.

A. Blum, S. Chawla, and A. Kalai. Static optimality and dynamic search-optimality in lists and trees. Algorithmica, 36:249–260, 2003.

O. Bousquet and M. K. Warmuth. Tracking a small set of experts by mixing past posteriors. Journal of Machine Learning Research, 3:363–396, 2002.

L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Physics, 7:200–217, 1967.

Y. Censor and A. Lent. An iterative row-action method for interval convex programming. Journal of Optimization Theory and Applications, 34(3):321–353, July 1981.

N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

N. Cesa-Bianchi and G. Lugosi. Combinatorial bandits. In Proceedings of the 22nd Annual Conference on Learning Theory (COLT ’09), 2009.

N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. Journal of the ACM, 44(3):427–485, May 1997.

Y. Chen, L. Fortnow, N. Lambert, D. Pennock, and J. Wortman. Complexity of combinatorial market makers. In Ninth ACM Conference on Electronic Commerce (EC ’08). ACM Press, July 2008.

J. Franklin and J. Lorenz. On the scaling of multidimensional matrices. Linear Algebra and its Applications, 114/115:717–735, 1989.

Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, August 1997.

M. Fürer. Quadratic convergence for scaling of matrices. In Proceedings of ALENEX/ANALCO, pages 216–223. SIAM, 2004.

G. J. Gordon. No-regret algorithms for online convex programs. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, NIPS, pages 489–496. MIT Press, 2006.

D. P. Helmbold and R. E. Schapire. Predicting nearly as well as the best pruning of a decision tree. Machine Learning, 27(1):51–68, 1997.

D. P. Helmbold, J. Kivinen, and M. K. Warmuth. Relative loss bounds for single neurons. IEEE Transactions on Neural Networks, 10(6):1291–1304, November 1999.

M. Herbster and M. K. Warmuth. Tracking the best expert. Machine Learning, 32(2):151–178, 1998. Earlier version in 12th ICML, 1995.

M. Herbster and M. K. Warmuth. Tracking the best linear predictor. Journal of Machine Learning Research, 1:281–309, 2001.

J. Huang, C. Guestrin, and L. Guibas. Fourier theoretic probabilistic inference over permutations. Journal of Machine Learning Research, 10:997–1070, 2009.

M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. Journal of the ACM, 51(4):671–697, July 2004.

A. Kalai. Simulating weighted majority with FPL. Private communication, 2005.

A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005. Special issue on Learning Theory 2003.

B. Kalantari and L. Khachiyan. On the complexity of nonnegative-matrix scaling. Linear Algebra and its Applications, 240:87–103, 1996.

J. Kivinen and M. K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1–64, January 1997.

J. Kivinen and M. K. Warmuth. Averaging expert predictions. In Computational Learning Theory, 4th European Conference (EuroCOLT ’99), Nordkirchen, Germany, March 29–31, 1999, Proceedings, volume 1572 of Lecture Notes in Artificial Intelligence, pages 153–167. Springer, 1999.

R. Kondor, A. Howard, and T. Jebara. Multi-object tracking with representations of the symmetric group. In Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, March 2007.

D. Kuzmin and M. K. Warmuth. Optimum follow the leader algorithm. In Proceedings of the 18th Annual Conference on Learning Theory (COLT ’05), pages 684–686. Springer-Verlag, June 2005. Open problem.

D. Kuzmin and M. K. Warmuth. Online kernel PCA with entropic matrix updates. In Proceedings of the 24th International Conference on Machine Learning (ICML ’07), pages 465–471. ACM International Conference Proceedings Series, June 2007.

N. Linial, A. Samorodnitsky, and A. Wigderson. A deterministic strongly polynomial algorithm for matrix scaling and approximate permanents. Combinatorica, 20(4):545–568, 2000.

N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994. Preliminary version in FOCS ’89.

D. McAllester. PAC-Bayesian stochastic model selection. Machine Learning, 51(1):5–21, 2003.

R. Sinkhorn. A relationship between arbitrary positive matrices and doubly stochastic matrices. The Annals of Mathematical Statistics, 35(2):876–879, June 1964.

E. Takimoto and M. K. Warmuth. Path kernels and multiplicative updates. Journal of Machine Learning Research, 4:773–818, 2003.

V. Vovk. Aggregating strategies. In Proceedings of the Third Annual Workshop on Computational Learning Theory, pages 371–383. Morgan Kaufmann, 1990.

M. K. Warmuth and D. Kuzmin. Randomized PCA algorithms with regret bounds that are logarithmic in the dimension. Journal of Machine Learning Research, 9:2217–2250, 2008.

M. K. Warmuth and S. V. N. Vishwanathan. Leaving the span. In Proceedings of the 18th Annual Conference on Learning Theory (COLT ’05), Bertinoro, Italy, June 2005. Springer-Verlag. Journal version in progress.