nips nips2011 nips2011-145 nips2011-145-reference knowledge-graph by maker-knowledge-mining

145 nips-2011-Learning Eigenvectors for Free

Source: pdf

Author: Wouter M. Koolen, Wojciech Kotlowski, Manfred K. Warmuth

Abstract: We extend the classical problem of predicting a sequence of outcomes from a ﬁnite alphabet to the matrix domain. In this extension, the alphabet of n outcomes is replaced by the set of all dyads, i.e. outer products uu where u is a vector in Rn of unit length. Whereas in the classical case the goal is to learn (i.e. sequentially predict as well as) the best multinomial distribution, in the matrix case we desire to learn the density matrix that best explains the observed sequence of dyads. We show how popular online algorithms for learning a multinomial distribution can be extended to learn density matrices. Intuitively, learning the n2 parameters of a density matrix is much harder than learning the n parameters of a multinomial distribution. Completely surprisingly, we prove that the worst-case regrets of certain classical algorithms and their matrix generalizations are identical. The reason is that the worst-case sequence of dyads share a common eigensystem, i.e. the worst case regret is achieved in the classical case. So these matrix algorithms learn the eigenvectors without any regret. 1

reference text

[1] S. Arora, E. Hazan, and S. Kale. Fast algorithms for approximate semideﬁnite programming using the multiplicative weights update method. In FOCS, pages 339–348, 2005.

[2] S. Arora and S. Kale. A combinatorial, primal-dual approach to semideﬁnite programs. In STOC, pages 227–236, 2007.

[3] K. S. Azoury and M. K. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3):211–246, 2001.

[4] J. M. Bernardo and A. F. M. Smith. Bayesian Theory. Wiley, 1994.

[5] K. Bostroem and T. Felbinger. Lossless quantum data compression and variable-length coding. Phys. Rev. A, 65(3):032313, 2002.

[6] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA, 2006.

[7] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.

[8] R. Jain, Z. Ji, S. Upadhyay, and J. Watrous. QIP = PSPACE. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC, pages 573–582, 2010.

[9] P. Kontkanen and P. Myllym¨ ki. A fast normalized maximum likelihood algorithm for multia nomial data. In Proceedings of the Nineteenth International Joint Conference on Artiﬁcial Intelligence (IJCAI-05), pages 1613–1616, 2005.

[10] R. E. Krichevsky and V. K. Troﬁmov. The performance of universal encoding. IEEE Transactions on Information Theory, 27(2):199–207, Mar. 1981.

[11] J. T. Kwok and H. Zhao. Incremental eigen decomposition. In IN PROC. ICANN, pages 270–273, 2003.

[12] M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.

[13] B. Schumacher and M. D. Westmoreland. Indeterminate-length quantum coding. Phys. Rev. A, 64(4):042304, 2001.

[14] Y. M. Shtarkov. Universal sequential coding of single messages. Problems of Information Transmission, 23(3):3–17, 1987.

[15] M. Sion. On general minimax theorems. Paciﬁc Jouronal of Mathematics, 8(1):171–176, 1958.

[16] E. Takimoto and M. Warmuth. The last-step minimax algorithm. In Proceedings of the 13th Annual Conference on Computational Learning Theory, pages 100–106, 2000.

[17] K. Tsuda, G. R¨ tsch, and M. K. Warmuth. Matrix exponentiated gradient updates for on-line a learning and Bregman projections. Journal of Machine Learning Research, 6:995–1018, June 2005.

[18] M. K. Warmuth. When is there a free matrix lunch. In Proc. of the 20th Annual Conference on Learning Theory (COLT ’07). Springer-Verlag, June 2007. Open problem.

[19] M. K. Warmuth and D. Kuzmin. Online variance minimization. In Proceedings of the 19th Annual Conference on Learning Theory (COLT ’06), Pittsburg, June 2006. Springer-Verlag.

[20] M. K. Warmuth and D. Kuzmin. Bayesian generalized probability calculus for density matrices. Journal of Machine Learning, 78(1-2):63–101, January 2010. 9