nips2003-65 — reference knowledge-graph (maker-knowledge-mining)
Source: pdf
Author: Gerald Tesauro
Abstract: Recent multi-agent extensions of Q-Learning require knowledge of other agents’ payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed “Hyper-Q” Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
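The abstract's core idea is that the Q-function is defined over mixed strategies (the agent's own strategy and an estimate of the opponent's) rather than over base actions. The sketch below illustrates that structure on Rock-Paper-Scissors under several simplifying assumptions that are not from the paper: the strategy simplex is discretized on a coarse grid (GRID, STRATS, nearest are all hypothetical names), the opponent's strategy is estimated with an exponential moving average of observed actions rather than the paper's Bayesian inference, and the opponent is a toy IGA-like gradient player rather than the exact IGA or PHC opponents used in the experiments.

```python
"""Minimal sketch of Hyper-Q-style learning on Rock-Paper-Scissors.

Illustrative only: the paper estimates opponent strategies via Bayesian
inference and plays against IGA/PHC opponents; here an EMA estimate and a
toy gradient opponent stand in for those components.
"""
import itertools
import random

import numpy as np

GRID = 10            # simplex discretization: probabilities in multiples of 1/GRID
ALPHA, GAMMA = 0.1, 0.9
EPSILON = 0.1
PAYOFF = np.array([[0, -1, 1],    # rock     vs (rock, paper, scissors)
                   [1, 0, -1],    # paper
                   [-1, 1, 0]])   # scissors

# Enumerate all discretized mixed strategies over the 3 actions.
STRATS = [np.array([i, j, GRID - i - j]) / GRID
          for i, j in itertools.product(range(GRID + 1), repeat=2)
          if i + j <= GRID]

def nearest(p):
    """Index of the grid strategy closest to probability vector p."""
    return int(np.argmin([np.sum((s - p) ** 2) for s in STRATS]))

# Hyper-Q table: one value per (estimated opponent strategy, own strategy) pair.
Q = np.zeros((len(STRATS), len(STRATS)))

opp_est = np.ones(3) / 3.0      # running estimate of the opponent's mixed strategy
opp_policy = np.ones(3) / 3.0   # toy opponent: drifts via small gradient steps

for t in range(20000):
    y = nearest(opp_est)
    # epsilon-greedy choice of own mixed strategy, given the opponent estimate
    x = random.randrange(len(STRATS)) if random.random() < EPSILON else int(np.argmax(Q[y]))
    my_action = np.random.choice(3, p=STRATS[x])
    opp_action = np.random.choice(3, p=opp_policy)
    r = PAYOFF[my_action, opp_action]

    # Update the opponent-strategy estimate from the observed action
    # (EMA stand-in for the paper's Bayesian inference).
    opp_est = 0.95 * opp_est + 0.05 * np.eye(3)[opp_action]
    y_next = nearest(opp_est)

    # Hyper-Q update: ordinary Q-learning over the strategy/estimate grid.
    Q[y, x] += ALPHA * (r + GAMMA * np.max(Q[y_next]) - Q[y, x])

    # Toy IGA-like opponent: gradient step on its expected payoff, then re-project.
    grad = -PAYOFF.T @ STRATS[x]          # opponent payoff is the negative of ours
    opp_policy = np.clip(opp_policy + 0.001 * grad, 1e-3, None)
    opp_policy /= opp_policy.sum()

print("Greedy Hyper-Q strategy vs. final opponent estimate:",
      STRATS[int(np.argmax(Q[nearest(opp_est)]))])
```

The key design point this sketch is meant to show is that the "state" fed to the Q-table is the estimated opponent strategy and the "action" is an entire mixed strategy, so an adaptive, persistently dynamic opponent appears to the learner as a drifting state rather than as non-stationary payoffs.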
[1] M. Bowling. Convergence problems of general-sum multiagent reinforcement learning. In Proceedings of ICML-00, pages 89–94, 2000.
[2] M. Bowling and M. Veloso. Multiagent learning using a variable learning rate. Artificial Intelligence, 136:215–250, 2002.
[3] Y.-H. Chang and L. P. Kaelbling. Playing is believing: the role of beliefs in multi-agent learning. In Proceedings of NIPS-2001. MIT Press, 2002.
[4] S. J. Hong, J. Hosking, and R. Natarajan. Multiplicative adjustment of class probability: educating naive Bayes. Technical Report RC-22393, IBM Research, 2002.
[5] J. Hu and M. P. Wellman. Multiagent reinforcement learning: theoretical framework and an algorithm. In Proceedings of ICML-98, pages 242–250. Morgan Kaufmann, 1998.
[6] M. Kearns and Y. Mansour. Efficient Nash computation in large population games with bounded influence. In Proceedings of UAI-02, pages 259–266, 2002.
[7] M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of ICML-94, pages 157–163. Morgan Kaufmann, 1994.
[8] M. L. Littman. Friend-or-Foe Q-learning in general-sum games. In Proceedings of ICML-01. Morgan Kaufmann, 2001.
[9] R. Munos. A convergent reinforcement learning algorithm in the continuous case based on a finite difference method. In Proceedings of IJCAI-97, pages 826–831. Morgan Kaufmann, 1997.
[10] S. Singh, M. Kearns, and Y. Mansour. Nash convergence of gradient dynamics in general-sum games. In Proceedings of UAI-00, pages 541–548. Morgan Kaufmann, 2000.
[11] W. D. Smart and L. P. Kaelbling. Practical reinforcement learning in continuous spaces. In Proceedings of ICML-00, pages 903–910, 2000.
[12] W. T. B. Uther and M. M. Veloso. Tree based discretization for continuous state space reinforcement learning. In Proceedings of AAAI-98, pages 769–774, 1998.
[13] C. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, 1989.
[14] J. W. Weibull. Evolutionary Game Theory. The MIT Press, 1995.