
53 nips-2005-Cyclic Equilibria in Markov Games


Source: pdf

Author: Martin Zinkevich, Amy Greenwald, Michael L. Littman

Abstract: Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration based on a new (non-stationary) equilibrium concept that we call “cyclic equilibria.” We prove that value iteration identifies cyclic equilibria in a class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.
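The abstract builds on classic value iteration (Bellman, 1957), whose game-theoretic variants replace the max over actions with an equilibrium-selection step. As background, here is a minimal single-agent sketch of the underlying fixed-point iteration; the two-state MDP at the bottom is a made-up example, not from the paper.

```python
# Sketch of classic value iteration on a toy MDP (single-agent case).
# The multiagent variants discussed in the paper swap the max over
# actions for a Nash/correlated-equilibrium operator at each state.

def value_iteration(n_states, actions, P, R, gamma=0.9, tol=1e-8):
    """P[s][a] is a list of (next_state, prob); R[s][a] is the reward."""
    V = [0.0] * n_states
    while True:
        # Bellman backup: best one-step lookahead value for each state.
        V_new = [
            max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in actions
            )
            for s in range(n_states)
        ]
        if max(abs(u - v) for u, v in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Toy two-state MDP: action 0 stays put, action 1 moves to the other
# state; only the move 0 -> 1 pays reward 1.
P = {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
     1: {0: [(1, 1.0)], 1: [(0, 1.0)]}}
R = {0: {0: 0.0, 1: 1.0},
     1: {0: 0.0, 1: 0.0}}
V = value_iteration(2, [0, 1], P, R)
```

In this example the optimal policy cycles between the two states, so the fixed point satisfies V[0] = 1 + 0.9 V[1] and V[1] = 0.9 V[0], giving V[0] = 1/0.19. In the single-agent setting this iteration provably converges; the paper's point is that the analogous game-theoretic backups need not converge to a stationary equilibrium.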


reference text

Bellman, R. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.

Brafman, R. I., & Tennenholtz, M. (2002). R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213–231.

Greenwald, A., & Hall, K. (2003). Correlated Q-learning. Proceedings of the Twentieth International Conference on Machine Learning (pp. 242–249).

Hu, J., & Wellman, M. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. Proceedings of the Fifteenth International Conference on Machine Learning (pp. 242–250). Morgan Kaufmann.

Littman, M. (2001). Friend-or-foe Q-learning in general-sum games. Proceedings of the Eighteenth International Conference on Machine Learning (pp. 322–328). Morgan Kaufmann.

Littman, M. L., & Szepesvári, C. (1996). A generalized reinforcement-learning model: Convergence and applications. Proceedings of the Thirteenth International Conference on Machine Learning (pp. 310–318).

Osborne, M. J., & Rubinstein, A. (1994). A Course in Game Theory. The MIT Press.

Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. Wiley-Interscience.

Shapley, L. (1953). Stochastic games. Proceedings of the National Academy of Sciences of the United States of America, 39, 1095–1100.

Tesauro, G., & Kephart, J. (1999). Pricing in agent economies using multi-agent Q-learning. Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (pp. 71–86).