nips nips2009 nips2009-221 nips2009-221-reference knowledge-graph by maker-knowledge-mining

221 nips-2009-Solving Stochastic Games

Source: pdf

Author: Liam M. Dermed, Charles L. Isbell

Abstract: Solving multi-agent reinforcement learning problems has proven difﬁcult because of the lack of tractable algorithms. We provide the ﬁrst approximation algorithm which solves stochastic games with cheap-talk to within absolute error of the optimal game-theoretic solution, in time polynomial in 1/ . Our algorithm extends Murray’s and Gordon’s (2007) modiﬁed Bellman equation which determines the set of all possible achievable utilities; this provides us a truly general framework for multi-agent learning. Further, we empirically validate our algorithm and ﬁnd the computational cost to be orders of magnitude less than what the theory predicts. 1

reference text

Barber, C. B., Dobkin, D. P., & Huhdanpaa, H. (1995). The quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software, 22, 469–483. Branke, J., Deb, K., Miettinen, K., & Steuer, R. E. (Eds.). (2005). Practical approaches to multiobjective optimization, 7.-12. november 2004, vol. 04461 of Dagstuhl Seminar Proceedings. Internationales Begegnungs- und Forschungszentrum (IBFI), Schloss Dagstuhl, Germany IBFI, Schloss Dagstuhl, Germany. Chan, T. M. (2003). Faster core-set constructions and data stream algorithms in ﬁxed dimensions. Comput. Geom. Theory Appl (pp. 152–159). Chen, L. (2005). New analysis of the sphere covering problems and optimal polytope approximation of convex bodies. J. Approx. Theory, 133, 134–145. Clarkson, K. L. (1993). Algorithms for polytope covering and approximation, and for approximate closest-point queries. Greenwald, A., & Hall, K. (2003). Correlated-q learning. Proceedings of the Twentieth International Conference on Machine Learning (pp. 242–249). Littman, M. L. (2001). Friend-or-foe Q-learning in general-sum games. Proc. 18th International Conf. on Machine Learning (pp. 322–328). Morgan Kaufmann, San Francisco, CA. Littman, M. Z. . A. G. . M. L. (2005). Cyclic equilibria in markov games. Proceedings of Neural Information Processing Systems. Vancouver, BC, Canada. Lopez, M. A., & Reisner, S. (2002). Linear time approximation of 3d convex polytopes. Comput. Geom. Theory Appl., 23, 291–301. Murray, C., & Gordon, G. (June 2007). Finding correlated equilibria in general sum stochastic games (Technical Report). School of Computer Science, Carnegie Mellon University. Murray, C., & Gordon, G. J. (2007). Multi-robot negotiation: Approximating the set of subgame perfect equilibria in general-sum stochastic games. In B. Sch¨ lkopf, J. Platt and T. Hoffman o (Eds.), Advances in neural information processing systems 19, 1001–1008. Cambridge, MA: MIT Press. Shoham, Yoav, P., & Grenager (2006). If multi-agent learning is the answer, what is the question? Artiﬁcial Intelligence. Shoham, Y., Powers, R., & Grenager, T. (2003). Multi-agent reinforcement learning: a critical survey (Technical Report). Steuer, R. E. (2006). Adbase: A multiple objective linear programming solver for efﬁcient extreme points and unbounded efﬁcient edges. 9