nips nips2009 nips2009-221 nips2009-221-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Liam M. Dermed, Charles L. Isbell
Abstract: Solving multi-agent reinforcement learning problems has proven difficult because of the lack of tractable algorithms. We provide the first approximation algorithm which solves stochastic games with cheap-talk to within absolute error of the optimal game-theoretic solution, in time polynomial in 1/ . Our algorithm extends Murray’s and Gordon’s (2007) modified Bellman equation which determines the set of all possible achievable utilities; this provides us a truly general framework for multi-agent learning. Further, we empirically validate our algorithm and find the computational cost to be orders of magnitude less than what the theory predicts. 1
Barber, C. B., Dobkin, D. P., & Huhdanpaa, H. (1995). The quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software, 22, 469–483. Branke, J., Deb, K., Miettinen, K., & Steuer, R. E. (Eds.). (2005). Practical approaches to multiobjective optimization, 7.-12. november 2004, vol. 04461 of Dagstuhl Seminar Proceedings. Internationales Begegnungs- und Forschungszentrum (IBFI), Schloss Dagstuhl, Germany IBFI, Schloss Dagstuhl, Germany. Chan, T. M. (2003). Faster core-set constructions and data stream algorithms in fixed dimensions. Comput. Geom. Theory Appl (pp. 152–159). Chen, L. (2005). New analysis of the sphere covering problems and optimal polytope approximation of convex bodies. J. Approx. Theory, 133, 134–145. Clarkson, K. L. (1993). Algorithms for polytope covering and approximation, and for approximate closest-point queries. Greenwald, A., & Hall, K. (2003). Correlated-q learning. Proceedings of the Twentieth International Conference on Machine Learning (pp. 242–249). Littman, M. L. (2001). Friend-or-foe Q-learning in general-sum games. Proc. 18th International Conf. on Machine Learning (pp. 322–328). Morgan Kaufmann, San Francisco, CA. Littman, M. Z. . A. G. . M. L. (2005). Cyclic equilibria in markov games. Proceedings of Neural Information Processing Systems. Vancouver, BC, Canada. Lopez, M. A., & Reisner, S. (2002). Linear time approximation of 3d convex polytopes. Comput. Geom. Theory Appl., 23, 291–301. Murray, C., & Gordon, G. (June 2007). Finding correlated equilibria in general sum stochastic games (Technical Report). School of Computer Science, Carnegie Mellon University. Murray, C., & Gordon, G. J. (2007). Multi-robot negotiation: Approximating the set of subgame perfect equilibria in general-sum stochastic games. In B. Sch¨ lkopf, J. Platt and T. Hoffman o (Eds.), Advances in neural information processing systems 19, 1001–1008. Cambridge, MA: MIT Press. Shoham, Yoav, P., & Grenager (2006). If multi-agent learning is the answer, what is the question? Artificial Intelligence. Shoham, Y., Powers, R., & Grenager, T. (2003). Multi-agent reinforcement learning: a critical survey (Technical Report). Steuer, R. E. (2006). Adbase: A multiple objective linear programming solver for efficient extreme points and unbounded efficient edges. 9