
nips2004-48: Convergence and No-Regret in Multiagent Learning

Author: Michael Bowling

Abstract: Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner’s particular dynamics. In the worst case, this could result in poorer performance than if the agent were not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
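The abstract's regret criterion can be made concrete with a short sketch. The Python snippet below is illustrative only, not the paper's GIGA-WoLF implementation; the payoff matrix, strategy sequences, and function name are assumptions chosen for the example. It computes the average regret of a row player's mixed-strategy sequence against the best fixed action in hindsight, the quantity the paper requires to be at most zero in the limit.

    import numpy as np

    def average_regret(payoff, own_strategies, opponent_strategies):
        # payoff: (n_row, n_col) matrix of row-player payoffs (assumed known).
        # own_strategies / opponent_strategies: the mixed strategies
        # (probability vectors) actually played at each step.
        # Regret compares the realized expected payoff to the best fixed
        # row action in hindsight against the same opponent sequence.
        T = len(own_strategies)
        realized = 0.0
        per_action = np.zeros(payoff.shape[0])
        for x, y in zip(own_strategies, opponent_strategies):
            realized += x @ payoff @ y      # expected payoff actually obtained
            per_action += payoff @ y        # payoff of each fixed action this step
        return (per_action.max() - realized) / T  # "no regret" means this tends toward <= 0

    # Example (hypothetical): matching pennies with both players uniform;
    # the average regret is exactly zero.
    P = np.array([[1.0, -1.0], [-1.0, 1.0]])
    xs = [np.array([0.5, 0.5])] * 100
    ys = [np.array([0.5, 0.5])] * 100
    print(average_regret(P, xs, ys))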


References

[1] Michael Bowling. Convergence and no-regret in multiagent learning. Technical Report TR04-11, Department of Computing Science, University of Alberta, 2004.

[2] Michael L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157–163, 1994.

[3] Junling Hu and Michael P. Wellman. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 242–250, 1998.

[4] Amy Greenwald and Keith Hall. Correlated Q-learning. In Proceedings of the AAAI Spring Symposium Workshop on Collaborative Learning Agents, 2002.

[5] Caroline Claus and Craig Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pages 746–752, 1998.

[6] Satinder Singh, Michael Kearns, and Yishay Mansour. Nash convergence of gradient dynamics in general-sum games. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 541–548, 2000.

[7] Michael Bowling and Manuela Veloso. Multiagent learning using a variable learning rate. Artificial Intelligence, 136:215–250, 2002.

[8] Yu-Han Chang and Leslie Pack Kaelbling. Playing is believing: the role of beliefs in multi-agent learning. In Advances in Neural Information Processing Systems 14, 2001.

[9] Sergiu Hart and Andreu Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68:1127–1150, 2000.

[10] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. Gambling in a rigged casino: The adversarial multi-arm bandit problem. In 36th Annual Symposium on Foundations of Computer Science, pages 322–331, 1995.

[11] Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, pages 928–936, 2003.

[12] Amir Jafari, Amy Greenwald, David Gondek, and Gunes Ercal. On no-regret learning, fictitious play, and Nash equilibrium. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 226–233, 2001.

[13] G. M. Korpelevich. The extragradient method for finding saddle points and other problems. Matecon, 12:747–756, 1976.