nips2007-55: reference knowledge graph (maker-knowledge-mining)
Source: pdf
Author: Michael Johanson, Martin Zinkevich, Michael Bowling
Abstract: Adaptation to other initially unknown agents often requires computing an effective counter-strategy. In the Bayesian paradigm, one must find a good counter-strategy to the inferred posterior of the other agents' behavior. In the experts paradigm, one may want to choose experts that are good counter-strategies to the other agents' expected behavior. In this paper we introduce a technique for computing robust counter-strategies for adaptation in multiagent scenarios under a variety of paradigms. The strategies can take advantage of a suspected tendency in the decisions of the other agents, while bounding the worst-case performance when the tendency is not observed. The technique involves solving a modified game, and therefore can make use of recently developed algorithms for solving very large extensive games. We demonstrate the effectiveness of the technique in two-player Texas Hold'em. We show that the computed poker strategies are substantially more robust than best-response counter-strategies, while still exploiting a suspected tendency. We also compose the generated strategies in an experts algorithm, showing a dramatic improvement in performance over using simple best responses.
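The core idea, solving a modified game in which the opponent is forced to follow the suspected model with some probability p and may deviate arbitrarily with probability 1 - p, can be illustrated in a small zero-sum matrix game. The sketch below is not the paper's implementation (which works on abstracted extensive-form Hold'em); the function name, the rock-paper-scissors example, and the use of SciPy's LP solver are all illustrative choices.

```python
# Sketch of a restricted-style counter-strategy in a zero-sum matrix game:
# the opponent plays the fixed model with probability p and freely with
# probability 1 - p; we solve the resulting modified game by linear program.
import numpy as np
from scipy.optimize import linprog

def robust_counter_strategy(A, model, p):
    """Maximin row strategy of the modified game A', where each column of
    the row player's payoff matrix A is mixed with the model's expected
    payoff column with weight p. Returns (strategy, game value)."""
    A = np.asarray(A, dtype=float)
    model_col = A @ np.asarray(model, dtype=float)  # payoff of each row vs. the model
    A_mod = p * model_col[:, None] + (1 - p) * A
    m, n = A_mod.shape
    # LP variables: x (row mixed strategy, length m) and v (game value).
    # Maximize v subject to x^T A'_j >= v for every column j, sum(x) = 1.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                    # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A_mod.T, np.ones((n, 1))])   # v - x^T A'_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Rock-paper-scissors with an opponent model that over-plays rock.
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
model = [0.6, 0.2, 0.2]
x0, v0 = robust_counter_strategy(rps, model, p=0.0)  # p=0: ordinary equilibrium play
x1, v1 = robust_counter_strategy(rps, model, p=0.9)  # p=0.9: heavy exploitation
```

At p = 0 the solution is the usual uniform equilibrium (value 0); as p grows, the strategy shifts toward the best response to the model (here, paper), trading worst-case guarantees in the original game for exploitation of the suspected tendency.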
[ACBF02] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47:235–256, 2002.
[BBD+03] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In International Joint Conference on Artificial Intelligence, pages 661–668, 2003.
[CM96] David Carmel and Shaul Markovitch. Learning models of intelligent agents. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Menlo Park, CA, 1996. AAAI Press.
[GHPS07] A. Gilpin, S. Hoda, J. Pena, and T. Sandholm. Gradient-based algorithms for finding Nash equilibria in extensive form games. In Proceedings of the Eighteenth International Conference on Game Theory, 2007.
[GS06] A. Gilpin and T. Sandholm. A competitive Texas Hold'em poker player via automated abstraction and real-time equilibrium computation. In National Conference on Artificial Intelligence, 2006.
[JZB07] Michael Johanson, Martin Zinkevich, and Michael Bowling. Computing robust counter-strategies. Technical Report TR07-15, Department of Computing Science, University of Alberta, 2007.
[MB04] Peter McCracken and Michael Bowling. Safe strategies for agent modelling in games. In AAAI Fall Symposium on Artificial Multi-agent Learning, October 2004.
[Rob51] Julia Robinson. An iterative method of solving a game. Annals of Mathematics, 54:296–301, 1951.
[RV02] Patrick Riley and Manuela Veloso. Planning for distributed execution through use of probabilistic opponent models. In Proceedings of the Sixth International Conference on AI Planning and Scheduling, pages 77–82, April 2002.
[Sch06] T. C. Schauenberg. Opponent modelling and search in poker. Master's thesis, University of Alberta, 2006.
[ZBB07] M. Zinkevich, M. Bowling, and N. Burch. A new algorithm for generating strong strategies in massive zero-sum games. In Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI), 2007. To appear.
[ZJBP08] M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione. Regret minimization in games with incomplete information. In Neural Information Processing Systems 21, 2008.
[ZL06] M. Zinkevich and M. Littman. The AAAI computer poker competition. Journal of the International Computer Games Association, 29, 2006. News item.