300 nips-2011-Variance Reduction in Monte-Carlo Tree Search


Source: pdf

Author: Joel Veness, Marc Lanctot, Michael Bowling

Abstract: Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates. We demonstrate how these techniques can be applied to MCTS and explore their efficacy on three different stochastic, single-agent settings: Pig, Can’t Stop and Dominion.
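
The abstract names three classical variance reduction techniques. As a rough illustration of how two of them behave outside of MCTS, the sketch below (not taken from the paper; a toy problem of estimating E[exp(U)] for U ~ Uniform(0, 1), with all function names chosen for illustration) compares a plain Monte-Carlo estimator against antithetic variates and a control variate:

# Illustrative sketch only: variance reduction on a toy Monte-Carlo problem,
# estimating E[exp(U)] for U ~ Uniform(0, 1) (exact value: e - 1).
# The paper applies the same ideas inside MCTS rollouts; this is not its code.
import math
import random
import statistics


def crude(n, rng):
    """Plain Monte-Carlo average of exp(U) over n independent draws."""
    return statistics.fmean(math.exp(rng.random()) for _ in range(n))


def antithetic(n, rng):
    """Antithetic variates: pair each draw U with 1 - U (negatively correlated)
    and average the paired samples, using n draws in total."""
    pairs = []
    for _ in range(n // 2):
        u = rng.random()
        pairs.append(0.5 * (math.exp(u) + math.exp(1.0 - u)))
    return statistics.fmean(pairs)


def control_variate(n, rng):
    """Control variates: U itself has known mean 0.5, so subtract a scaled
    version of its deviation from that mean; the coefficient is estimated
    from the same samples."""
    us = [rng.random() for _ in range(n)]
    fs = [math.exp(u) for u in us]
    mean_u, mean_f = statistics.fmean(us), statistics.fmean(fs)
    cov = sum((u - mean_u) * (f - mean_f) for u, f in zip(us, fs)) / (n - 1)
    beta = cov / statistics.variance(us)
    return mean_f - beta * (mean_u - 0.5)


if __name__ == "__main__":
    rng = random.Random(0)
    for name, est in [("crude", crude), ("antithetic", antithetic),
                      ("control variate", control_variate)]:
        # Repeat each estimator many times to compare the spread of its estimates.
        runs = [est(1000, rng) for _ in range(200)]
        print(f"{name:16s} mean={statistics.fmean(runs):.5f} "
              f"stdev={statistics.stdev(runs):.5f}")

On this toy problem the antithetic and control-variate estimators should show a noticeably smaller spread across repeated runs than the crude estimator; the paper studies the analogous effect on MCTS value estimates in Pig, Can’t Stop and Dominion.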


reference text

[1] Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. JMLR, 3:397–422, 2002.

[2] Radha-Krishna Balla and Alan Fern. UCT for Tactical Assault Planning in Real-Time Strategy Games. In IJCAI, pages 40–45, 2009.

[3] Dimitri P. Bertsekas and David A. Castanon. Rollout algorithms for stochastic scheduling problems. Journal of Heuristics, 5(1):89–108, 1999.

[4] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1st edition, 1996.

[5] Hyeong S. Chang, Michael C. Fu, Jiaqiao Hu, and Steven I. Marcus. An Adaptive Sampling Algorithm for Solving Markov Decision Processes. Operations Research, 53(1):126–139, January 2005.

[6] Guillaume Chaslot, Sander Bakkes, Istvan Szita, and Pieter Spronck. Monte-Carlo Tree Search: A New Framework for Game AI. In Fourth Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE 2008), 2008.

[7] Guillaume M. Chaslot, Mark H. Winands, and H. Jaap van den Herik. Parallel Monte-Carlo Tree Search. In Proceedings of the 6th International Conference on Computers and Games, pages 60–71, Berlin, Heidelberg, 2008. Springer-Verlag.

[8] Guillaume M.J-B. Chaslot, Mark H.M. Winands, Istvan Szita, and H. Jaap van den Herik. Cross-entropy for Monte-Carlo Tree Search. ICGA, 31(3):145–156, 2008.

[9] Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. In Proceedings of Computers and Games 2006. Springer-Verlag, 2006.

[10] Hilmar Finnsson and Yngvi Björnsson. Simulation-based Approach to General Game Playing. In Twenty-Third AAAI Conference on Artificial Intelligence (AAAI 2008), pages 259–264, 2008.

[11] Sylvain Gelly and David Silver. Combining online and offline knowledge in UCT. In Proceedings of the 24th International Conference on Machine Learning, pages 273–280, 2007.

[12] Sylvain Gelly and Yizao Wang. Exploration exploitation in Go: UCT for Monte-Carlo Go. In NIPS Workshop on On-line trading of Exploration and Exploitation, 2006.

[13] Sylvain Gelly, Yizao Wang, Rémi Munos, and Olivier Teytaud. Modification of UCT with patterns in Monte-Carlo Go. Technical Report 6062, INRIA, France, November 2006.

[14] Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov Decision Processes. In IJCAI, pages 1324–1331, 1999.

[15] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In ECML, pages 282–293, 2006.

[16] Todd W. Neller and Clifton G.M. Presser. Practical play of the dice game pig. Undergraduate Mathematics and Its Applications, 26(4):443–458, 2010.

[17] Barry L. Nelson. Control variate remedies. Operations Research, 38(6):974–992, 1990.

[18] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition, 2003.

[19] Sid Sackson. Can’t Stop. Ravensburger, 1980.

[20] John Scarne. Scarne on dice. Harrisburg, PA: Military Service Publishing Co, 1945.

[21] David Silver and Gerald Tesauro. Monte-Carlo simulation balancing. In ICML, page 119, 2009.

[22] David Silver and Joel Veness. Monte-Carlo Planning in Large POMDPs. In Advances in Neural Information Processing Systems 23, pages 2164–2172, 2010.

[23] Csaba Szepesvári. Reinforcement learning algorithms for MDPs, 2009.

[24] Donald X. Vaccarino. Dominion. Rio Grande Games, 2008.

[25] Martha White and Michael Bowling. Learning a value analysis tool for agent evaluation. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI), pages 1976–1981, 2009.