
Solitaire: Man Versus Machine (NIPS 2004, paper 171)



Authors: Xiang Yan, Persi Diaconis, Paat Rusmevichientong, Benjamin Van Roy

Abstract: In this paper, we use the rollout method for policy improvement to analyze a version of Klondike solitaire. This version, sometimes called thoughtful solitaire, has all cards revealed to the player but otherwise follows the usual Klondike rules. A strategy that we establish, using iterated rollouts, wins about twice as many games on average as an expert human player does.
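The abstract does not spell out the rollout procedure, so the following is only a minimal Python sketch of generic rollout policy improvement for a deterministic, fully observed game (thoughtful solitaire reveals every card, so a simulated playout is deterministic). It assumes a hypothetical toy graph instead of Klondike: `GRAPH`, `base_policy`, `play`, and `rollout` are illustrative names, not the paper's code or heuristics. At each state, the improved policy tries every legal move, plays the base policy out from the resulting state, and picks the first move whose playout wins, falling back to the base policy's own move if none does.

```python
from typing import Callable, Dict, List, Optional

Policy = Callable[[str], Optional[str]]

# Hypothetical toy problem (not from the paper): a move means "go to a successor
# node", and the game is won on reaching "goal". Dead ends have no moves.
GRAPH: Dict[str, List[str]] = {
    "start": ["x", "y"],
    "x": ["x1"], "x1": [],        # first branch: leads only to a dead end
    "y": ["y1", "y2"], "y1": [],  # y1 is a dead end, y2 reaches the goal
    "y2": ["goal"], "goal": [],
}


def legal_moves(state: str) -> List[str]:
    return GRAPH[state]


def is_win(state: str) -> bool:
    return state == "goal"


def base_policy(state: str) -> Optional[str]:
    """Naive heuristic: always take the first listed move (None when stuck)."""
    moves = legal_moves(state)
    return moves[0] if moves else None


def play(state: str, policy: Policy, max_steps: int = 50) -> bool:
    """Follow `policy` from `state`; return True if a winning state is reached."""
    for _ in range(max_steps):
        if is_win(state):
            return True
        move = policy(state)
        if move is None:
            return False
        state = move  # in this toy problem a move *is* the successor state
    return is_win(state)


def rollout(policy: Policy) -> Policy:
    """One round of rollout policy improvement over `policy`."""
    def improved(state: str) -> Optional[str]:
        moves = legal_moves(state)
        if not moves:
            return None
        for move in moves:
            # Simulate the current policy after this move; keep it if it wins.
            if play(move, policy):
                return move
        return policy(state)  # no playout wins: defer to the current policy
    return improved


level1 = rollout(base_policy)  # one rollout over the heuristic
level2 = rollout(level1)       # iterated rollout: rollout over the rollout policy

print(play("start", base_policy), play("start", level1), play("start", level2))
```

On this toy graph the base policy and the one-level rollout both lose from `"start"`, while the two-level (iterated) rollout wins. Each extra level multiplies the simulation cost, because every candidate move is evaluated by playing out the previous level's policy, which itself runs simulations.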


References

[1] R. Bellman. Applied Dynamic Programming. Princeton University Press, 1957.

[2] D. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[3] D. P. Bertsekas, J. N. Tsitsiklis, and C. Wu. Rollout Algorithms for Combinatorial Optimization. Journal of Heuristics, 3:245-262, 1997.

[4] D. P. Bertsekas and D. A. Castañon. Rollout Algorithms for Stochastic Scheduling Problems. Journal of Heuristics, 5:89-108, 1999.

[5] D. Bertsimas and R. Demir. An Approximate Dynamic Programming Approach to Multi-dimensional Knapsack Problems. Management Science, 4:550-565, 2002.

[6] D. Bertsimas and I. Popescu. Revenue Management in a Dynamic Network Environment. Transportation Science, 37:257-277, 2003.

[7] R. Howard. Dynamic Programming and Markov Processes. MIT Press, 1960.

[8] A. McGovern, E. Moss, and A. Barto. Building a Basic Block Instruction Scheduler Using Reinforcement Learning and Rollouts. Machine Learning, 49:141-160, 2002.

[9] Y. Mansour and S. Singh. On the Complexity of Policy Iteration. In Fifteenth Conference on Uncertainty in Artificial Intelligence, 1999.

[10] D. Parlett. A History of Card Games. Oxford University Press, 1991.

[11] N. Secomandi. Analysis of a Rollout Approach to Sequencing Problems with Stochastic Routing Applications. Journal of Heuristics, 9:321-352, 2003.

[12] N. Secomandi. A Rollout Policy for the Vehicle Routing Problem with Stochastic Demands. Operations Research, 49:796-802, 2001.

[13] G. Tesauro and G. Galperin. On-line Policy Improvement Using Monte-Carlo Search. In Advances in Neural Information Processing Systems, 9:1068-1074, 1996.