
A Smoothed Approximate Linear Program (NIPS 2009, paper 16)


Source: pdf

Authors: Vijay Desai, Vivek Farias, Ciamac C. Moallemi

Abstract: We present a novel linear program for the approximation of the dynamic programming cost-to-go function in high-dimensional stochastic control problems. LP approaches to approximate DP naturally restrict attention to approximations that are lower bounds to the optimal cost-to-go function. Our program – the ‘smoothed approximate linear program’ – relaxes this restriction in an appropriate fashion while remaining computationally tractable. Doing so appears to have several advantages: First, we demonstrate superior bounds on the quality of approximation to the optimal cost-to-go function afforded by our approach. Second, experiments with our approach on a challenging problem (the game of Tetris) show that the approach outperforms the existing LP approach (which has previously been shown to be competitive with several ADP algorithms) by an order of magnitude.
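
For context, here is a minimal sketch of the two programs the abstract contrasts. The notation (basis matrix Phi, weight vector r, Bellman operator T, state-relevance weights nu, constraint-violation distribution pi, violation budget theta) is assumed for illustration and may differ in detail from the paper's. The standard approximate linear program (ALP) of de Farias and Van Roy [1] is

    \max_{r} \; \nu^\top \Phi r
    \;\text{s.t.}\; \Phi r \le T \Phi r,

where the constraint, written out per state-action pair, is linear in r. By monotonicity of T, any feasible r satisfies \Phi r \le T\Phi r \le T^2 \Phi r \le \dots \to J^*, so every ALP approximation is a lower bound on the optimal cost-to-go. The smoothed ALP relaxes the constraints with a budgeted slack vector s:

    \max_{r,\,s} \; \nu^\top \Phi r
    \;\text{s.t.}\; \Phi r \le T \Phi r + s, \quad \pi^\top s \le \theta, \quad s \ge 0,

which permits the approximation to violate the Bellman inequality on a controlled set of states while the problem remains a linear program.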


Reference text

[1] D. P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850–865, 2003.

[2] D. P. de Farias and B. Van Roy. On constraint sampling in the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29(3):462–478, 2004.

[3] V. F. Farias and B. Van Roy. Tetris: A study of randomized constraint sampling. In Probabilistic and Randomized Methods for Design Under Uncertainty. Springer-Verlag, 2006.

[4] J. Brzustowski. Can you win at Tetris? Master’s thesis, University of British Columbia, 1992.

[5] H. Burgiel. How to lose at Tetris. Mathematical Gazette, 81(491):194–200, 1997.

[6] E. D. Demaine, S. Hohenberger, and D. Liben-Nowell. Tetris is hard, even to approximate. In Proceedings of the 9th International Computing and Combinatorics Conference, 2003.

[7] D. P. Bertsekas and S. Ioffe. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Technical Report LIDS-P-2349, MIT Laboratory for Information and Decision Systems, 1996.

[8] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Belmont, MA, 1996.

[9] S. Kakade. A natural policy gradient. In Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.

[10] I. Szita and A. Lőrincz. Learning Tetris using the noisy cross-entropy method. Neural Computation, 18:2936–2941, 2006.

[11] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100:78–150, 1992.