nips nips2007 nips2007-162 nips2007-162-reference knowledge-graph by maker-knowledge-mining

162 nips-2007-Random Sampling of States in Dynamic Programming


Source: pdf

Author: Chris Atkeson, Benjamin Stephens

Abstract: We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function. Our focus is on finding steady state policies for deterministic time invariant discrete time control problems with continuous states and actions often found in robotics. In this paper we show that we can now solve problems we couldn’t solve previously.
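To make the first thread of the abstract concrete, the Python fragment below is a minimal sketch of value iteration restricted to a set of randomly sampled states, using a simple nearest-neighbor value-function approximation and a coarse action grid. It is not the authors' method (which uses local models and local trajectory optimizers); the functions step, cost, and sample_state are hypothetical placeholders for a deterministic, time-invariant, discrete-time problem supplied by the user.

import numpy as np

def sampled_value_iteration(step, cost, sample_state, n_states=500,
                            n_actions=64, action_low=-1.0, action_high=1.0,
                            gamma=0.99, n_iters=200, seed=0):
    # step(x, u): deterministic dynamics, returns the next state (assumption).
    # cost(x, u): one-step cost (assumption).
    # sample_state(rng): draws one random state from the region of interest (assumption).
    rng = np.random.default_rng(seed)
    X = np.array([sample_state(rng) for _ in range(n_states)])   # randomly sampled states
    U = np.linspace(action_low, action_high, n_actions)          # crude 1-D action grid
    V = np.zeros(n_states)                                       # value estimates at the samples

    def v_hat(x):
        # Nearest-neighbor value approximation over the sampled states.
        return V[np.argmin(np.linalg.norm(X - x, axis=1))]

    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for i, x in enumerate(X):
            # Bellman backup at each sampled state: one-step cost plus discounted value.
            V_new[i] = min(cost(x, u) + gamma * v_hat(step(x, u)) for u in U)
        V = V_new
    return X, V

Given the returned samples X and values V, a greedy policy can be read off by minimizing cost(x, u) + gamma * v_hat(step(x, u)) over the action grid at any query state.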


reference text

[1] J. Rust. Using randomization to break the curse of dimensionality. Econometrica, 65(3):487– 516, 1997.

[2] M. Hauskrecht. Incremental methods for computing bounds in partially observable Markov decision processes. In Proceedings of the 14th National Conference on Artificial Intelligence (AAAI-97), pages 734–739, Providence, Rhode Island, 1997. AAAI Press / MIT Press.

[3] N.L. Zhang and W. Zhang. Speeding up the convergence of value iteration in partially observable Markov decision processes. JAIR, 14:29–51, 2001.

[4] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence (IJCAI), 2003.

[5] T. Smith and R. Simmons. Heuristic search value iteration for POMDPs. In Uncertainty in Artificial Intelligence, 2004.

[6] M. T. J. Spaan and N. Vlassis. A point-based POMDP algorithm for robot planning. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2399–2404, New Orleans, Louisiana, April 2004.

[7] S. Thrun. Monte Carlo POMDPs. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 1064–1070. MIT Press, 2000.

[8] C. G. Atkeson. Using local trajectory optimizers to speed up global optimization in dynamic programming. In Jack D. Cowan, Gerald Tesauro, and Joshua Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 663–670. Morgan Kaufmann Publishers, Inc., 1994.

[9] F. L. Lewis and V. L. Syrmos. Optimal Control, 2nd Edition. Wiley-Interscience, 1995.

[10] C. Szepesvári. Efficient approximate planning in continuous space Markovian decision problems. AI Communications, 13(3):163–176, 2001.

[11] J. N. Tsitsiklis and B. Van Roy. Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12:694–703, July 2001.

[12] V. D. Blondel and J. N. Tsitsiklis. A survey of computational complexity results in systems and control, 2000.

[13] R. Munos and A. W. Moore. Variable resolution discretization in optimal control. Machine Learning Journal, 49:291–323, 2002.

[14] S. M. LaValle. Planning Algorithms. Cambridge University Press, 2006.

[15] C. G. Atkeson. Randomly sampling actions in dynamic programming. In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), 2007.

[16] C. G. Atkeson and J. Morimoto. Nonparametric representation of policies and value functions: A trajectory-based approach. In Advances in Neural Information Processing Systems 15. MIT Press, 2003.

[17] P. Dyer and S. R. McReynolds. The Computation and Theory of Optimal Control. Academic Press, New York, NY, 1970.

[18] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier, New York, NY, 1970.

[19] C. G. Atkeson and B. Stephens. Multiple balance strategies from one optimization criterion. In Humanoids, 2007.