
163 nips-2007-Receding Horizon Differential Dynamic Programming


Source: pdf

Author: Yuval Tassa, Tom Erez, William D. Smart

Abstract: The control of high-dimensional, continuous, non-linear dynamical systems is a key problem in reinforcement learning and control. Local, trajectory-based methods, using techniques such as Differential Dynamic Programming (DDP), are not directly subject to the curse of dimensionality, but generate only local controllers. In this paper, we introduce Receding Horizon DDP (RH-DDP), an extension to the classic DDP algorithm, which allows us to construct stable and robust controllers based on a library of local-control trajectories. We demonstrate the effectiveness of our approach on a series of high-dimensional problems using a simulated multi-link swimming robot. These experiments show that our approach effectively circumvents dimensionality issues, and is capable of dealing with problems of (at least) 24 state and 9 action dimensions.
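To make the approach concrete, the sketch below illustrates the loop the abstract refers to: a backward pass that propagates a quadratic model of the value function along the current trajectory, a forward pass that rolls out the resulting locally-linear policy, and an outer receding-horizon loop that re-solves from each new state. Everything here is an illustrative assumption rather than the authors' implementation: the toy double-integrator dynamics, cost weights, and horizon lengths are invented for the example, and the backward pass uses the common iLQR simplification (first-order dynamics expansion; for these linear toy dynamics the second-order DDP terms vanish anyway).

# Minimal iLQR-style DDP sketch with a receding-horizon outer loop.
# Toy system, names, and parameters are illustrative assumptions.
import numpy as np

dt = 0.05          # integration step
n, m = 2, 1        # state (position, velocity) and control dimensions

def f(x, u):
    """Toy dynamics: double integrator, x' = [velocity, control]."""
    return x + dt * np.array([x[1], u[0]])

def f_jac(x, u):
    """Analytic Jacobians of f with respect to x and u (constant here)."""
    fx = np.array([[1.0, dt], [0.0, 1.0]])
    fu = np.array([[0.0], [dt]])
    return fx, fu

Q = np.diag([1.0, 0.1])   # quadratic state-cost weights, 0.5 x'Qx
R = 1e-2 * np.eye(m)      # quadratic control-cost weight, 0.5 u'Ru

def ddp_iteration(x0, U):
    """One backward pass and forward rollout over a horizon of len(U) steps."""
    # Forward-simulate the current control sequence to get the nominal trajectory.
    X = [x0]
    for u in U:
        X.append(f(X[-1], u))
    # Backward pass: propagate a quadratic value-function expansion (Vx, Vxx).
    Vx, Vxx = Q @ X[-1], Q
    k, K = [None] * len(U), [None] * len(U)
    for t in reversed(range(len(U))):
        fx, fu = f_jac(X[t], U[t])
        Qx = Q @ X[t] + fx.T @ Vx
        Qu = R @ U[t] + fu.T @ Vx
        Qxx = Q + fx.T @ Vxx @ fx
        Quu = R + fu.T @ Vxx @ fu
        Qux = fu.T @ Vxx @ fx
        k[t] = -np.linalg.solve(Quu, Qu)     # open-loop correction
        K[t] = -np.linalg.solve(Quu, Qux)    # linear feedback gain
        Vx = Qx + K[t].T @ Quu @ k[t] + K[t].T @ Qu + Qux.T @ k[t]
        Vxx = Qxx + K[t].T @ Quu @ K[t] + K[t].T @ Qux + Qux.T @ K[t]
    # Forward pass: apply the locally-linear policy around the old trajectory.
    x, U_new = x0, []
    for t in range(len(U)):
        u = U[t] + k[t] + K[t] @ (x - X[t])
        x = f(x, u)
        U_new.append(u)
    return U_new

# Receding-horizon use: re-solve over a shifted horizon from each new state,
# execute only the first control, then shift the warm-started sequence.
x, U = np.array([1.0, 0.0]), [np.zeros(m) for _ in range(50)]
for step in range(100):
    for _ in range(5):                     # a few DDP sweeps per time step
        U = ddp_iteration(x, U)
    x = f(x, U[0])
    U = U[1:] + [np.zeros(m)]
print("final state:", x)

The warm-starting in the last loop is the point of the receding-horizon construction: each re-solve starts from the previous solution shifted by one step, so only a few sweeps are needed per control cycle.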


reference text

[1] Remi Munos and Andrew W. Moore. Variable resolution discretization for high-accuracy solutions of optimal control problems. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1348–1355, 1999.

[2] M. Stilman, C. G. Atkeson, J. J. Kuffner, and G. Zeglin. Dynamic programming in reduced dimensional spaces: Dynamic planning for robust biped locomotion. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA 2005), pages 2399–2404, 2005.

[3] Christopher G. Atkeson. Using local trajectory optimizers to speed up global optimization in dynamic programming. In Advances in Neural Information Processing Systems, pages 663–670, 1993.

[4] C. G. Atkeson and J. Morimoto. Non-parametric representation of policies and value functions: A trajectory based approach. In Advances in Neural Information Processing Systems 15, 2003.

[5] P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems 19, 2007.

[6] J. Morimoto and C. G. Atkeson. Minimax differential dynamic programming: An application to robust biped walking. In Advances in Neural Information Processing Systems 14, 2002.

[7] Emanuel Todorov and Weiwei Li. Optimal control methods suitable for biomechanical systems. In Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2003.

[8] R. Munos. Policy gradient in continuous time. Journal of Machine Learning Research, 7:771–791, 2006.

[9] J. Peters and S. Schaal. Reinforcement learning for parameterized motor primitives. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2006), 2006.

[10] Tom Erez and William D. Smart. Bipedal walking on rough terrain using manifold control. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007.

[11] A. Crespi and A. Ijspeert. AmphiBot II: An amphibious snake robot that crawls and swims using a central pattern generator. In Proceedings of the 9th International Conference on Climbing and Walking Robots (CLAWAR 2006), pages 19–27, 2006.

[12] D. Q. Mayne. A second order gradient method for determining optimal trajectories for non-linear discrete-time systems. International Journal of Control, 3:85–95, 1966.

[13] D. H. Jacobson and D. Q. Mayne. Differential Dynamic Programming. Elsevier, 1970.

[14] L.-Z. Liao and C. A. Shoemaker. Convergence in unconstrained discrete-time differential dynamic programming. IEEE Transactions on Automatic Control, 36(6):692–706, 1991.

[15] S. Yakowitz. Algorithms and computational techniques in differential dynamic programming. Control and Dynamic Systems: Advances in Theory and Applications, 31:75–91, 1989.

[16] L.-Z. Liao and C. A. Shoemaker. Advantages of differential dynamic programming over Newton's method for discrete-time optimal control problems. Technical Report 92-097, Cornell Theory Center, 1992.

[17] E. Todorov. Iterative local dynamic programming. Manuscript under review, available at www.cogsci.ucsd.edu/~todorov/papers/ildp.pdf, 2007.

[18] S. J. Julier and J. K. Uhlmann. A new extension of the Kalman filter to nonlinear systems. In Proceedings of AeroSense: The 11th International Symposium on Aerospace/Defence Sensing, Simulation and Controls, 1997.

[19] C. E. Garcia, D. M. Prett, and M. Morari. Model predictive control: theory and practice. Automatica, 25:335–348, 1989.

[20] M. Stolle and C. G. Atkeson. Policies based on trajectory libraries. In Proceedings of the International Conference on Robotics and Automation (ICRA 2006), 2006.

[21] R. Coulom. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. PhD thesis, Institut National Polytechnique de Grenoble, 2002.