nips nips2008 nips2008-181 nips2008-181-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jens Kober, Jan R. Peters
Abstract: Many motor skills in humanoid robotics can be learned using parametrized motor primitives, as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems, often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate-reward case to episodic reinforcement learning. We show that this results in a general, common framework that is also connected to policy gradient methods and yields a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM robot arm.
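The abstract does not spell out the update rule, but the EM-inspired, reward-weighted idea it describes can be illustrated with a minimal sketch: Gaussian exploration is applied to the policy parameters, and the explored perturbations are averaged using the episodic returns as weights. Everything below (the toy return function, noise level, sample counts, and variable names) is an illustrative assumption, not the paper's actual task or update rule.

    import numpy as np

    # Minimal sketch of an EM-style, reward-weighted policy parameter update
    # in the spirit of the episodic policy search described in the abstract.
    # The quadratic/exponential toy return and the Gaussian parameter-space
    # exploration are assumptions made purely for illustration.

    rng = np.random.default_rng(0)

    def rollout_return(theta):
        """Toy episodic return: larger when theta is close to an (unknown) optimum."""
        theta_opt = np.array([0.5, -1.0, 2.0])
        return np.exp(-np.sum((theta - theta_opt) ** 2))

    theta = np.zeros(3)   # current policy parameters (e.g., motor-primitive weights)
    sigma = 0.5           # exploration noise in parameter space
    for iteration in range(100):
        # E-step (sampling): perturb the parameters and collect episodic returns.
        eps = rng.normal(scale=sigma, size=(20, theta.size))
        returns = np.array([rollout_return(theta + e) for e in eps])
        # M-step: reward-weighted average of the explorations, so that
        # better-performing perturbations dominate the parameter update.
        weights = returns / (returns.sum() + 1e-12)
        theta = theta + weights @ eps

    print(theta)  # approaches the toy optimum [0.5, -1.0, 2.0]

Because the returns only enter as (normalized) weights, this kind of update needs no gradient of the reward and no learning rate, which is part of what distinguishes the EM-inspired approach from the policy gradient methods cited below.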
[1] R. Sutton and A. Barto. Reinforcement Learning. MIT Press, 1998.
[2] J. Bagnell, S. Kakade, A. Ng, and J. Schneider. Policy search by dynamic programming. In Advances in Neural Information Processing Systems (NIPS), 2003.
[3] A. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In Conference on Uncertainty in Artificial Intelligence (UAI), 2000.
[4] F. Guenter, M. Hersch, S. Calinon, and A. Billard. Reinforcement learning for imitating constrained reaching movements. RSJ Advanced Robotics, 21:1521–1544, 2007.
[5] M. Toussaint and C. Goerick. Probabilistic inference for structured planning in robotics. In International Conference on Intelligent Robots and Systems (IROS), 2007.
[6] M. Hoffman, A. Doucet, N. de Freitas, and A. Jasra. Bayesian policy learning with trans-dimensional MCMC. In Advances in Neural Information Processing Systems (NIPS), 2007.
[7] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.
[8] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), 2000.
[9] J. Bagnell and J. Schneider. Covariant policy search. In International Joint Conference on Artificial Intelligence (IJCAI), 2003.
[10] J. Peters and S. Schaal. Policy gradient methods for robotics. In International Conference on Intelligent Robots and Systems (IROS), 2006.
[11] G. Lawrence, N. Cowan, and S. Russell. Efficient gradient estimation for motor control learning. In Conference on Uncertainty in Artificial Intelligence (UAI), 2003.
[12] H. Attias. Planning by probabilistic inference. In Ninth International Workshop on Artificial Intelligence and Statistics (AISTATS), 2003.
[13] J. Binder, D. Koller, S. Russell, and K. Kanazawa. Adaptive probabilistic networks with hidden variables. Machine Learning, 29:213–244, 1997.
[14] G. Wulf. Attention and motor skill learning. Human Kinetics, Champaign, IL, 2007.
[15] D. E. Kirk. Optimal control theory. Prentice-Hall, Englewood Cliffs, New Jersey, 1970.
[16] G. J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. Wiley Series in Probability and Statistics. John Wiley & Sons, 1997.
[17] P. Dayan and G. E. Hinton. Using expectation-maximization for reinforcement learning. Neural Computation, 9(2):271–278, 1997.
[18] J. Peters and S. Schaal. Reinforcement learning by reward-weighted regression for operational space control. In International Conference on Machine Learning (ICML), 2007.
[19] T. Rückstieß, M. Felder, and J. Schmidhuber. State-dependent exploration for policy gradient methods. In European Conference on Machine Learning (ECML), 2008.
[20] M. Kawato, F. Gandolfo, H. Gomi, and Y. Wada. Teaching by showing in kendama based on optimization principle. In International Conference on Artificial Neural Networks, 1994.
[21] C. G. Atkeson. Using local trajectory optimizers to speed up global optimization in dynamic programming. In Advances in Neural Information Processing Systems (NIPS), 1994.
[22] A. Ijspeert, J. Nakanishi, and S. Schaal. Learning attractor landscapes for learning motor primitives. In Advances in Neural Information Processing Systems (NIPS), 2003.
[23] S. Schaal, P. Mohajerian, and A. Ijspeert. Dynamics systems vs. optimal control — a unifying view. Progress in Brain Research, 165(1):425–445, 2007.
[24] Wikipedia. Ball in a cup. http://en.wikipedia.org/wiki/Ball_in_a_cup, accessed May 31, 2008.
[25] J. Kober, B. Mohler, and J. Peters. Learning perceptual coupling for motor primitives. In International Conference on Intelligent Robots and Systems (IROS), 2008.