Title: Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
Author: J. Z. Kolter, Pieter Abbeel, Andrew Y. Ng
Abstract: We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching a quadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly non-trivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeship learning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. This type of advice is often feasible for experts to give, even if the expert is unable to demonstrate complete trajectories. This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion over extreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work.
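Since the abstract only sketches the method, the following minimal Python sketch illustrates one plausible reading of two-level expert advice under a shared linear cost w·phi(s), using simple perceptron-style updates in the spirit of max-margin planning [13]. All names here (phi, expert_path, alt_paths, expert_steps) are hypothetical illustrations, not the paper's actual formulation, which poses the constraints jointly as a max-margin optimization with slack variables.

import numpy as np

def hierarchical_al_update(w, phi, expert_path, alt_paths,
                           expert_steps, alt_steps, lr=0.1):
    """One perceptron-style update of a linear cost w . phi(s).

    High-level advice: an expert-indicated path should cost no more
    than any alternative path. Low-level advice: at a given state, the
    expert-indicated footstep should cost no more than the alternatives.
    NOTE: all argument names are illustrative assumptions; this is a
    sketch of the two-level-advice idea, not the paper's algorithm.
    """
    def path_cost(path):
        # Cost of a path is the summed per-state cost under w.
        return sum(w @ phi(s) for s in path)

    # High-level (path) advice: if an alternative path looks cheaper
    # than the expert's suggested path, move w to correct the ranking.
    for alt in alt_paths:
        if path_cost(expert_path) > path_cost(alt):
            grad = (sum(phi(s) for s in expert_path)
                    - sum(phi(s) for s in alt))
            w -= lr * grad

    # Low-level (footstep) advice: the same ranking idea applied to
    # individual expert-chosen footsteps versus rejected alternatives.
    for s_exp, s_alts in zip(expert_steps, alt_steps):
        for s_alt in s_alts:
            if w @ phi(s_exp) > w @ phi(s_alt):
                w -= lr * (phi(s_exp) - phi(s_alt))

    return w

In such a scheme one would iterate this update over the collected advice until no constraints are violated, then plan with the learned cost; the key point the abstract makes is that both kinds of advice constrain the same cost function, so isolated local hints and coarse path hints can be combined without full expert trajectories.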
[1] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the International Conference on Machine Learning, 2004.
[2] Andrew G. Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems: Theory and Applications, 13:41–77, 2003.
[3] Joel Chestnutt, James Kuffner, Koichi Nishiwaki, and Satoshi Kagami. Planning biped navigation strategies in complex environments. In Proceedings of the International Conference on Humanoid Robotics, 2003.
[4] Thomas G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.
[5] Nicholas K. Jong and Peter Stone. State abstraction discovery from irrelevant state variables. In Proceedings of the International Joint Conference on Artificial Intelligence, 2005.
[6] H. Kim, T. Kang, V. G. Loc, and H. R. Choi. Gait planning of quadruped walking and climbing robot for locomotion in 3D environment. In Proceedings of the International Conference on Robotics and Automation, 2005.
[7] Nate Kohl and Peter Stone. Machine learning for fast quadrupedal locomotion. In Proceedings of AAAI, 2004.
[8] J. Zico Kolter, Mike P. Rodgers, and Andrew Y. Ng. A complete control architecture for quadruped locomotion over rough terrain. In Proceedings of the International Conference on Robotics and Automation (to appear), 2008.
[9] Honglak Lee, Yirong Shen, Chih-Han Yu, Gurjeet Singh, and Andrew Y. Ng. Quadruped robot obstacle negotiation via reinforcement learning. In Proceedings of the International Conference on Robotics and Automation, 2006.
[10] Jun Morimoto and Christopher G. Atkeson. Minimax differential dynamic programming: An application to robust biped walking. In Neural Information Processing Systems 15, 2002.
[11] Gergely Neu and Csaba Szepesvári. Apprenticeship learning using inverse reinforcement learning and gradient methods. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2007.
[12] Ronald Parr and Stuart Russell. Reinforcement learning with hierarchies of machines. In Neural Information Processing Systems 10, 1998.
[13] Nathan Ratliff, J. Andrew Bagnell, and Martin Zinkevich. Maximum margin planning. In Proceedings of the International Conference on Machine Learning, 2006.
[14] Nathan Ratliff, David Bradley, J. Andrew Bagnell, and Joel Chestnutt. Boosting structured prediction for imitation learning. In Neural Information Processing Systems 19, 2007.
[15] Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.
[16] Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. Learning structured prediction models: A large margin approach. In Proceedings of the International Conference on Machine Learning, 2005.
[17] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, 2005.