nips nips2010 nips2010-43 nips2010-43-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Abdeslam Boularias, Brahim Chaib-draa
Abstract: We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, cover only a small part of a large state space. Inverse Reinforcement Learning (IRL) provides an efficient tool for generalizing the demonstration, based on the assumption that the expert is maximizing a utility function that is a linear combination of state-action features. Most IRL algorithms use a simple Monte Carlo estimation to approximate the expected feature counts under the expert’s policy. In this paper, we show that the quality of the learned policies is highly sensitive to the error in estimating the feature counts. To reduce this error, we introduce a novel approach for bootstrapping the demonstration by assuming that (i) the expert is (near-)optimal, and (ii) the dynamics of the system are known. Empirical results on gridworlds and car racing problems show that our approach is able to learn good policies from a small number of demonstrations.
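The "simple Monte Carlo estimation" of expected feature counts mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual discounted form, in which each demonstrated trajectory contributes the sum of gamma^t * phi(s_t, a_t) and the results are averaged over trajectories. The function names, the one-hot feature map, the discount factor, and the toy demonstrations below are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

State = int
Action = int
Trajectory = Sequence[Tuple[State, Action]]


def monte_carlo_feature_counts(
    trajectories: Sequence[Trajectory],
    phi: Callable[[State, Action], Sequence[float]],
    gamma: float,
    num_features: int,
) -> List[float]:
    """Average the discounted feature sums over the demonstrated trajectories."""
    if not trajectories:
        raise ValueError("at least one trajectory is required")
    totals = [0.0] * num_features
    for trajectory in trajectories:
        discount = 1.0  # gamma ** t, starting at t = 0
        for state, action in trajectory:
            features = phi(state, action)
            for k in range(num_features):
                totals[k] += discount * features[k]
            discount *= gamma
    return [total / len(trajectories) for total in totals]


def one_hot_state_features(state: State, action: Action) -> List[float]:
    # Hypothetical feature map: indicator of the current state in a 3-state toy problem.
    return [1.0 if state == k else 0.0 for k in range(3)]


# Two short, made-up expert demonstrations as (state, action) pairs.
demos = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 0), (0, 0), (1, 0)],
]
mu_hat = monte_carlo_feature_counts(
    demos, one_hot_state_features, gamma=0.5, num_features=3
)
print(mu_hat)  # [1.25, 0.375, 0.125]
```

With only two demonstrations, the estimate for state 2 rests on a single visit, which is the kind of estimation error the abstract says the learned policy is sensitive to.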
Abbeel, Pieter and Ng, Andrew Y. Apprenticeship Learning via Inverse Reinforcement Learning. In Proceedings of the Twenty-first International Conference on Machine Learning (ICML’04), pp. 1–8, 2004.
Boularias, Abdeslam and Chaib-draa, Brahim. Apprenticeship Learning via Soft Local Homomorphisms. In Proceedings of 2010 IEEE International Conference on Robotics and Automation (ICRA’10), pp. 2971–2976, 2010.
Neu, Gergely and Szepesvári, Csaba. Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods. In Conference on Uncertainty in Artificial Intelligence (UAI’07), pp. 295–302, 2007.
Ng, Andrew and Russell, Stuart. Algorithms for Inverse Reinforcement Learning. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML’00), pp. 663–670, 2000.
Ramachandran, Deepak and Amir, Eyal. Bayesian Inverse Reinforcement Learning. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI’07), pp. 2586–2591, 2007.
Ratliff, N., Bagnell, J., and Zinkevich, M. Maximum Margin Planning. In Proceedings of the Twenty-third International Conference on Machine Learning (ICML’06), pp. 729–736, 2006.
Ratliff, Nathan, Bradley, David, Bagnell, J. Andrew, and Chestnutt, Joel. Boosting Structured Prediction for Imitation Learning. In Advances in Neural Information Processing Systems 19 (NIPS’07), pp. 1153–1160, 2007.
Syed, Umar and Schapire, Robert. A Game-Theoretic Approach to Apprenticeship Learning. In Advances in Neural Information Processing Systems 20 (NIPS’08), pp. 1449–1456, 2008.
Syed, Umar, Bowling, Michael, and Schapire, Robert E. Apprenticeship Learning using Linear Programming. In Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML’08), pp. 1032–1039, 2008.