nips2012-162
Source: pdf
Author: Edouard Klein, Matthieu Geist, Bilal Piot, Olivier Pietquin
Abstract: This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multiclass classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Contrary to most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed using only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.
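The abstract's core idea — score a multiclass classifier with the expert's feature expectation and read the reward off the learned weights — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Monte-Carlo feature-expectation estimator, the perceptron-style training loop, the phi(s) fallback for state-action pairs the expert never took (standing in for the heuristic the abstract mentions), and all function names are assumptions.

```python
# Minimal sketch of the SCIRL principle (assumes discrete, hashable states and
# phi(s) returning a NumPy feature vector).  Everything here is illustrative:
# the Monte-Carlo estimator, the perceptron-style classifier and the phi(s)
# fallback for unvisited state-action pairs are stand-ins, not the paper's recipe.
import numpy as np


def estimate_mu_expert(trajectories, phi, gamma=0.99):
    """Monte-Carlo estimate of the expert feature expectation
    mu_E(s, a) = E[sum_t gamma^t phi(s_t) | s_0 = s, a_0 = a, expert policy],
    available only at state-action pairs visited in the expert trajectories."""
    sums, counts = {}, {}
    for traj in trajectories:                     # traj = [(s_0, a_0), (s_1, a_1), ...]
        acc = np.zeros_like(phi(traj[0][0]), dtype=float)
        for s, a in reversed(traj):               # backward pass accumulates discounted features
            acc = phi(s) + gamma * acc
            sums[(s, a)] = sums.get((s, a), 0.0) + acc
            counts[(s, a)] = counts.get((s, a), 0) + 1
    return {k: v / counts[k] for k, v in sums.items()}


def scirl(trajectories, phi, n_actions, gamma=0.99, lr=0.1, epochs=50):
    """Learn theta such that the classifier score q(s, a) = theta . mu_E(s, a)
    imitates the expert's action choices; the reward is then r(s) = theta . phi(s)."""
    mu_E = estimate_mu_expert(trajectories, phi, gamma)
    theta = np.zeros_like(phi(trajectories[0][0][0]), dtype=float)
    pairs = [(s, a) for traj in trajectories for (s, a) in traj]

    def mu(s, a):                                 # crude fallback for pairs the expert never took
        return mu_E.get((s, a), phi(s))

    for _ in range(epochs):                       # structured-perceptron updates on theta
        for s, a_expert in pairs:
            a_pred = max(range(n_actions), key=lambda b: theta @ mu(s, b))
            if a_pred != a_expert:
                theta += lr * (mu(s, a_expert) - mu(s, a_pred))
    return theta                                  # recovered reward: r(s) = theta @ phi(s)
```

In this form no MDP solver is invoked: the learned theta directly parameterizes both the classifier score and the recovered reward, which matches the abstract's claim that SCIRL does not require solving the direct RL problem.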
[1] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the 21st International Conference on Machine Learning (ICML), 2004.
[2] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3). Athena Scientific, 1996.
[3] Abdeslam Boularias, Jens Kober, and Jan Peters. Relative entropy inverse reinforcement learning. In JMLR Workshop and Conference Proceedings Volume 15: AISTATS 2011, 2011.
[4] Steven J. Bradtke and Andrew G. Barto. Linear Least-Squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33–57, 1996.
[5] Krishnamurthy Dvijotham and Emanuel Todorov. Inverse Optimal Control with LinearlySolvable MDPs. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
[6] Yann Guermeur. VC theory of large margin multi-category classifiers. Journal of Machine Learning Research, 8:2551–2594, 2007.
[7] Edouard Klein, Matthieu Geist, and Olivier Pietquin. Batch, Off-policy and Model-free Apprenticeship Learning. In Proceedings of the European Workshop on Reinforcement Learning (EWRL), 2011.
[8] Francisco S. Melo and Manuel Lopes. Learning from demonstration using MDP induced metrics. In Proceedings of the European Conference on Machine Learning (ECML), 2010.
[9] Rémi Munos. Performance bounds in Lp norm for approximate value iteration. SIAM journal on control and optimization, 46(2):541–561, 2007.
[10] Gergely Neu and Csaba Szepesvári. Training Parsers by Inverse Reinforcement Learning. Machine Learning, 77(2-3):303–337, 2009.
[11] Andrew Y. Ng and Stuart Russell. Algorithms for Inverse Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning (ICML), 2000.
[12] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, 1994.
[13] Nathan Ratliff, Andrew D. Bagnell, and Martin Zinkevich. Maximum Margin Planning. In Proceedings of the 23rd International Conference on Machine Learning (ICML), 2006.
[14] Stuart Russell. Learning agents for uncertain environments (extended abstract). In Proceedings of the 11th annual Conference on Computational Learning Theory (COLT), 1998.
[15] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 1998.
[16] Umar Syed and Robert Schapire. A game-theoretic approach to apprenticeship learning. In Advances in Neural Information Processing Systems 20 (NIPS), 2008.
[17] Csaba Szepesvári. Algorithms for Reinforcement Learning. Morgan and Claypool, 2010.
[18] Ben Taskar, Vassil Chatalbashev, Daphne Koller, and Carlos Guestrin. Learning Structured Prediction Models: a Large Margin Approach. In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.