nips nips2010 nips2010-93 nips2010-93-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sergey Levine, Zoran Popovic, Vladlen Koltun
Abstract: The goal of inverse reinforcement learning is to find a reward function for a Markov decision process, given example traces from its optimal policy. Current IRL techniques generally rely on user-supplied features that form a concise basis for the reward. We present an algorithm that instead constructs reward features from a large collection of component features, by building logical conjunctions of those component features that are relevant to the example policy. Given example traces, the algorithm returns a reward function as well as the constructed features. The reward function can be used to recover a full, deterministic, stationary policy, and the features can be used to transplant the reward function into any novel environment on which the component features are well defined.
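The abstract's central idea, representing the reward as a linear combination of logical conjunctions built from simpler component features, can be sketched in a few lines. The snippet below is only an illustrative sketch, not the paper's algorithm: the toy gridworld states, the component-feature names, and the placeholder weights are all assumptions introduced here. In the actual method, the relevant conjunctions and their weights would be determined from the example traces, for instance by an IRL solver such as the linear program of Ng and Russell [7].

```python
from itertools import combinations

# Hypothetical binary component features over toy gridworld states (x, y).
# Names and definitions are illustrative assumptions, not the paper's benchmark.
component_features = {
    "is_goal":   lambda s: int(s == (4, 4)),
    "on_diag":   lambda s: int(s[0] == s[1]),
    "near_wall": lambda s: int(s[0] in (0, 4) or s[1] in (0, 4)),
}

def build_conjunctions(components, max_arity=2):
    """Construct candidate reward features as logical conjunctions (ANDs)
    of the binary component features, up to a given arity."""
    conjunctions = {}
    names = sorted(components)
    for arity in range(1, max_arity + 1):
        for subset in combinations(names, arity):
            funcs = tuple(components[n] for n in subset)
            conjunctions[" AND ".join(subset)] = (
                lambda s, fs=funcs: int(all(f(s) for f in fs))
            )
    return conjunctions

features = build_conjunctions(component_features)

# Placeholder weights standing in for the output of an IRL fit to example traces.
weights = {"is_goal": 1.0, "near_wall AND on_diag": -0.3}

def reward(state):
    """Reward as a linear combination of the constructed conjunction features."""
    return sum(w * features[name](state) for name, w in weights.items())

if __name__ == "__main__":
    for s in [(4, 4), (0, 0), (2, 3)]:
        print(s, reward(s))
```

Because the constructed features depend only on the component features, the same weighted reward can be evaluated in any novel environment on which those component features are defined, which is the transfer property the abstract describes.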
[1] P. Abbeel and A. Y. Ng. Apprenticeship learning via inverse reinforcement learning. In ICML ’04: Proceedings of the 21st International Conference on Machine Learning. ACM, 2004.
[2] C. L. Baker, J. B. Tenenbaum, and R. R. Saxe. Goal inference as inverse planning. In Proceedings of the 29th Annual Conference of the Cognitive Science Society, 2007.
[3] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth and Brooks, Monterey, CA, 1984.
[4] P. Dayan and B. W. Balleine. Reward, motivation, and reinforcement learning. Neuron, 36(2):285–298, 2002.
[5] D. P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850–865, 2003.
[6] M. Grant and S. Boyd. CVX: Matlab Software for Disciplined Convex Programming (web page and software), 2008. http://stanford.edu/~boyd/cvx.
[7] A. Y. Ng and S. J. Russell. Algorithms for inverse reinforcement learning. In ICML ’00: Proceedings of the 17th International Conference on Machine Learning, pages 663–670. Morgan Kaufmann Publishers Inc., 2000.
[8] D. Ramachandran and E. Amir. Bayesian inverse reinforcement learning. In IJCAI ’07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2586–2591. Morgan Kaufmann Publishers Inc., 2007.
[9] N. D. Ratliff, J. A. Bagnell, and M. A. Zinkevich. Maximum margin planning. In ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, pages 729–736. ACM, 2006.
[10] U. Syed, M. Bowling, and R. E. Schapire. Apprenticeship learning using linear programming. In ICML ’08: Proceedings of the 25th International Conference on Machine Learning, pages 1032–1039. ACM, 2008.