Venue: NIPS 2011
Source: pdf
Author: Sergey Levine, Zoran Popović, Vladlen Koltun
Abstract: We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonlinear function, while also determining the relevance of each feature to the expert’s policy. Our probabilistic algorithm allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.
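Two ideas in the abstract, a reward that is a nonlinear Gaussian-process function of the state features and a learned relevance weight per feature, can be illustrated with a short sketch. It assumes a standard ARD (automatic relevance determination) squared-exponential kernel, which may differ from the paper's exact kernel, and computes rewards as the noise-free GP posterior mean given hypothetical reward values `u` at a set of inducing feature points. The names `ard_kernel`, `solve`, `gp_reward`, `inducing`, `beta`, `lam`, and `jitter` are illustrative and do not come from the paper. The sketch leaves out the inverse reinforcement learning part entirely (the likelihood of the demonstrations and the optimization of `u` and the kernel hyperparameters), so it is not the authors' algorithm.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def ard_kernel(x: Vector, y: Vector, beta: float, lam: Sequence[float]) -> float:
    """ARD squared-exponential kernel: beta * exp(-0.5 * sum_k lam_k * (x_k - y_k)^2).

    Each lam_k acts as a relevance weight for feature k (hypothetical parameterization):
    lam_k = 0 makes the kernel, and therefore the reward, ignore feature k.
    """
    sq = sum(l * (a - b) ** 2 for l, a, b in zip(lam, x, y))
    return beta * math.exp(-0.5 * sq)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_reward(states: Sequence[Vector], inducing: Sequence[Vector], u: Sequence[float],
              beta: float, lam: Sequence[float], jitter: float = 1e-6) -> List[float]:
    """GP posterior-mean reward r(s) = k(s, X_u)^T K_uu^{-1} u for each state feature vector s.

    A small jitter on the diagonal of K_uu keeps the solve numerically stable, so the
    reward at the inducing points reproduces u only approximately.
    """
    k_uu = [[ard_kernel(a, b, beta, lam) + (jitter if i == j else 0.0)
             for j, b in enumerate(inducing)] for i, a in enumerate(inducing)]
    alpha = solve(k_uu, list(u))  # alpha = K_uu^{-1} u
    return [sum(ard_kernel(s, xu, beta, lam) * al for xu, al in zip(inducing, alpha))
            for s in states]
```

For example, with `lam=[1.0, 0.0]` the reward at `[0.5, 3.0]` equals the reward at `[0.5, -7.0]`, since the second feature has been switched off. Driving some `lam` entries toward zero in this way is one means by which a learned kernel can express which features matter to the expert's policy.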
[1] P. Abbeel and A. Y. Ng. Apprenticeship learning via inverse reinforcement learning. In ICML ’04: Proceedings of the 21st International Conference on Machine Learning, 2004.
[2] M. P. Deisenroth, C. E. Rasmussen, and J. Peters. Gaussian process dynamic programming. Neurocomputing, 72(7–9):1508–1524, 2009.
[3] K. Dvijotham and E. Todorov. Inverse optimal control with linearly-solvable MDPs. In ICML ’10: Proceedings of the 27th International Conference on Machine Learning, pages 335–342, 2010.
[4] Y. Engel, S. Mannor, and R. Meir. Reinforcement learning with Gaussian processes. In ICML ’05: Proceedings of the 22nd International Conference on Machine Learning, pages 201–208, 2005.
[5] S. Levine, Z. Popović, and V. Koltun. Feature construction for inverse reinforcement learning. In Advances in Neural Information Processing Systems 23, 2010.
[6] G. Neu and C. Szepesvári. Apprenticeship learning using inverse reinforcement learning and gradient methods. In Uncertainty in Artificial Intelligence (UAI), 2007.
[7] A. Y. Ng and S. J. Russell. Algorithms for inverse reinforcement learning. In ICML ’00: Proceedings of the 17th International Conference on Machine Learning, pages 663–670, 2000.
[8] J. Quiñonero Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959, 2005.
[9] D. Ramachandran and E. Amir. Bayesian inverse reinforcement learning. In IJCAI ’07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2586–2591, 2007.
[10] C. E. Rasmussen and M. Kuss. Gaussian processes in reinforcement learning. In Advances in Neural Information Processing Systems 16, 2003.
[11] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2005.
[12] N. Ratliff, J. A. Bagnell, and M. A. Zinkevich. Maximum margin planning. In ICML ’06: Proceedings of the 23rd International Conference on Machine Learning, pages 729–736, 2006.
[13] N. Ratliff, D. Bradley, J. A. Bagnell, and J. Chestnutt. Boosting structured prediction for imitation learning. In Advances in Neural Information Processing Systems 19, 2007.
[14] N. Ratliff, D. Silver, and J. A. Bagnell. Learning to search: Functional gradient techniques for imitation learning. Autonomous Robots, 27(1):25–53, 2009.
[15] U. Syed and R. Schapire. A game-theoretic approach to apprenticeship learning. In Advances in Neural Information Processing Systems 20, 2008.
[16] B. D. Ziebart. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy. PhD thesis, Carnegie Mellon University, 2010.
[17] B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey. Maximum entropy inverse reinforcement learning. In AAAI Conference on Artificial Intelligence (AAAI 2008), pages 1433–1438, 2008.