
Bayesian Policy Gradient Algorithms (NIPS 2006, paper 44)



Author: Mohammad Ghavamzadeh, Yaakov Engel

Abstract: Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.
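The core idea the abstract describes is Bayesian quadrature: rather than averaging Monte Carlo samples, place a Gaussian process prior on the integrand and compute the posterior mean of the integral in closed form. The following is a minimal one-dimensional Python sketch in that spirit, following O'Hagan [10] and Rasmussen and Ghahramani [16]. The toy integrand f, the squared-exponential kernel, the hyperparameters ell and sig_f, and the standard-normal measure are all illustrative assumptions; the paper's actual method places the GP over terms of the policy-gradient integrand, which this sketch does not reproduce.

# Minimal Bayesian quadrature sketch: estimate E_{x~N(0,1)}[f(x)]
# from a handful of samples by conditioning a GP on (x_i, f(x_i)).
import numpy as np

def bq_posterior_mean(xs, fs, ell=1.0, sig_f=1.0, jitter=1e-8):
    # Gram matrix for the squared-exponential kernel
    # k(x, x') = sig_f^2 * exp(-(x - x')^2 / (2 ell^2)).
    d = xs[:, None] - xs[None, :]
    K = sig_f**2 * np.exp(-d**2 / (2 * ell**2)) + jitter * np.eye(len(xs))
    # z_i = integral of k(x, x_i) N(x; 0, 1) dx, which is Gaussian-Gaussian
    # convolution and therefore available in closed form:
    z = sig_f**2 * np.sqrt(ell**2 / (ell**2 + 1.0)) \
        * np.exp(-xs**2 / (2 * (ell**2 + 1.0)))
    # Posterior mean of the integral: z^T K^{-1} f.
    return z @ np.linalg.solve(K, fs)

rng = np.random.default_rng(0)
f = lambda x: np.sin(3.0 * x) + x**2   # true expectation under N(0,1) is 1
xs = rng.standard_normal(10)           # only 10 samples
fs = f(xs)
print("MC estimate:", fs.mean())
print("BQ estimate:", bq_posterior_mean(xs, fs))
print("true value :", 1.0)

For smooth integrands and small sample sizes, the quadrature estimate typically tracks the true value more closely than the plain Monte Carlo average from the same samples. The GP posterior variance of the integral (not computed in this sketch) is what supplies the uncertainty measure the abstract mentions.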


References

[1] R. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.

[2] P. Marbach. Simulation-Based Methods for Markov Decision Processes. PhD thesis, MIT, 1998.

[3] J. Baxter and P. Bartlett. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319–350, 2001.

[4] R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of NIPS 12, pages 1057–1063, 2000.

[5] S. Kakade. A natural policy gradient. In Proceedings of NIPS 14, 2002.

[6] J. Bagnell and J. Schneider. Covariant policy search. In Proceedings of the 18th IJCAI, 2003.

[7] J. Peters, S. Vijayakumar, and S. Schaal. Reinforcement learning for humanoid robotics. In Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots, 2003.

[8] J. Berger and R. Wolpert. The Likelihood Principle. Inst. of Mathematical Statistics, Hayward, CA, 1984.

[9] A. O’Hagan. Monte Carlo is fundamentally unsound. The Statistician, 36:247–249, 1987.

[10] A. O’Hagan. Bayes-Hermite quadrature. Journal of Statistical Planning and Inference, 29:245–260, 1991.

[11] D. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[12] R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[13] T. Jaakkola and D. Haussler. Exploiting generative models in discriminative classifiers. In Proceedings of NIPS 11. MIT Press, 1998.

[14] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge Univ. Press, 2004.

[15] Y. Engel. Algorithms and Representations for Reinforcement Learning. PhD thesis, The Hebrew University of Jerusalem, Israel, 2005.

[16] C. Rasmussen and Z. Ghahramani. Bayesian Monte Carlo. In Proceedings of NIPS 15. MIT Press, 2003.