
A Natural Policy Gradient


Source: pdf

Author: Sham M. Kakade

Abstract: We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.
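
To make the gradient-versus-natural-gradient contrast concrete, here is a minimal sketch in Python/NumPy of one natural gradient step for a softmax policy in a single-state MDP (a bandit). The reward values, step size, and the use of a pseudo-inverse for the singular softmax Fisher matrix are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Illustrative setup (assumed, not from the paper): one state, three
# actions with known expected rewards, softmax policy pi(a) ~ exp(theta[a]).
rewards = np.array([1.0, 0.0, 0.5])
theta = np.zeros(3)

def softmax(x):
    z = np.exp(x - x.max())   # subtract max for numerical stability
    return z / z.sum()

pi = softmax(theta)

# For a softmax policy, grad_theta log pi(a) = e_a - pi; stack these as rows.
grad_log = np.eye(3) - pi

# Vanilla policy gradient of expected reward: sum_a pi(a) r(a) grad log pi(a).
g = (pi * rewards) @ grad_log

# Fisher information matrix: F = sum_a pi(a) grad log pi(a) grad log pi(a)^T.
F = grad_log.T @ (pi[:, None] * grad_log)

# Natural gradient direction F^{-1} g. F is singular for softmax (shifting
# all of theta by a constant leaves pi unchanged), so use a pseudo-inverse.
nat_g = np.linalg.pinv(F) @ g

theta_nat = theta + 0.1 * nat_g     # one natural gradient ascent step
theta_van = theta + 0.1 * g         # vanilla step, for comparison
print("natural:", softmax(theta_nat))
print("vanilla:", softmax(theta_van))
```

At a uniform initial policy the Fisher correction only rescales the vanilla direction, but away from uniformity it changes the direction itself, pushing probability mass toward the greedy action in the way the abstract describes.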


References

[1] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276, 1998.

[2] J. Baxter and P. Bartlett. Direct gradient-based reinforcement learning. Technical report, Australian National University, Research School of Information Sciences and Engineering, July 1999.

[3] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[4] P. Dayan and G. Hinton. Using EM for reinforcement learning. Neural Computation, 9:271-278, 1997.

[5] S. Kakade. Optimizing average reward using discounted rewards. In COLT (in press), 2001.

[6] V. Konda and J. Tsitsiklis. Actor-critic algorithms. Advances in Neural Information Processing Systems, 12, 2000.

[7] D. MacKay. Maximum likelihood and covariant algorithms for independent component analysis. Technical report, University of Cambridge, 1996.

[8] P. Marbach and J. Tsitsiklis. Simulation-based optimization of Markov reward processes. Technical report, Massachusetts Institute of Technology, 1998.

[9] R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 13, 2000.

[10] L. Xu and M. I. Jordan. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 8(1):129-151, 1996.