
Policy gradients in linearly-solvable MDPs (NIPS 2010, paper 208)


Source: pdf

Author: Emanuel Todorov

Abstract: We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.
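For background, the linearly-solvable MDP (LMDP) framework that the abstract builds on (see [15, 16]) can be summarized as follows. This is a minimal sketch in standard LMDP notation, not taken from this page: q is the state cost, p the passive dynamics, u the controlled transition probabilities, v the cost-to-go, and z = exp(-v) the desirability function; the equations below are the first-exit formulation.

  \ell(x,u) = q(x) + \mathrm{KL}\big(u(\cdot\mid x)\,\|\,p(\cdot\mid x)\big)

  v(x) = q(x) + \min_{u}\Big[\,\mathrm{KL}\big(u(\cdot\mid x)\,\|\,p(\cdot\mid x)\big) + \mathbb{E}_{x'\sim u(\cdot\mid x)}\, v(x')\,\Big]

  z(x) = \exp(-v(x)) \quad\Rightarrow\quad z(x) = \exp(-q(x)) \sum_{x'} p(x'\mid x)\, z(x')

  u^{*}(x'\mid x) = \frac{p(x'\mid x)\, z(x')}{\sum_{y} p(y\mid x)\, z(y)}

The exponentiated Bellman equation is linear in z, which is what makes the problem "linearly solvable", and the optimal policy follows from z in closed form; the cost-to-go v = -\log z is the quantity that the abstract's policy-gradient results estimate in place of a state-action advantage function.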


References

[1] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10:251–276, 1998.

[2] J. Bagnell and J. Schneider. Covariant policy search. In International Joint Conference on Artificial Intelligence, 2003.

[3] J. Boyan. Least-squares temporal difference learning. In International Conference on Machine Learning, 1999.

[4] W. Fleming and S. Mitter. Optimal control and nonlinear filtering for nondegenerate diffusion processes. Stochastics, 8:226–261, 1982.

[5] S. Kakade. A natural policy gradient. In Advances in Neural Information Processing Systems, 2002.

[6] S. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, 2003.

[7] H. Kappen. Linear theory for control of nonlinear stochastic systems. Physical Review Letters, 95, 2005.

[8] V. Konda and J. Tsitsiklis. Actor-critic algorithms. SIAM Journal on Control and Optimization, pages 1008–1014, 2001.

[9] R. Munos. Policy gradient in continuous time. The Journal of Machine Learning Research, 7:771–791, 2006.

[10] B. Oksendal. Stochastic Differential Equations (4th Ed). Springer-Verlag, Berlin, 1995.

[11] J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71:1180–1190, 2008.

[12] M. Schmidt. minFunc. Online material, 2005.

[13] R. Stengel. Optimal Control and Estimation. Dover, New York, 1994.

[14] R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 2000.

[15] E. Todorov. Linearly-solvable Markov decision problems. In Advances in Neural Information Processing Systems, 2006.

[16] E. Todorov. Efficient computation of optimal actions. PNAS, 106:11478–11483, 2009.

[17] E. Todorov. Eigen-function approximation methods for linearly-solvable optimal control problems. IEEE ADPRL, 2009.

[18] R. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.