Author: Emanuel Todorov
Abstract: We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.
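As a rough illustration of the setting the abstract describes, the sketch below is a minimal assumption-laden example, not the paper's algorithm: it parameterizes the controlled transition distribution of a discrete linearly-solvable MDP through an approximate cost-to-go v_w(x) = w·phi(x) and forms a Monte-Carlo policy-gradient estimate. The point it illustrates is that the score function depends only on state features, which is the sense in which the compatible approximator lives in cost-to-go space rather than the larger state-action space. All identifiers (`policy`, `episode_gradient`, `phi`, `state_cost`) are hypothetical.

```python
import numpy as np

# Hedged sketch (not the paper's exact method): REINFORCE-style gradient for a
# discrete linearly-solvable MDP whose policy is induced by an approximate
# cost-to-go v_w(x) = w . phi(x).  The controlled transitions are
#     u_w(x'|x)  proportional to  p(x'|x) * exp(-v_w(x')),
# with p the passive dynamics.  phi has shape (n_states, n_features).

def policy(w, p_row, phi):
    """Controlled transition distribution from one state (p_row = passive dynamics row)."""
    logits = np.log(p_row + 1e-12) - phi @ w   # log p(x'|x) - v_w(x')
    logits -= logits.max()                     # numerical stabilization
    u = np.exp(logits)
    return u / u.sum()

def episode_gradient(w, p, phi, state_cost, x0, horizon, rng):
    """Monte-Carlo policy-gradient estimate; the score uses only state features."""
    scores = np.zeros_like(w)
    total_cost = 0.0
    x = x0
    for _ in range(horizon):
        u = policy(w, p[x], phi)
        x_next = rng.choice(len(u), p=u)
        # d/dw log u_w(x'|x) = -phi(x') + E_u[phi]
        scores += -phi[x_next] + u @ phi
        total_cost += state_cost[x]            # KL control cost omitted for brevity
        x = x_next
    return total_cost * scores                 # plain (vanilla) gradient estimate
```

A usage call would pass a passive transition matrix `p`, feature matrix `phi`, per-state cost vector `state_cost`, and `rng = np.random.default_rng(0)`. A natural-gradient variant in the spirit of [1, 5, 11] would precondition this estimate with the empirical Fisher information built from the same score vectors.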
[1] S. Amari. Natural gradient works efficiently in learning. Neural Computation, 10:251–276, 1998.
[2] J. Bagnell and J. Schneider. Covariant policy search. In International Joint Conference on Artificial Intelligence, 2003.
[3] J. Boyan. Least-squares temporal difference learning. In International Conference on Machine Learning, 1999.
[4] W. Fleming and S. Mitter. Optimal control and nonlinear filtering for nondegenerate diffusion processes. Stochastics, 8:226–261, 1982.
[5] S. Kakade. A natural policy gradient. In Advances in Neural Information Processing Systems, 2002.
[6] S. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, 2003.
[7] H. Kappen. Linear theory for control of nonlinear stochastic systems. Physical Review Letters, 95, 2005.
[8] V. Konda and J. Tsitsiklis. Actor-critic algorithms. SIAM Journal on Control and Optimization, pages 1008–1014, 2001.
[9] R. Munos. Policy gradient in continuous time. The Journal of Machine Learning Research, 7:771–791, 2006.
[10] B. Oksendal. Stochastic Differential Equations (4th Ed). Springer-Verlag, Berlin, 1995.
[11] J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71:1180–1190, 2008.
[12] M. Schmidt. minFunc. Online material, 2005.
[13] R. Stengel. Optimal Control and Estimation. Dover, New York, 1994.
[14] R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 2000.
[15] E. Todorov. Linearly-solvable Markov decision problems. Advances in Neural Information Processing Systems, 2006.
[16] E. Todorov. Efficient computation of optimal actions. PNAS, 106:11478–11483, 2009.
[17] E. Todorov. Eigen-function approximation methods for linearly-solvable optimal control problems. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2009.
[18] R. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.