
113 nips-2000-Robust Reinforcement Learning

Source: pdf

Author: Jun Morimoto, Kenji Doya

Abstract: This paper proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both off-line learning by simulations and for on-line action planning. However, the difference between the model and the real environment can lead to unpredictable, often unwanted results. Based on the theory of H∞ control, we consider a differential game in which a 'disturbing' agent (disturber) tries to make the worst possible disturbance while a 'control' agent (actor) tries to make the best control input. The problem is formulated as finding a minmax solution of a value function that takes into account the norm of the output deviation and the norm of the disturbance. We derive on-line learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call
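
The minmax formulation described in the abstract can be illustrated with a minimal sketch of the textbook scalar linear-quadratic game. This is not the paper's on-line learning algorithm. Everything here is an assumption made for illustration: the dynamics dx/dt = a*x + b*u + d*w, the cost integrand q*x^2 + r*u^2 - gamma^2*w^2 (control minimizes, disturbance maximizes), and the function names `solve_scalar_game` and `feedback_gains`. With a quadratic value function V(x) = p*x^2, the stationarity conditions reduce to the scalar Riccati equation q + 2*a*p - p^2*(b^2/r - d^2/gamma^2) = 0.

```python
import math


def solve_scalar_game(a, b, d, q, r, gamma):
    """Return p > 0 such that V(x) = p * x**2 solves the scalar game.

    Assumed setup (illustrative, not taken from the paper):
      dynamics: dx/dt = a*x + b*u + d*w
      cost:     integral of q*x^2 + r*u^2 - gamma^2 * w^2
    The control u minimizes the cost, the disturbance w maximizes it.
    """
    # Coefficient of p^2 in the Riccati equation q + 2*a*p - c*p^2 = 0.
    c = b**2 / r - d**2 / gamma**2
    if c <= 0:
        # This sketch only handles the case where the control authority
        # outweighs the disturbance at the chosen attenuation level gamma.
        raise ValueError("gamma too small for this sketch (c <= 0)")
    # Positive root of c*p^2 - 2*a*p - q = 0.
    return (a + math.sqrt(a**2 + c * q)) / c


def feedback_gains(p, b, d, r, gamma):
    """Best control u = k_u * x and worst disturbance w = k_w * x."""
    k_u = -b * p / r          # from d/du [r*u^2 + 2*p*x*b*u] = 0
    k_w = d * p / gamma**2    # from d/dw [-gamma^2*w^2 + 2*p*x*d*w] = 0
    return k_u, k_w


if __name__ == "__main__":
    a, b, d, q, r, gamma = 1.0, 1.0, 1.0, 1.0, 1.0, 2.0
    p = solve_scalar_game(a, b, d, q, r, gamma)
    k_u, k_w = feedback_gains(p, b, d, r, gamma)
    # Closed-loop pole under best control and worst disturbance.
    print("p =", p, "closed-loop pole =", a + b * k_u + d * k_w)
```

In this sketch the open-loop system is unstable (a > 0), yet the closed-loop pole stays negative even when the disturbance follows its worst-case feedback law. That is the kind of robustness the minmax value function is meant to capture.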

reference text

[1] A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13:834-846, 1983.

[2] S. J. Bradtke. Reinforcement Learning Applied to Linear Quadratic Regulation. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 295-302. Morgan Kaufmann, San Mateo, CA, 1993.

[3] K. Doya. Reinforcement Learning in Continuous Time and Space. Neural Computation, 12(1):219-245, 2000.

[4] J. Morimoto and K. Doya. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. In Proceedings of Seventeenth International Conference on Machine Learning, pages 623-630, San Francisco, CA, 2000. Morgan Kaufmann.

[5] S. Weiland. Linear Quadratic Games, H∞, and the Riccati Equation. In Proceedings of the Workshop on the Riccati Equation in Control, Systems, and Signals, pages 156-159. 1989.

[6] K. Zhou, J. C. Doyle, and K. Glover. Robust Optimal Control. Prentice Hall, New Jersey, 1996.