nips nips2000 nips2000-105 nips2000-105-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Andre, Stuart J. Russell
Abstract: We present an expressive agent design language for reinforcement learning that allows the user to constrain the policies considered by the learning process. The language includes standard features such as parameterized subroutines, temporary interrupts, aborts, and memory variables, but also allows for unspecified choices in the agent program. For learning the choices that are left unspecified, we present provably convergent learning algorithms. We demonstrate by example that agent programs written in the language are concise as well as modular. This facilitates state abstraction and the transferability of learned skills.
[1] D. Andre. Programmable HAMs. www.cs.berkeley.edu/~dandre/pham.ps, 2000.
[2] S. Benson and N. Nilsson. Reacting, planning and learning in an autonomous agent. In K. Furukawa, D. Michie, and S. Muggleton, editors, Machine Intelligence 14. 1995.
[3] G. Berry and G. Gonthier. The Esterel synchronous programming language: Design, semantics, implementation. Science of Computer Programming, 19(2):87-152, 1992.
[4] T. G. Dietterich. State abstraction in MAXQ hierarchical RL. In NIPS 12, 2000.
[5] R. J. Firby. Modularity issues in reactive planning. In AIPS 96, pages 78-85. AAAI Press, 1996.
[6] L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. JAIR, 4:237-285, 1996.
[7] N. J. Nilsson. Teleo-reactive programs for agent control. JAIR, 1:139-158, 1994.
[8] R. Parr and S. J. Russell. Reinforcement learning with hierarchies of machines. In NIPS 10, 1998.
[9] R. Parr. Hierarchical Control and Learning for MDPs. PhD thesis, UC Berkeley, 1998.
[10] L. Peshkin, N. Meuleau, and L. Kaelbling. Learning policies with external memory. In ICML, 1999.
[11] R. Sutton. Temporal abstraction in reinforcement learning. In ICML, 1995.
[12] R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181-211, February 1999.