
34 nips-2003: Approximate Policy Iteration with a Policy Language Bias


Source: pdf

Author: Alan Fern, Sungwook Yoon, Robert Givan

Abstract: We explore approximate policy iteration, replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs.
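The abstract describes the key loop: each iteration of approximate policy iteration improves the current policy via simulation (rollouts) and then learns a new policy directly from the improved decisions, rather than learning a cost function. The Python sketch below illustrates that loop under assumed interfaces; `sample_states`, `simulate`, `actions`, and `learner` are hypothetical placeholders, and the generic supervised learner stands in for the paper's policy-language bias rather than reproducing it.

```python
def rollout_q(simulate, state, action, policy, horizon, width, gamma):
    """Estimate Q(state, action): take `action` first, then follow `policy`
    for up to `horizon` steps, averaging the discounted return over `width`
    simulated trajectories."""
    total = 0.0
    for _ in range(width):
        s, a, ret, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, reward, done = simulate(s, a)
            ret += discount * reward
            discount *= gamma
            if done:
                break
            a = policy(s)
        total += ret
    return total / width


def approximate_policy_iteration(sample_states, simulate, actions, learner,
                                 policy, iterations=5, horizon=20,
                                 width=10, gamma=0.95):
    """Each iteration labels sampled states with the rollout-greedy action of
    the current policy, then fits a new policy to those (state, action) pairs
    instead of fitting a cost function."""
    for _ in range(iterations):
        data = []
        for s in sample_states():
            q = {a: rollout_q(simulate, s, a, policy, horizon, width, gamma)
                 for a in actions(s)}
            data.append((s, max(q, key=q.get)))   # improved (greedy) action
        learner.fit(data)         # supervised policy-learning step
        policy = learner.predict  # learned policy drives the next iteration
    return policy
```

In the paper the learner is biased toward a relational policy language, which is what makes very large relational MDPs tractable; the plain classifier interface above is only a stand-in for that bias.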


reference text

[1] Ricardo Aler, Daniel Borrajo, and Pedro Isasi. Using genetic programming to learn and improve control knowledge. AIJ, 141(1-2):29–56, 2002.

[2] Fahiem Bacchus. The AIPS ’00 planning competition. AI Magazine, 22(3):57–62, 2001.

[3] Fahiem Bacchus and Froduald Kabanza. Using temporal logics to express search control knowledge for planning. AIJ, 116(1-2):123–191, 2000.

[4] R. Bellman. Dynamic Programming. Princeton University Press, 1957.

[5] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.

[6] Craig Boutilier and Richard Dearden. Approximating value trees in structured dynamic programming. In Lorenza Saitta, editor, ICML, 1996.

[7] Craig Boutilier, Richard Dearden, and Moises Goldszmidt. Stochastic dynamic programming with factored representations. AIJ, 121(1-2):49–107, 2000.

[8] Craig Boutilier, Raymond Reiter, and Bob Price. Symbolic dynamic programming for first-order MDPs. In IJCAI, 2001.

[9] S. Dzeroski, L. De Raedt, and K. Driessens. Relational reinforcement learning. MLJ, 43:7–52, 2001.

[10] Tara A. Estlin and Raymond J. Mooney. Multi-strategy learning of search control for partial-order planning. In AAAI, 1996.

[11] Robert Givan, Thomas Dean, and Matt Greig. Equivalence notions and model minimization in Markov decision processes. AIJ, 147(1-2):163–223, 2003.

[12] Carlos Guestrin, Daphne Koller, and Ronald Parr. Max-norm projections for factored MDPs. In IJCAI, pages 673–680, 2001.

[13] Jorg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. JAIR, 14:263–302, 2001.

[14] R. Howard. Dynamic Programming and Markov Decision Processes. MIT Press, 1960.

[15] Yi-Cheng Huang, Bart Selman, and Henry Kautz. Learning declarative control rules for constraint-based planning. In ICML, pages 415–422, 2000.

[16] Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. MLJ, 49(2–3):193–208, 2002.

[17] Roni Khardon. Learning action strategies for planning domains. AIJ, 113(1-2):125–148, 1999.

[18] M. Lagoudakis and R. Parr. Reinforcement learning as classification: Leveraging modern classifiers. In ICML, 2003.

[19] Mario Martin and Hector Geffner. Learning generalized policies in planning domains using concept languages. In KRR, 2000.

[20] D. McAllester and R. Givan. Taxonomic syntax for first-order inference. JACM, 40:246–283, 1993.

[21] S. Minton. Quantitative results on the utility of explanation-based learning. In AAAI, 1988.

[22] S. Minton, editor. Machine Learning Methods for Planning. Morgan Kaufmann, 1993.

[23] S. Minton, J. Carbonell, C. A. Knoblock, D. R. Kuokka, O. Etzioni, and Y. Gil. Explanation-based learning: A problem solving perspective. AIJ, 40:63–118, 1989.

[24] G. Tesauro. Practical issues in temporal difference learning. MLJ, 8:257–277, 1992.

[25] G. Tesauro and G. Galperin. Online policy improvement via Monte-Carlo search. In NIPS, 1996.

[26] J. Tsitsiklis and B. Van Roy. Feature-based methods for large scale DP. MLJ, 22:59–94, 1996.

[27] M. Veloso, J. Carbonell, A. Perez, D. Borrajo, E. Fink, and J. Blythe. Integrating planning and learning: The PRODIGY architecture. Journal of Experimental and Theoretical AI, 7(1), 1995.

[28] S. Yoon, A. Fern, and R. Givan. Inductive policy selection for first-order MDPs. In UAI, 2002.

Footnote 13: The most complex blocks-world goal for RRL was to achieve on(A, B) in an n-block environment. We consider blocks-world goals that involve all n blocks.