nips nips2000 nips2000-1 nips2000-1-reference knowledge-graph by maker-knowledge-mining

1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams

Source: pdf

Author: Robert St-Aubin, Jesse Hoey, Craig Boutilier

Abstract: We propose a method of approximate dynamic programming for Markov decision processes (MDPs) using algebraic decision diagrams (ADDs). We produce near-optimal value functions and policies with much lower time and space requirements than exact dynamic programming. Our method reduces the sizes of the intermediate value functions generated during value iteration by replacing the values at the terminals of the ADD with ranges of values. Our method is demonstrated on a class of large MDPs (with up to 34 billion states), and we compare the results with the optimal value functions.

reference text

[1] R. Iris Bahar, Erica A. Frohm, Charles M. Gaona, Gary D. Hachtel, Enrico Macii, Abelardo Pardo, and Fabio Somenzi. Algebraic decision diagrams and their applications. In International Conference on Computer-Aided Design, pages 188- 191. IEEE, 1993.

[2] Craig Boutilier and Richard Dearden. Approximating value trees in structured dynamic programming. In Proceedings ICML-96, Bari, Italy, 1996.

[3] Craig Boutilier, Richard Dearden, and Moises Goldszmidt. Exploiting structure in policy construction. In Proceedings Fourteenth Inter. Conf on AI (IJCAI-95), 1995.

[4] Randal E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677--691, 1986.

[5] E. M. Clarke, K. L. McMillan, X. Zhao, M. Fujita, and J. Yang. Spectral transforms for large boolean functions with applications to technology mapping. In DAC, 54-60. ACMIIEEE, 1993.

[6] Thomas Dean and Keiji Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142- 150, 1989.

[7] Richard Dearden and Craig Boutilier. Abstraction and approximate decision theoretic planning. Artificial Intelligence, 89:219- 283, 1997.

[8] Jesse Hoey, Robert St-Aubin, Alan Hu, and Craig Boutilier. SPUDD: Stochastic planning using decision diagrams. In Proceedings of UAI99, Stockholm, 1999.

[9] Jesse Hoey, Robert St-Aubin, Alan Hu, and Craig Boutilier. Optimal and approximate planning using decision diagrams. Technical Report TR-OO-05 , UBC, June 2000.

[10] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York, NY., 1994.

[11] Satinder P. Singh and Richard C. Yee. An upper bound on the loss from approximate optimalvalue function. Machine Learning, 16:227- 233, 1994.

[12] Fabio Somenzi. CUDD: CU decision ft p : / /vl s i . c o l o r ado. edu/pub /, 1998. diagram package. Available from