
205 nips-2002-Value-Directed Compression of POMDPs


Source: pdf

Author: Pascal Poupart, Craig Boutilier

Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.
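The abstract's notion of a lossless linear compression can be made concrete with a small sketch. The following is an illustrative reading, not the authors' implementation: assuming a flat POMDP given as dense reward vectors r^a and matrices T^{a,z} with entries Pr(s', z | s, a), any subspace that contains every r^a and is invariant under every T^{a,z} contains all policy-evaluation alpha-vectors, so an orthonormal basis B of that subspace acts as a lossless linear compression. The function names and dense-matrix representation below are assumptions made for illustration.

    import numpy as np

    def lossless_compression_basis(rewards, trans_obs, tol=1e-9):
        # Orthonormal basis of the smallest subspace that contains every reward
        # vector r^a and is invariant under every matrix T^{a,z}; its columns
        # form a candidate lossless linear compression B.
        n = len(rewards[0])
        basis = []      # orthonormal basis vectors (columns of B)
        frontier = []   # basis vectors whose images under the T^{a,z} are unchecked

        def add(v):
            # Gram-Schmidt: keep only the component of v outside span(basis).
            for b in basis:
                v = v - (b @ v) * b
            norm = np.linalg.norm(v)
            if norm > tol:
                u = v / norm
                basis.append(u)
                frontier.append(u)

        for r in rewards:                   # span(B) must contain each r^a ...
            add(np.asarray(r, dtype=float))
        while frontier:                     # ... and be closed under each T^{a,z}
            b = frontier.pop()
            for T in trans_obs:
                add(np.asarray(T, dtype=float) @ b)
        return np.column_stack(basis) if basis else np.zeros((n, 0))

    def compress(rewards, trans_obs, B):
        # Project the model onto span(B): compressed rewards and dynamics.
        r_tilde = [B.T @ np.asarray(r, dtype=float) for r in rewards]
        T_tilde = [B.T @ np.asarray(T, dtype=float) @ B for T in trans_obs]
        return r_tilde, T_tilde

If B ends up with fewer columns than there are states, a belief b can be replaced by B.T @ b and policies evaluated exactly in the smaller space; relaxing this exactness in exchange for further compression is what the paper's iterative lossy procedure addresses.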


reference text

[1] C. Boutilier, R. Dearden, and M. Goldszmidt. Stochastic dynamic programming with factored representations. Artificial Intelligence, 121:49–107, 2000.

[2] C. Boutilier and D. Poole. Computing optimal policies for partially observable decision processes using compact representations. Proc. AAAI-96, pp.1168–1175, Portland, OR, 1996.

[3] R. Givan, T. Dean, and M. Greig. Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, to appear, 2002.

[4] C. Guestrin, D. Koller, and R. Parr. Max-norm projections for factored MDPs. Proc. IJCAI-01, pp.673–680, Seattle, WA, 2001.

[5] C. Guestrin, D. Koller, and R. Parr. Solving factored POMDPs with linear value functions. IJCAI-01 Workshop on Planning under Uncertainty and Incomplete Information, Seattle, WA, 2001.

[6] C. Guestrin and D. Ormoneit. Information-theoretic features for reinforcement learning. Unpublished manuscript.

[7] J. Hoey, R. St-Aubin, A. Hu, and C. Boutilier. SPUDD: Stochastic planning using decision diagrams. Proc. UAI-99, pp.279–288, Stockholm, 1999.

[8] M. L. Littman. Memoryless policies: theoretical limitations and practical results. In D. Cliff, P. Husbands, J. Meyer, and S. W. Wilson, eds., Proc. 3rd Intl. Conf. on Simulation of Adaptive Behavior. MIT Press, Cambridge, MA, 1994.

[9] M. L. Littman, R. S. Sutton, and S. Singh. Predictive representations of state. Proc. NIPS-01, Vancouver, 2001.

[10] R. A. McCallum. Hidden state and reinforcement learning with instance-based state identification. IEEE Transactions on Systems, Man, and Cybernetics, 26(3):464–473, 1996.

[11] K. Murphy. A survey of POMDP solution techniques. Technical Report, U.C. Berkeley, 2000.

[12] R. Patrascu, P. Poupart, D. Schuurmans, C. Boutilier, and C. Guestrin. Greedy linear value-approximation for factored Markov decision processes. Proc. AAAI-02, pp.285–291, Edmonton, 2002.

[13] A. Pfeffer. Sufficiency, separability and temporal probabilistic models. Proc. UAI-01, pp.421–428, Seattle, WA, 2001.

[14] P. Poupart, C. Boutilier, R. Patrascu, and D. Schuurmans. Piecewise linear value function approximation for factored MDPs. Proc. AAAI-02, pp.292–299, Edmonton, 2002.

[15] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS, Boston, 1996.

[16] D. Schuurmans and R. Patrascu. Direct value-approximation for factored MDPs. Proc. NIPS-01, Vancouver, 2001.

[17] N. Tishby, F. C. Pereira, and W. Bialek. The information bottleneck method. Proc. 37th Annual Allerton Conf. on Communication, Control and Computing, pp.368–377, 1999.