
42 nips-2003-Bounded Finite State Controllers


Source: pdf

Author: Pascal Poupart, Craig Boutilier

Abstract: We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima).
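To make the abstract's approach concrete, here is a minimal sketch of the policy-evaluation step for a fixed stochastic finite state controller (FSC) on a tabular POMDP: the value of each (controller node, state) pair satisfies a linear system, which policy-iteration-style methods solve between controller-improvement steps. The toy model, dimensions, and parameter names (T, Z, R, psi, eta) are hypothetical illustrations for this sketch, not the authors' implementation.

```python
import numpy as np

# Sketch: evaluate a fixed stochastic finite state controller on a toy
# tabular POMDP. All quantities below are made up for illustration.

gamma = 0.95                        # discount factor
rng = np.random.default_rng(0)
S, A, O, N = 3, 2, 2, 2             # states, actions, observations, controller nodes (bounded)

T = rng.dirichlet(np.ones(S), size=(S, A))       # T[s, a, s'] transition probabilities
Z = rng.dirichlet(np.ones(O), size=(A, S))       # Z[a, s', o] observation probabilities
R = rng.standard_normal((S, A))                  # R[s, a] rewards

# Stochastic FSC parameters: action choice per node, and node transitions
# conditioned on the action taken and observation received.
psi = rng.dirichlet(np.ones(A), size=N)          # psi[n, a]        = P(a  | n)
eta = rng.dirichlet(np.ones(N), size=(N, A, O))  # eta[n, a, o, n'] = P(n' | n, a, o)

# Policy evaluation: solve the linear system over all (node, state) pairs
#   V(n, s) = sum_a psi[n, a] * ( R[s, a]
#             + gamma * sum_{s'} T[s, a, s'] * sum_o Z[a, s', o]
#               * sum_{n'} eta[n, a, o, n'] * V(n', s') )
dim = N * S
M = np.zeros((dim, dim))
b = np.zeros(dim)
for n in range(N):
    for s in range(S):
        i = n * S + s
        M[i, i] = 1.0
        for a in range(A):
            b[i] += psi[n, a] * R[s, a]
            for s2 in range(S):
                for o in range(O):
                    w = psi[n, a] * T[s, a, s2] * Z[a, s2, o]
                    for n2 in range(N):
                        M[i, n2 * S + s2] -= gamma * w * eta[n, a, o, n2]

V = np.linalg.solve(M, b).reshape(N, S)
print("V(n, s):\n", V)
```

Solving the system exactly with a dense solver is fine at this scale; the point of bounding the controller size is that the evaluation system stays small (N·|S| unknowns) regardless of how large the belief space is.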


reference text

[1] D. Aberdeen and J. Baxter. Scaling internal-state policy-gradient methods for POMDPs. Proc. ICML-02, pp.3–10, Sydney, Australia, 2002.

[2] C. Boutilier and D. Poole. Computing optimal policies for partially observable decision processes using compact representations. Proc. AAAI-96, pp.1168–1175, Portland, OR, 1996.

[3] D. Braziunas. Stochastic local search for POMDP controllers. Master’s thesis, University of Toronto, Toronto, 2003.

[4] A. R. Cassandra, M. L. Littman, and N. L. Zhang. Incremental pruning: A simple, fast, exact method for POMDPs. Proc. UAI-97, pp.54–61, Providence, RI, 1997.

[5] H.-T. Cheng. Algorithms for Partially Observable Markov Decision Processes. PhD thesis, University of British Columbia, Vancouver, 1988.

[6] Z. Feng and E. A. Hansen. Approximate planning for factored POMDPs. Proc. ECP-01, Toledo, Spain, 2001.

[7] E. A. Hansen. Solving POMDPs by searching in policy space. Proc. UAI-98, pp.211–219, Madison, WI, 1998.

[8] M. Hauskrecht. Value-function approximations for partially observable Markov decision processes. Journal of Artificial Intelligence Research, 13:33–94, 2000.

[9] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, 1998.

[10] N. Meuleau, K.-E. Kim, L. P. Kaelbling, and A. R. Cassandra. Solving POMDPs by searching the space of finite policies. Proc. UAI-99, pp.417–426, Stockholm, Sweden, 1999.

[11] N. Meuleau, L. Peshkin, K.-E. Kim, and L. P. Kaelbling. Learning finite-state controllers for partially observable environments. Proc. UAI-99, pp.427–436, Stockholm, Sweden, 1999.

[12] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. Proc. IJCAI-03, Acapulco, Mexico, 2003.

[13] P. Poupart and C. Boutilier. Value-directed compressions of POMDPs. Proc. NIPS-02, pp.1547–1554, Vancouver, Canada, 2002.

[14] N. L. Zhang and W. Zhang. Speeding up the convergence of value-iteration in partially observable Markov decision processes. Journal of Artificial Intelligence Research, 14:29–51, 2001.