
NIPS 2007, Paper 215: What makes some POMDP problems easy to approximate?


Source: pdf

Authors: Wee S. Lee, Nan Rong, Daniel J. Hsu

Abstract: Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high-dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently, and thus help explain the success that point-based algorithms often exhibit in experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that, under the weaker condition of a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. However, given a suitable set of points that "cover" an optimal reachable space well, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice: fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices.
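
The key quantity in the abstract is the covering number: the minimum number of balls of a given radius needed to cover a set of beliefs. As a rough illustration only (not the paper's construction), the Python sketch below greedily extracts an eps-cover of a sampled belief set under the L1 metric; the function name, sampling scheme, and radius are hypothetical choices made for this example.

    import numpy as np

    def greedy_l1_cover(beliefs, eps):
        # Greedily pick centers so that every belief point lies within
        # L1-distance eps of some chosen center. Since the returned set is
        # itself an eps-cover, its size upper-bounds the eps-covering number
        # of the sampled set.
        centers = []
        for b in beliefs:
            if all(np.abs(b - c).sum() > eps for c in centers):
                centers.append(b)
        return centers

    # Toy usage: 200 beliefs over 3 states, drawn uniformly from the simplex.
    rng = np.random.default_rng(0)
    beliefs = rng.dirichlet([1.0, 1.0, 1.0], size=200)
    centers = greedy_l1_cover(beliefs, eps=0.3)
    print(f"{len(centers)} centers form an eps-cover of {len(beliefs)} sampled beliefs")

By construction every sampled belief lies within eps of some center, and the chosen centers are pairwise more than eps apart, so the same count also lower-bounds the eps-packing number of the sample.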


References

[1] R.M. Gray. Toeplitz and Circulant Matrices: A Review. Now Publishers Inc., 2006.

[2] M. Hauskrecht. Value-function approximations for partially observable Markov decision processes. J. Artificial Intelligence Research, 13:33–94, 2000.

[3] J. Hoey, A. von Bertoldi, P. Poupart, and A. Mihailidis. Assisting persons with dementia during handwashing using a partially observable Markov decision process. In Proc. Int. Conf. on Vision Systems, 2007.

[4] D. Hsu, W.S. Lee, and N. Rong. Accelerating point-based POMDP algorithms through successive approximations of the optimal reachable space. Technical Report TRA4/07, School of Computing, National University of Singapore, April 2007.

[5] L.P. Kaelbling, M.L. Littman, and A.R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2):99–134, 1998.

[6] M. Kearns, Y. Mansour, and A.Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning, 49(2–3):193–208, 2002.

[7] M.L. Littman. Algorithms for sequential decision making. PhD thesis, Dept. of Computer Science, Brown University, 1996.

[8] C. Lusena, J. Goldsmith, and M. Mundhenk. Nonapproximability results for partially observable Markov decision processes. J. Artificial Intelligence Research, 14:83–103, 2001.

[9] O. Madani, S. Hanks, and A. Condon. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proc. Nat. Conf. on Artificial Intelligence, pages 541–548, 1999.

[10] C. Papadimitriou and J.N. Tsitsiklis. The complexity of Markov decision processes. Mathematics of Operations Research, 12(3):441–450, 1987.

[11] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In Proc. Int. Joint Conf. on Artificial Intelligence, pages 477–484, 2003.

[12] S.P. Singh and R.C. Yee. An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3):227–233, 1994.

[13] R.D. Smallwood and E.J. Sondik. The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21:1071–1088, 1973.

[14] T. Smith and R. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. In Proc. Uncertainty in Artificial Intelligence, 2005.

[15] M.T.J. Spaan and N. Vlassis. A point-based POMDP algorithm for robot planning. In Proc. IEEE Int. Conf. on Robotics & Automation, 2004.

[16] N.L. Zhang and W. Zhang. Speeding up the convergence of value iteration in partially observable Markov decision processes. J. Artificial Intelligence Research, 14:29–51, 2001.