
79 nips-2013-DESPOT: Online POMDP Planning with Regularization


Source: pdf

Author: Adhiraj Somani, Nan Ye, David Hsu, Wee Sun Lee

Abstract: POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable due to the “curse of dimensionality” and the “curse of history”. This paper presents an online POMDP algorithm that alleviates these difficulties by focusing the search on a set of randomly sampled scenarios. A Determinized Sparse Partially Observable Tree (DESPOT) compactly captures the execution of all policies on these scenarios. Our Regularized DESPOT (R-DESPOT) algorithm searches the DESPOT for a policy, while optimally balancing the size of the policy and its estimated value obtained under the sampled scenarios. We give an output-sensitive performance bound for all policies derived from a DESPOT, and show that R-DESPOT works well if a small optimal policy exists. We also give an anytime algorithm that approximates R-DESPOT. Experiments show strong results compared with two of the fastest online POMDP algorithms. Source code, along with experimental settings, is available at http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/.
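
To make the balancing criterion in the abstract concrete, the regularized objective can be sketched as an optimization over policies π derivable from the DESPOT (a rough illustration based only on the abstract; the symbols below, including the regularization weight λ and the policy-size measure |π|, are assumed notation rather than the paper's own):

    \max_{\pi} \; \hat{\nu}(\pi) \;-\; \lambda\,|\pi|

Here \hat{\nu}(\pi) is the value of π estimated by averaging its returns over the K sampled scenarios, |\pi| measures the size of the policy (e.g., the number of nodes in its policy tree), and \lambda \ge 0 trades off estimated value against policy size; a larger \lambda favors smaller policies that are less prone to overfitting the sampled scenarios.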


reference text

[1] J. Asmuth and M.L. Littman. Approaching Bayes-optimality using Monte-Carlo tree search. In Proc. Int. Conf. on Automated Planning & Scheduling, 2011. 2

[2] D.P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, 3rd edition, 2005. 2

[3] E.K.P. Chong, R.L. Givan, and H.S. Chang. A framework for simulation-based network control via hindsight optimization. In Proc. IEEE Conf. on Decision & Control, volume 2, pages 1433–1438, 2000. 3

[4] P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. In Proc. Uncertainty in Artificial Intelligence, 2007. 1

[5] S. Gelly and D. Silver. Combining online and offline knowledge in UCT. In Proc. Int. Conf. on Machine Learning, 2007. 2

[6] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992. 4

[7] R. He, E. Brunskill, and N. Roy. Efficient planning under uncertainty with macro-actions. J. Artificial Intelligence Research, 40(1):523–570, 2011. 2

[8] M. Kearns, Y. Mansour, and A.Y. Ng. Approximate planning in large POMDPs via reusable trajectories. In Advances in Neural Information Processing Systems (NIPS), volume 12, pages 1001–1007, 1999. 2

[9] L. Kocsis and C. Szepesvari. Bandit based Monte-Carlo planning. In Proc. Eur. Conf. on Machine Learning, pages 282–293, 2006. 1

[10] H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science and Systems, 2008. 2, 5, 7

[11] O. Madani, S. Hanks, and A. Condon. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proc. AAAI Conf. on Artificial Intelligence, pages 541–548, 1999. 1

[12] D. McAllester and S. Singh. Approximate planning for factored POMDPs using belief state simplification. In Proc. Uncertainty in Artificial Intelligence, 1999. 2

[13] A.Y. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In Proc. Uncertainty in Artificial Intelligence, pages 406–415, 2000. 1, 2

[14] S.C.W. Ong, S.W. Png, D. Hsu, and W.S. Lee. Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robotics Research, 29(8):1053–1068, 2010. 8

[15] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In Proc. Int. Jnt. Conf. on Artificial Intelligence, pages 477–484, 2003. 2, 7

[16] S. Ross and B. Chaib-Draa. AEMS: An anytime online search algorithm for approximate policy refinement in large POMDPs. In Proc. Int. Jnt. Conf. on Artificial Intelligence, pages 2592–2598, 2007. 7

[17] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa. Online planning algorithms for POMDPs. J. Artificial Intelligence Research, 32(1):663–704, 2008. 1, 2, 7, 8

[18] D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems (NIPS), 2010. 1, 2, 6, 7, 8

[19] T. Smith and R. Simmons. Heuristic search value iteration for POMDPs. In Proc. Uncertainty in Artificial Intelligence, pages 520–527, 2004. 5, 7

[20] T. Smith and R. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. In Proc. Uncertainty in Artificial Intelligence, 2005. 2

[21] M.T.J. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for POMDPs. J. Artificial Intelligence Research, 24:195–220, 2005. 2

[22] S.W. Yoon, A. Fern, R. Givan, and S. Kambhampati. Probabilistic planning via determinization in hindsight. In Proc. AAAI Conf. on Artificial Intelligence, pages 1010–1016, 2008. 3