
79 nips-2013-DESPOT: Online POMDP Planning with Regularization


Source: pdf

Author: Adhiraj Somani, Nan Ye, David Hsu, Wee Sun Lee

Abstract: POMDPs provide a principled framework for planning under uncertainty, but are computationally intractable due to the “curse of dimensionality” and the “curse of history”. This paper presents an online POMDP algorithm that alleviates these difficulties by focusing the search on a set of randomly sampled scenarios. A Determinized Sparse Partially Observable Tree (DESPOT) compactly captures the execution of all policies on these scenarios. Our Regularized DESPOT (R-DESPOT) algorithm searches the DESPOT for a policy, while optimally balancing the size of the policy and its estimated value obtained under the sampled scenarios. We give an output-sensitive performance bound for all policies derived from a DESPOT, and show that R-DESPOT works well if a small optimal policy exists. We also give an anytime algorithm that approximates R-DESPOT. Experiments show strong results compared with two of the fastest online POMDP algorithms. Source code, along with experimental settings, is available at http://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/.
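
To make the balancing criterion in the abstract concrete, the regularized objective can be sketched as an optimization over policies π derivable from the DESPOT (a rough illustration based only on the abstract; the symbols below, including the regularization weight λ and the policy-size measure |π|, are assumed notation rather than the paper's own):

    \max_{\pi} \; \hat{\nu}(\pi) \;-\; \lambda\,|\pi|

Here \hat{\nu}(\pi) is the value of π estimated by averaging its returns over the K sampled scenarios, |\pi| measures the size of the policy (e.g., the number of nodes in its policy tree), and \lambda \ge 0 trades off estimated value against policy size; a larger \lambda favors smaller policies that are less prone to overfitting the sampled scenarios.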


reference text

[1] J. Asmuth and M.L. Littman. Approaching Bayes-optimality using Monte-Carlo tree search. In Proc. Int. Conf. on Automated Planning & Scheduling, 2011. 2

[2] D.P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, 3rd edition, 2005. 2

[3] E.K.P. Chong, R.L. Givan, and H.S. Chang. A framework for simulation-based network control via hindsight optimization. In Proc. IEEE Conf. on Decision & Control, volume 2, pages 1433–1438, 2000. 3

[4] P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. In Proc. Uncertainty in Artificial Intelligence, 2007. 1

[5] S. Gelly and D. Silver. Combining online and offline knowledge in UCT. In Proc. Int. Conf. on Machine Learning, 2007. 2

[6] D. Haussler. Decision theoretic generalizations of the PAC model for neural net and other learning applications. Information and Computation, 100(1):78–150, 1992. 4

[7] R. He, E. Brunskill, and N. Roy. Efficient planning under uncertainty with macro-actions. J. Artificial Intelligence Research, 40(1):523–570, 2011. 2

[8] M. Kearns, Y. Mansour, and A.Y. Ng. Approximate planning in large POMDPs via reusable trajectories. In Advances in Neural Information Processing Systems (NIPS), volume 12, pages 1001–1007, 1999. 2

[9] L. Kocsis and C. Szepesvari. Bandit based Monte-Carlo planning. In Proc. Eur. Conf. on Machine Learning, pages 282–293, 2006. 1

[10] H. Kurniawati, D. Hsu, and W.S. Lee. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Proc. Robotics: Science and Systems, 2008. 2, 5, 7

[11] O. Madani, S. Hanks, and A. Condon. On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. In Proc. AAAI Conf. on Artificial Intelligence, pages 541–548, 1999. 1

[12] D. McAllester and S. Singh. Approximate planning for factored POMDPs using belief state simplification. In Proc. Uncertainty in Artificial Intelligence, 1999. 2

[13] A.Y. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In Proc. Uncertainty in Artificial Intelligence, pages 406–415, 2000. 1, 2

[14] S.C.W. Ong, S.W. Png, D. Hsu, and W.S. Lee. Planning under uncertainty for robotic tasks with mixed observability. Int. J. Robotics Research, 29(8):1053–1068, 2010. 8

[15] J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In Proc. Int. Jnt. Conf. on Artificial Intelligence, pages 477–484, 2003. 2, 7

[16] S. Ross and B. Chaib-Draa. AEMS: An anytime online search algorithm for approximate policy refinement in large POMDPs. In Proc. Int. Jnt. Conf. on Artificial Intelligence, pages 2592–2598, 2007. 7

[17] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa. Online planning algorithms for POMDPs. J. Artificial Intelligence Research, 32(1):663–704, 2008. 1, 2, 7, 8

[18] D. Silver and J. Veness. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems (NIPS), 2010. 1, 2, 6, 7, 8

[19] T. Smith and R. Simmons. Heuristic search value iteration for POMDPs. In Proc. Uncertainty in Artificial Intelligence, pages 520–527, 2004. 5, 7

[20] T. Smith and R. Simmons. Point-based POMDP algorithms: Improved analysis and implementation. In Proc. Uncertainty in Artificial Intelligence, 2005. 2

[21] M.T.J. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for POMDPs. J. Artificial Intelligence Research, 24:195–220, 2005. 2

[22] S.W. Yoon, A. Fern, R. Givan, and S. Kambhampati. Probabilistic planning via determinization in hindsight. In Proc. AAAI Conf. on Artificial Intelligence, pages 1010–1016, 2008. 3