nips nips2012 nips2012-350 nips2012-350-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Felipe Trevizan, Manuela Veloso
Abstract: Probabilistic planning captures the uncertainty of plan execution by probabilistically modeling the effects of actions in the environment, and therefore the probability of reaching different states from a given state and action. In order to compute a solution for a probabilistic planning problem, planners need to manage the uncertainty associated with the different paths from the initial state to a goal state. Several approaches to managing this uncertainty have been proposed, e.g., considering all paths at once, determinizing actions, and sampling. In this paper, we introduce trajectory-based short-sighted Stochastic Shortest Path Problems (SSPs), a novel approach to managing uncertainty in probabilistic planning problems in which states reachable with low probability are substituted by artificial goals that heuristically estimate their cost to reach a goal state. We also extend the theoretical results of the Short-Sighted Probabilistic Planner (SSiPP) [1] by proving that SSiPP always terminates and is asymptotically optimal under sufficient conditions on the structure of short-sighted SSPs. We empirically compare SSiPP using trajectory-based short-sighted SSPs with the winners of the previous probabilistic planning competitions and other state-of-the-art planners on the triangle tireworld problems. Trajectory-based SSiPP outperforms all the competitors and is the only planner able to scale up to problem number 60, a problem in which the optimal solution contains approximately 10^70 states.
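As a rough illustration of the construction described in the abstract, the following Python sketch builds the state set of a trajectory-based short-sighted SSP by forward search: states whose best trajectory probability from the initial state drops below a threshold rho become artificial goals whose cost-to-go is estimated by a heuristic. The SSP interface (is_goal, actions, transitions), the threshold rho, and the heuristic function are illustrative assumptions, not the authors' actual implementation of SSiPP.

    # Minimal sketch, not the authors' code: assumes a hypothetical SSP interface.
    from collections import deque

    def trajectory_based_short_sighted_ssp(ssp, s0, rho, heuristic):
        # max_prob[s] = highest trajectory probability found so far for reaching s from s0
        max_prob = {s0: 1.0}
        frontier = deque([s0])
        while frontier:
            s = frontier.popleft()
            # Goal states and low-probability states are not expanded further.
            if ssp.is_goal(s) or max_prob[s] < rho:
                continue
            for a in ssp.actions(s):
                for s_next, p in ssp.transitions(s, a):  # (successor, probability) pairs
                    prob = max_prob[s] * p
                    if prob > max_prob.get(s_next, 0.0):
                        max_prob[s_next] = prob
                        frontier.append(s_next)  # revisit if a better trajectory is found
        # States retained in the short-sighted problem (trajectory probability >= rho).
        retained = {s for s, p in max_prob.items() if p >= rho}
        # Low-probability fringe states become artificial goals priced by the heuristic.
        artificial_goals = {s: heuristic(s) for s, p in max_prob.items() if p < rho}
        return retained, artificial_goals

A planner such as SSiPP could then solve this smaller SSP and re-plan whenever execution reaches an artificial goal, which is the overall strategy the abstract describes.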
[1] F. W. Trevizan and M. M. Veloso. Short-sighted stochastic shortest path problems. In Proc. of the 22nd International Conference on Automated Planning and Scheduling (ICAPS), 2012.
[2] D. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[3] A.G. Barto, S.J. Bradtke, and S.P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1-2):81–138, 1995.
[4] B. Bonet and H. Geffner. Labeled RTDP: Improving the convergence of real-time dynamic programming. In Proc. of the 13th International Conference on Automated Planning and Scheduling (ICAPS), 2003.
[5] H.B. McMahan, M. Likhachev, and G.J. Gordon. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In Proc. of the 22nd International Conference on Machine Learning (ICML), 2005.
[6] T. Smith and R. G. Simmons. Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. In Proc. of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
[7] S. Sanner, R. Goetschalckx, K. Driessens, and G. Shani. Bayesian real-time dynamic programming. In Proc. of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009.
[8] S. Yoon, A. Fern, and R. Givan. FF-Replan: A baseline for probabilistic planning. In Proc. of the 17th International Conference on Automated Planning and Scheduling (ICAPS), 2007.
[9] H.L.S. Younes, M.L. Littman, D. Weissman, and J. Asmuth. The first probabilistic track of the international planning competition. Journal of Artificial Intelligence Research, 24(1):851–887, 2005.
[10] F. Teichteil-Koenigsbuch, G. Infantes, and U. Kuter. RFF: A robust, FF-based MDP planning algorithm for generating policies with low probability of failure. In 3rd International Probabilistic Planning Competition (IPPC-ICAPS), 2008.
[11] D. Bryce and O. Buffet. 6th International Planning Competition: Uncertainty Track. In 3rd International Probabilistic Planning Competition (IPPC-ICAPS), 2008.
[12] S. Yoon, A. Fern, R. Givan, and S. Kambhampati. Probabilistic planning via determinization in hindsight. In Proc. of the 23rd National Conference on Artificial Intelligence (AAAI), 2008.
[13] S. Yoon, W. Ruml, J. Benton, and M. B. Do. Improving Determinization in Hindsight for Online Probabilistic Planning. In Proc. of the 20th International Conference on Automated Planning and Scheduling (ICAPS), 2010.
[14] I. Little and S. Thiébaux. Probabilistic planning vs replanning. In Proc. of the ICAPS Workshop on IPC: Past, Present and Future, 2007.
[15] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Menlo Park, California, 1985.
[16] L. Kocsis and C. Szepesvári. Bandit based Monte-Carlo planning. In Proc. of the European Conference on Machine Learning (ECML), 2006.
[17] T. Dean, L.P. Kaelbling, J. Kirman, and A. Nicholson. Planning under time constraints in stochastic domains. Artificial Intelligence, 76(1-2):35–74, 1995.
[18] D.P. Bertsekas and J.N. Tsitsiklis. An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3):580–595, 1991.
[19] D.P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995.
[20] B. Bonet and R. Givan. 2nd International Probabilistic Planning Competition (IPPC-ICAPS). http://www.ldc.usb.ve/~bonet/ipc5/ (accessed on Dec 13, 2011), 2007.