nips nips2012 nips2012-350 nips2012-350-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Felipe Trevizan, Manuela Veloso
Abstract: Probabilistic planning captures the uncertainty of plan execution by probabilistically modeling the effects of actions in the environment, and therefore the probability of reaching different states from a given state and action. In order to compute a solution for a probabilistic planning problem, planners need to manage the uncertainty associated with the different paths from the initial state to a goal state. Several approaches to managing this uncertainty have been proposed, e.g., considering all paths at once, determinizing actions, and sampling. In this paper, we introduce trajectory-based short-sighted Stochastic Shortest Path Problems (SSPs), a novel approach to managing uncertainty in probabilistic planning problems in which states reachable with low probability are substituted by artificial goals that heuristically estimate their cost to reach a goal state. We also extend the theoretical results of the Short-Sighted Probabilistic Planner (SSiPP) [1] by proving that SSiPP always terminates and is asymptotically optimal under sufficient conditions on the structure of short-sighted SSPs. We empirically compare SSiPP using trajectory-based short-sighted SSPs with the winners of the previous probabilistic planning competitions and other state-of-the-art planners on the triangle tireworld problems. Trajectory-based SSiPP outperforms all the competitors and is the only planner able to scale up to problem number 60, a problem in which the optimal solution contains approximately 10^70 states.
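As a rough illustration of the construction described in the abstract, the following Python sketch builds the state set of a trajectory-based short-sighted SSP by forward search: states whose best trajectory probability from the initial state drops below a threshold rho become artificial goals whose cost-to-go is estimated by a heuristic. The SSP interface (is_goal, actions, transitions), the threshold rho, and the heuristic function are illustrative assumptions, not the authors' actual implementation of SSiPP.

    # Minimal sketch, not the authors' code: assumes a hypothetical SSP interface.
    from collections import deque

    def trajectory_based_short_sighted_ssp(ssp, s0, rho, heuristic):
        # max_prob[s] = highest trajectory probability found so far for reaching s from s0
        max_prob = {s0: 1.0}
        frontier = deque([s0])
        while frontier:
            s = frontier.popleft()
            # Goal states and low-probability states are not expanded further.
            if ssp.is_goal(s) or max_prob[s] < rho:
                continue
            for a in ssp.actions(s):
                for s_next, p in ssp.transitions(s, a):  # (successor, probability) pairs
                    prob = max_prob[s] * p
                    if prob > max_prob.get(s_next, 0.0):
                        max_prob[s_next] = prob
                        frontier.append(s_next)  # revisit if a better trajectory is found
        # States retained in the short-sighted problem (trajectory probability >= rho).
        retained = {s for s, p in max_prob.items() if p >= rho}
        # Low-probability fringe states become artificial goals priced by the heuristic.
        artificial_goals = {s: heuristic(s) for s, p in max_prob.items() if p < rho}
        return retained, artificial_goals

A planner such as SSiPP could then solve this smaller SSP and re-plan whenever execution reaches an artificial goal, which is the overall strategy the abstract describes.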
[1] F. W. Trevizan and M. M. Veloso. Short-sighted stochastic shortest path problems. In Proc. of the 22nd International Conference on Automated Planning and Scheduling (ICAPS), 2012.
[2] D. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[3] A.G. Barto, S.J. Bradtke, and S.P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1-2):81–138, 1995.
[4] B. Bonet and H. Geffner. Labeled RTDP: Improving the convergence of real-time dynamic programming. In Proc. of the 13th International Conference on Automated Planning and Scheduling (ICAPS), 2003.
[5] H.B. McMahan, M. Likhachev, and G.J. Gordon. Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. In Proc. of the 22nd International Conference on Machine Learning (ICML), 2005.
[6] T. Smith and R. G. Simmons. Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. In Proc. of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
[7] S. Sanner, R. Goetschalckx, K. Driessens, and G. Shani. Bayesian real-time dynamic programming. In Proc. of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009.
[8] S. Yoon, A. Fern, and R. Givan. FF-Replan: A baseline for probabilistic planning. In Proc. of the 17th International Conference on Automated Planning and Scheduling (ICAPS), 2007.
[9] H.L.S. Younes, M.L. Littman, D. Weissman, and J. Asmuth. The first probabilistic track of the international planning competition. Journal of Artificial Intelligence Research, 24(1):851–887, 2005.
[10] F. Teichteil-Koenigsbuch, G. Infantes, and U. Kuter. RFF: A robust, FF-based MDP planning algorithm for generating policies with low probability of failure. In 3rd International Probabilistic Planning Competition (IPPC-ICAPS), 2008.
[11] D. Bryce and O. Buffet. 6th International Planning Competition: Uncertainty Track. In 3rd International Probabilistic Planning Competition (IPPC-ICAPS), 2008.
[12] S. Yoon, A. Fern, R. Givan, and S. Kambhampati. Probabilistic planning via determinization in hindsight. In Proc. of the 23rd National Conference on Artificial Intelligence (AAAI), 2008.
[13] S. Yoon, W. Ruml, J. Benton, and M. B. Do. Improving Determinization in Hindsight for Online Probabilistic Planning. In Proc. of the 20th International Conference on Automated Planning and Scheduling (ICAPS), 2010.
[14] I. Little and S. Thiébaux. Probabilistic planning vs replanning. In Proc. of the ICAPS Workshop on IPC: Past, Present and Future, 2007.
[15] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, Menlo Park, California, 1985.
[16] L. Kocsis and C. Szepesvári. Bandit based Monte-Carlo planning. In Proc. of the European Conference on Machine Learning (ECML), 2006.
[17] T. Dean, L.P. Kaelbling, J. Kirman, and A. Nicholson. Planning under time constraints in stochastic domains. Artificial Intelligence, 76(1-2):35–74, 1995.
[18] D.P. Bertsekas and J.N. Tsitsiklis. An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3):580–595, 1991.
[19] D.P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995.
[20] B. Bonet and R. Givan. 2nd International Probabilistic Planning Competition (IPPC-ICAPS). http://www.ldc.usb.ve/~bonet/ipc5/ (accessed on Dec 13, 2011), 2007.