nips nips2013 nips2013-347 nips2013-347-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Qiang Cheng, Qiang Liu, Feng Chen, Alex Ihler
Abstract: Markov Decision Processes (MDPs) are extremely useful for modeling and solving sequential decision making problems. Graph-based MDPs provide a compact representation for MDPs with large numbers of random variables. However, the complexity of exactly solving a graph-based MDP usually grows exponentially in the number of variables, which limits their application. We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. In particular, by exploiting the graph structure of graph-based MDPs, we propose a factored variational value iteration algorithm in which the value function is first approximated by a product of local-scope value functions, then solved by minimizing a Kullback-Leibler (KL) divergence. The KL divergence is optimized using the belief propagation algorithm, with complexity exponential in only the cluster size of the graph. Experimental comparison on different models shows that our algorithm outperforms existing approximation algorithms at finding good policies.
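For context, the exact planning problem the paper approximates can be illustrated by standard value iteration, whose Bellman backup enumerates the full state space (the exponential cost the factored method avoids). Below is a minimal sketch on a toy 2-state, 2-action MDP; the transition tensor `P`, reward table `R`, and discount `gamma` are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy MDP (illustrative assumptions, not from the paper):
# P[a, s, s'] is the probability of moving from state s to s' under action a.
P = np.array([
    [[0.9, 0.1],    # action 0: transitions out of state 0
     [0.2, 0.8]],   # action 0: transitions out of state 1
    [[0.5, 0.5],    # action 1: transitions out of state 0
     [0.0, 1.0]],   # action 1: transitions out of state 1
])
# R[s, a] is the immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(2000):
    # Bellman backup: Q(s,a) = R(s,a) + gamma * sum_{s'} P(s'|s,a) V(s')
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy w.r.t. the converged values
```

Each backup touches every state explicitly, so for a graph-based MDP with n binary variables the state space, and hence this sweep, is O(2^n). The paper's factored variational value iteration instead approximates V by a product of local-scope value functions and fits it by minimizing a KL divergence with belief propagation, so the cost is exponential only in the cluster size of the graph.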
R. Iris Bahar, Erica A. Frohm, Charles M. Gaona, Gary D. Hachtel, Enrico Macii, Abelardo Pardo, and Fabio Somenzi. Algebraic decision diagrams and their applications. In IEEE/ACM International Conference on Computer-Aided Design, pages 188–191, 1993.
David Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2011.
Craig Boutilier, Richard Dearden, and Moisés Goldszmidt. Stochastic dynamic programming with factored representations. Artificial Intelligence, 121(1):49–107, 2000.
Nicklas Forsell and Régis Sabbadin. Approximate linear-programming algorithms for graph-based Markov decision processes. Frontiers in Artificial Intelligence and Applications, 141:590, 2006.
Carlos Guestrin, Daphne Koller, and Ronald Parr. Multiagent planning with factored MDPs. Advances in Neural Information Processing Systems, 14:1523–1530, 2001.
Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19:399–468, 2003.
Jesse Hoey, Robert St-Aubin, Alan Hu, and Craig Boutilier. SPUDD: Stochastic planning using decision diagrams. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pages 279–288, 1999.
Daphne Koller and Ronald Parr. Policy iteration for factored MDPs. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 326–334, 2000.
Qiang Liu and Alexander Ihler. Variational algorithms for marginal MAP. In Uncertainty in Artificial Intelligence (UAI), 2011.
Qiang Liu and Alexander Ihler. Belief propagation for structured decision making. In Uncertainty in Artificial Intelligence (UAI), pages 523–532, August 2012.
A. Nath and P. Domingos. Efficient belief propagation for utility maximization and repeated inference. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.
Nathalie Peyrard and Régis Sabbadin. Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes. Frontiers in Artificial Intelligence and Applications, 141:595, 2006.
Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, 2009.
Matthew Richardson and Pedro Domingos. Mining knowledge-sharing sites for viral marketing. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 61–70, 2002.
R. Sabbadin, N. Peyrard, and N. Forsell. A framework and a mean-field algorithm for the local control of spatial processes. International Journal of Approximate Reasoning, 53(1):66–86, 2012.
Ross D. Shachter. Model building with belief networks and influence diagrams. In Advances in Decision Analysis: From Foundations to Applications, pages 177–201, 2007.
Olivier Sigaud, Olivier Buffet, et al. Markov Decision Processes in Artificial Intelligence. ISTE/John Wiley & Sons, 2010.
M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1-2):1–305, 2008.