jmlr jmlr2006 jmlr2006-19 jmlr2006-19-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anders Jonsson, Andrew Barto
Abstract: We present Variable Influence Structure Analysis, or VISA, an algorithm that performs hierarchical decomposition of factored Markov decision processes. VISA uses a dynamic Bayesian network model of actions, and constructs a causal graph that captures relationships between state variables. In tasks with sparse causal graphs VISA exploits structure by introducing activities that cause the values of state variables to change. The result is a hierarchy of activities that together represent a solution to the original task. VISA performs state abstraction for each activity by ignoring irrelevant state variables and lower-level activities. In addition, we describe an algorithm for constructing compact models of the activities introduced. State abstraction and compact activity models enable VISA to apply efficient algorithms to solve the stand-alone subtask associated with each activity. Experimental results show that the decomposition introduced by VISA can significantly accelerate construction of an optimal, or near-optimal, policy. Keywords: Markov decision processes, hierarchical decomposition, state abstraction
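The abstract describes building a causal graph from DBN action models: an edge is drawn from one state variable to another whenever some action's DBN makes the next value of the second variable depend on the current value of the first. The sketch below is a minimal Python illustration of that construction step only; it is not the authors' code, and the action names, variable names, and the toy delivery-style dependency structure are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the paper) of causal-graph
# construction from per-action DBN parent sets: add edge X -> Y whenever
# the next value of Y depends on the current value of X under some action.

from itertools import chain

# Hypothetical parent maps: parents[action][Y] = variables that the
# next value of Y depends on under that action.
dbn_parents = {
    "go":      {"location": {"location"}},
    "buy":     {"has_item": {"location", "has_item"}},
    "deliver": {"user_has_item": {"location", "has_item", "user_has_item"}},
}

def causal_graph(dbn_parents):
    """Union of DBN dependencies over all actions, ignoring self-loops."""
    edges = set()
    for parent_map in dbn_parents.values():
        for child, parents in parent_map.items():
            for parent in parents:
                if parent != child:
                    edges.add((parent, child))
    return edges

edges = causal_graph(dbn_parents)
variables = set(chain.from_iterable(edges))

# In a sparse graph like this one, upstream variables (e.g. "location")
# suggest activities that set their values so downstream variables can change.
for var in sorted(variables):
    downstream = sorted(child for parent, child in edges if parent == var)
    print(f"{var} influences: {downstream}")
```

The resulting edge set is what a hierarchical decomposition in the spirit of the abstract would analyze further (e.g., grouping variables and introducing one activity per way a variable's value can be changed); that later machinery is not sketched here.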
References:
A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, Special Volume on Computational Research on Interaction and Agency, 72(1):81–138, 1995.
R. Bellman. A Markovian decision process. Journal of Mathematics and Mechanics, 6:679–684, 1957.
C. Boutilier, R. Dearden, and M. Goldszmidt. Exploiting structure in policy construction. Proceedings of the International Joint Conference on Artificial Intelligence, 14:1104–1113, 1995.
S. Bradtke and M. Duff. Reinforcement learning methods for continuous-time Markov decision problems. Advances in Neural Information Processing Systems, 7:393–400, 1995.
T. Dean and R. Givan. Model minimization in Markov decision processes. Proceedings of the National Conference on Artificial Intelligence, 14:106–111, 1997.
T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142–150, 1989.
T. Dean and S. Lin. Decomposition techniques for planning in stochastic domains. Proceedings of the International Joint Conference on Artificial Intelligence, 14:1121–1129, 1995.
T. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000a.
T. Dietterich. State abstraction in MAXQ hierarchical reinforcement learning. Advances in Neural Information Processing Systems, 12:994–1000, 2000b.
B. Digney. Emergent hierarchical control structures: learning reactive/hierarchical relationships in reinforcement environments. From Animals to Animats, 4:363–372, 1996.
Z. Feng and E. Hansen. Symbolic heuristic search for factored Markov decision processes. Proceedings of the National Conference on Artificial Intelligence, 18:455–460, 2002.
Z. Feng, E. Hansen, and S. Zilberstein. Symbolic generalization for on-line planning. Proceedings of Uncertainty in Artificial Intelligence, 19:209–216, 2003.
R. Fikes and N. Nilsson. STRIPS: a new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2:189–208, 1971.
M. Ghavamzadeh and S. Mahadevan. Continuous-time hierarchical reinforcement learning. Proceedings of the International Conference on Machine Learning, 18:186–193, 2001.
C. Guestrin, D. Koller, and R. Parr. Max-norm projections for factored MDPs. Proceedings of the International Joint Conference on Artificial Intelligence, 17:673–680, 2001.
D. Harel. Statecharts: a visual formalism for complex systems. Science of Computer Programming, 8:231–274, 1987.
M. Hauskrecht, N. Meuleau, L. Kaelbling, T. Dean, and C. Boutilier. Hierarchical solution of Markov decision processes using macro-actions. Proceedings of Uncertainty in Artificial Intelligence, 14:220–229, 1998.
M. Helmert. A planning heuristic based on causal graph analysis. Proceedings of the International Conference on Automated Planning and Scheduling, 14:161–170, 2004.
B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. Proceedings of the International Conference on Machine Learning, 19:243–250, 2002.
J. Hoey, R. St-Aubin, A. Hu, and C. Boutilier. SPUDD: stochastic planning using decision diagrams. Proceedings of Uncertainty in Artificial Intelligence, 15:279–288, 1999.
A. Jonsson and A. Barto. Automated state abstractions for options using the U-Tree algorithm. Advances in Neural Information Processing Systems, 13:1054–1060, 2001.
A. Jonsson and A. Barto. A causal approach to hierarchical decomposition of factored MDPs. Proceedings of the International Conference on Machine Learning, 22:401–408, 2005.
M. Kearns and D. Koller. Efficient reinforcement learning in factored MDPs. Proceedings of the International Joint Conference on Artificial Intelligence, 16:740–747, 1999.
S. Mannor, I. Menache, A. Hoze, and U. Klein. Dynamic abstraction in reinforcement learning via clustering. Proceedings of the International Conference on Machine Learning, 21:560–567, 2004.
A. McGovern and A. Barto. Automatic discovery of subgoals in reinforcement learning using diverse density. Proceedings of the International Conference on Machine Learning, 18:361–368, 2001.
I. Menache, S. Mannor, and N. Shimkin. Q-Cut: dynamic discovery of sub-goals in reinforcement learning. Proceedings of the European Conference on Machine Learning, 13:295–306, 2002.
K. Murphy. Active learning of causal Bayes net structure. Technical report, University of California, Berkeley, USA, 2001.
R. Parr. Hierarchical Control and Learning for Markov Decision Processes. Ph.D. thesis, University of California at Berkeley, 1998.
R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems, 10:1043–1049, 1998.
M. Pickett and A. Barto. PolicyBlocks: an algorithm for creating useful macro-actions in reinforcement learning. Proceedings of the International Conference on Machine Learning, 19:506–513, 2002.
M. Puterman. Markov Decision Processes. Wiley Interscience, New York, USA, 1994.
B. Ravindran. An Algebraic Approach to Abstraction in Reinforcement Learning. Ph.D. thesis, Department of Computer Science, University of Massachusetts, Amherst, USA, 2004.
Ö. Şimşek and A. Barto. Using relative novelty to identify useful temporal abstractions in reinforcement learning. Proceedings of the International Conference on Machine Learning, 21:751–758, 2004.
Ö. Şimşek, A. Wolfe, and A. Barto. Identifying useful subgoals in reinforcement learning by local graph partitioning. Proceedings of the International Conference on Machine Learning, 22, 2005.
S. Singh, A. Barto, and N. Chentanez. Intrinsically motivated reinforcement learning. Advances in Neural Information Processing Systems, 18:1281–1288, 2005.
H. Steck and T. Jaakkola. Unsupervised active learning in large domains. Proceedings of Uncertainty in Artificial Intelligence, 18:469–476, 2002.
R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, USA, 1998.
R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.
S. Thrun and A. Schwartz. Finding structure in reinforcement learning. Advances in Neural Information Processing Systems, 8:385–392, 1996.
S. Tong and D. Koller. Active learning for parameter estimation in Bayesian networks. Advances in Neural Information Processing Systems, 13:647–653, 2001.