NIPS 2008, paper 39
Source: pdf
Author: Jonathan Taylor, Doina Precup, Prakash Panangaden
Abstract: We define a metric for measuring behavior similarity between states in a Markov decision process (MDP) that takes action similarity into account. We show that the kernel of our metric corresponds exactly to the classes of states defined by MDP homomorphisms (Ravindran & Barto, 2003). We prove that the difference in the optimal value function between states can be upper-bounded by the value of this metric, and that this bound is tighter than previous bounds provided by bisimulation metrics (Ferns et al., 2004, 2005). Our results hold for both discrete and continuous actions. We provide an algorithm for constructing approximate homomorphisms by using this metric to identify states that can be grouped together, as well as actions that can be matched. Previous research on this topic has been based mainly on heuristics.
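The aggregation idea in the abstract (group states whose metric distance is small) can be sketched as follows. This is not the paper's algorithm: the names `state_distance` and `group_states` are hypothetical, the metric here is a simplified one-step surrogate that uses total variation distance instead of the Kantorovich fixed-point construction, and it keeps action labels fixed rather than matching actions as the paper's metric does.

```python
import numpy as np

def state_distance(R, P, gamma=0.9):
    """Pairwise distances between states of a finite MDP.

    Simplified surrogate for a bisimulation-style metric:
        d(s, s') = max_a ( |R[s,a] - R[s',a]|
                           + gamma * TV(P[s,a], P[s',a]) ),
    where TV is total variation distance. The paper's metric
    instead uses a Kantorovich distance defined as a fixed point
    and also matches non-identical actions.
    R: (n_states, n_actions) rewards.
    P: (n_states, n_actions, n_states) transition probabilities.
    """
    n = R.shape[0]
    d = np.zeros((n, n))
    for s in range(n):
        for t in range(n):
            per_action = (np.abs(R[s] - R[t])
                          + gamma * 0.5 * np.abs(P[s] - P[t]).sum(axis=1))
            d[s, t] = per_action.max()
    return d

def group_states(d, eps):
    """Greedily label states; states within eps of a group seed share its label."""
    n = d.shape[0]
    label = -np.ones(n, dtype=int)
    k = 0
    for s in range(n):
        if label[s] >= 0:
            continue
        label[s] = k
        for t in range(s + 1, n):
            if label[t] < 0 and d[s, t] <= eps:
                label[t] = k
        k += 1
    return label

# Toy 2-state, 2-action MDP where both states behave identically,
# so they collapse into one group.
R = np.array([[1.0, 0.0],
              [1.0, 0.0]])
P = np.array([[[0.5, 0.5], [1.0, 0.0]],
              [[0.5, 0.5], [1.0, 0.0]]])
d = state_distance(R, P)
print(group_states(d, eps=1e-6))  # both states receive the same label
```

States whose distance falls below the tolerance `eps` are merged, which mirrors the abstract's use of the metric to build an approximate homomorphism; the paper's value-loss bound then controls how much optimality such a merge can cost.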
References:
Arun-Kumar, S. (2006). On bisimilarities induced by relations on actions. SEFM '06: Proceedings of the Fourth IEEE International Conference on Software Engineering and Formal Methods (pp. 41–49). Washington, DC, USA: IEEE Computer Society.
Ferns, N., Castro, P. S., Precup, D., & Panangaden, P. (2006). Methods for computing state similarity in Markov decision processes. Proceedings of the 22nd UAI.
Ferns, N., Panangaden, P., & Precup, D. (2004). Metrics for finite Markov decision processes. Proceedings of the 20th UAI (pp. 162–169).
Ferns, N., Panangaden, P., & Precup, D. (2005). Metrics for Markov decision processes with infinite state spaces. Proceedings of the 21st UAI (pp. 201–209).
Gibbs, A., & Su, F. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419–435.
Givan, R., Dean, T., & Greig, M. (2003). Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, 147, 163–223.
Larsen, K. G., & Skou, A. (1991). Bisimulation through probabilistic testing. Information and Computation, 94, 1–28.
Li, L., Walsh, T. J., & Littman, M. L. (2006). Towards a unified theory of state abstraction for MDPs. Proceedings of the International Symposium on Artificial Intelligence and Mathematics.
Milner, R. (1995). Communication and Concurrency. Prentice Hall International (UK) Ltd.
Munkres, J. (1999). Topology. Prentice Hall.
Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley.
Ravindran, B., & Barto, A. G. (2003). Relativized options: Choosing the right transformation. Proceedings of the 20th ICML (pp. 608–615).
Ravindran, B., & Barto, A. G. (2004). Approximate homomorphisms: A framework for non-exact minimization in Markov decision processes. Proceedings of the Fifth International Conference on Knowledge Based Computer Systems.
Whitt, W. (1978). Approximations of dynamic programs, I. Mathematics of Operations Research, 3, 231–243.
Wolfe, A. P., & Barto, A. G. (2006). Decision tree methods for finding reusable MDP homomorphisms. Proceedings of AAAI.