NIPS 2008, paper 39
Source: pdf
Author: Jonathan Taylor, Doina Precup, Prakash Panangaden
Abstract: We define a metric for measuring behavior similarity between states in a Markov decision process (MDP) that takes action similarity into account. We show that the kernel of our metric corresponds exactly to the classes of states defined by MDP homomorphisms (Ravindran & Barto, 2003). We prove that the difference in the optimal value function between states can be upper-bounded by the value of this metric, and that this bound is tighter than previous bounds provided by bisimulation metrics (Ferns et al., 2004, 2005). Our results hold for both discrete and continuous actions. We provide an algorithm for constructing approximate homomorphisms by using this metric to identify states that can be grouped together, as well as actions that can be matched. Previous research on this topic has been based mainly on heuristics.
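The aggregation idea in the abstract (group states whose metric distance is small) can be sketched as follows. This is not the paper's algorithm: the names `state_distance` and `group_states` are hypothetical, the metric here is a simplified one-step surrogate that uses total variation distance instead of the Kantorovich fixed-point construction, and it keeps action labels fixed rather than matching actions as the paper's metric does.

```python
import numpy as np

def state_distance(R, P, gamma=0.9):
    """Pairwise distances between states of a finite MDP.

    Simplified surrogate for a bisimulation-style metric:
        d(s, s') = max_a ( |R[s,a] - R[s',a]|
                           + gamma * TV(P[s,a], P[s',a]) ),
    where TV is total variation distance. The paper's metric
    instead uses a Kantorovich distance defined as a fixed point
    and also matches non-identical actions.
    R: (n_states, n_actions) rewards.
    P: (n_states, n_actions, n_states) transition probabilities.
    """
    n = R.shape[0]
    d = np.zeros((n, n))
    for s in range(n):
        for t in range(n):
            per_action = (np.abs(R[s] - R[t])
                          + gamma * 0.5 * np.abs(P[s] - P[t]).sum(axis=1))
            d[s, t] = per_action.max()
    return d

def group_states(d, eps):
    """Greedily label states; states within eps of a group seed share its label."""
    n = d.shape[0]
    label = -np.ones(n, dtype=int)
    k = 0
    for s in range(n):
        if label[s] >= 0:
            continue
        label[s] = k
        for t in range(s + 1, n):
            if label[t] < 0 and d[s, t] <= eps:
                label[t] = k
        k += 1
    return label

# Toy 2-state, 2-action MDP where both states behave identically,
# so they collapse into one group.
R = np.array([[1.0, 0.0],
              [1.0, 0.0]])
P = np.array([[[0.5, 0.5], [1.0, 0.0]],
              [[0.5, 0.5], [1.0, 0.0]]])
d = state_distance(R, P)
print(group_states(d, eps=1e-6))  # both states receive the same label
```

States whose distance falls below the tolerance `eps` are merged, which mirrors the abstract's use of the metric to build an approximate homomorphism; the paper's value-loss bound then controls how much optimality such a merge can cost.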
References:
Arun-Kumar, S. (2006). On bisimilarities induced by relations on actions. SEFM '06: Proceedings of the Fourth IEEE International Conference on Software Engineering and Formal Methods (pp. 41–49). Washington, DC, USA: IEEE Computer Society.
Ferns, N., Castro, P. S., Precup, D., & Panangaden, P. (2006). Methods for computing state similarity in Markov decision processes. Proceedings of the 22nd UAI.
Ferns, N., Panangaden, P., & Precup, D. (2004). Metrics for finite Markov decision processes. Proceedings of the 20th UAI (pp. 162–169).
Ferns, N., Panangaden, P., & Precup, D. (2005). Metrics for Markov decision processes with infinite state spaces. Proceedings of the 21st UAI (pp. 201–209).
Gibbs, A., & Su, F. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419–435.
Givan, R., Dean, T., & Greig, M. (2003). Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, 147, 163–223.
Larsen, K. G., & Skou, A. (1991). Bisimulation through probabilistic testing. Information and Computation, 94, 1–28.
Li, L., Walsh, T. J., & Littman, M. L. (2006). Towards a unified theory of state abstraction for MDPs. Proceedings of the International Symposium on Artificial Intelligence and Mathematics.
Milner, R. (1995). Communication and Concurrency. Prentice Hall International (UK) Ltd.
Munkres, J. (1999). Topology. Prentice Hall.
Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley.
Ravindran, B., & Barto, A. G. (2003). Relativized options: Choosing the right transformation. Proceedings of the 20th ICML (pp. 608–615).
Ravindran, B., & Barto, A. G. (2004). Approximate homomorphisms: A framework for non-exact minimization in Markov decision processes. Proceedings of the Fifth International Conference on Knowledge Based Computer Systems.
Whitt, W. (1978). Approximations of dynamic programs, I. Mathematics of Operations Research, 3, 231–243.
Wolfe, A. P., & Barto, A. G. (2006). Decision tree methods for finding reusable MDP homomorphisms. Proceedings of AAAI.