
124 nips-2006-Linearly-solvable Markov decision problems


Source: pdf

Author: Emanuel Todorov

Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS-0524761.
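
To make the construction described in the abstract concrete, the following is a minimal numerical sketch, not code from the paper; the problem instance, the variable names (P, q, z, U), the learning rate, and the episode-restart rule are illustrative assumptions. It shows the exponential transformation z = exp(-v) turning the minimized Bellman equation into a linear system, the analytical optimal controls obtained by rescaling the passive dynamics, and a Z-learning loop that estimates z from passive transitions without state-action values.

# Sketch of a linearly-solvable MDP (first-exit formulation); all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

n = 5                      # states 0..3 are interior, state 4 is an absorbing goal
interior = np.arange(4)
goal = 4

# Passive (uncontrolled) dynamics: a row-stochastic transition matrix.
P = rng.random((n, n))
P[goal] = 0.0
P[goal, goal] = 1.0        # goal state is absorbing
P /= P.sum(axis=1, keepdims=True)

q = np.array([1.0, 1.0, 1.0, 1.0, 0.0])   # state costs; zero at the goal

# Exponential transformation z(x) = exp(-v(x)): the minimized Bellman equation
# becomes linear, z = diag(exp(-q)) P z, with z = exp(-q) = 1 at the goal.
G = np.diag(np.exp(-q))
A = G @ P
M = np.eye(len(interior)) - A[np.ix_(interior, interior)]
b = A[np.ix_(interior, [goal])].ravel() * 1.0   # z[goal] = 1
z = np.ones(n)
z[interior] = np.linalg.solve(M, b)
v = -np.log(z)             # optimal value function

# Optimal controls follow analytically by rescaling the passive transition
# probabilities: u*(x'|x) is proportional to p(x'|x) z(x').
U = P * z[None, :]
U /= U.sum(axis=1, keepdims=True)

# Z-learning: off-policy estimation of z from transitions sampled under the
# passive dynamics, with no state-action values.
z_hat = np.ones(n)
x = 0
for t in range(20000):
    x_next = rng.choice(n, p=P[x])
    alpha = 0.1            # illustrative constant learning rate
    z_hat[x] = (1 - alpha) * z_hat[x] + alpha * np.exp(-q[x]) * z_hat[x_next]
    x = 0 if x_next == goal else x_next   # restart episodes at state 0

print(v, -np.log(z_hat))

In the shortest-path setting mentioned in the abstract, z would instead be obtained as the principal eigenvector of the same matrix diag(exp(-q)) P, which is what reduces the problem to a largest eigenvalue computation.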


reference text

[1] B. Scholkopf and A. Smola, Learning with kernels. MIT Press (2002)

[2] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press (2004)

[3] D. Bertsekas, Dynamic programming and optimal control (2nd ed). Athena Scientific (2000)

[4] F. Chung, Spectral graph theory. CBMS Regional Conference Series in Mathematics (1997)