nips nips2001 nips2001-67 nips2001-67-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Rémi Munos
Abstract: It is desirable that a complex decision-making problem in an uncertain world be adequately modeled by a Markov Decision Process (MDP) whose structural representation is adaptively designed by a parsimonious resource allocation process. Resources include time and cost of exploration, and the amount of memory and computational time allowed for the policy or value function representation. Concerned with making the best use of the available resources, we address the problem of efficiently estimating where adding extra resources is most needed in order to improve the expected performance of the resulting policy. One possible application in reinforcement learning (RL), when real-world exploration is highly costly, is the detection of those areas of the state space that primarily need to be explored in order to improve the policy. Another application concerns the approximation of continuous state-space stochastic control problems using adaptive discretization techniques, for which highly efficient allocation of grid points is mandatory to survive high dimensionality. Perhaps surprisingly, these two problems can be formulated under a common framework: for a given resource allocation, which defines a belief state over possible MDPs, find where adding new resources (thus decreasing the uncertainty of some parameters, such as transition probabilities or rewards) will most likely increase the expected performance of the new policy. To do so, we use sampling techniques to estimate the contribution of each parameter's probability distribution function (pdf) to the expected loss of using an approximate policy (such as the optimal policy of the most probable MDP) instead of the true (but unknown) optimal policy.
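The following is a minimal Monte-Carlo sketch of the idea described in the abstract, not the paper's exact algorithm: given a belief state over MDPs (here, assumed independent Dirichlet posteriors over the transition probabilities of a small finite MDP with known rewards), it estimates by sampling the expected loss of acting with the optimal policy of the most probable MDP instead of the true optimal policy. All concrete names and parameters (n_states, n_actions, dirichlet_alpha, the discount factor, sample counts) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.95

# Belief state: independent Dirichlet posterior over each transition row P(.|s,a);
# rewards are assumed known for simplicity.
dirichlet_alpha = rng.uniform(0.5, 3.0, size=(n_states, n_actions, n_states))
rewards = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def value_iteration(P, R, tol=1e-8):
    """Optimal value function and greedy policy for a fully known MDP."""
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * np.einsum('sat,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

def policy_value(P, R, pi, tol=1e-8):
    """Value of a fixed deterministic policy pi on a fully known MDP."""
    V = np.zeros(n_states)
    idx = np.arange(n_states)
    while True:
        V_new = R[idx, pi] + gamma * np.einsum('st,t->s', P[idx, pi], V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Approximate policy: the optimal policy of the most probable MDP (Dirichlet mean).
P_map = dirichlet_alpha / dirichlet_alpha.sum(axis=2, keepdims=True)
_, pi_approx = value_iteration(P_map, rewards)

# Monte-Carlo estimate of the expected loss E_M[ V*_M - V^{pi_approx}_M ]
# under the belief over MDPs M.
n_samples, losses = 200, []
for _ in range(n_samples):
    P = np.stack([[rng.dirichlet(dirichlet_alpha[s, a])
                   for a in range(n_actions)] for s in range(n_states)])
    V_star, _ = value_iteration(P, rewards)
    V_pi = policy_value(P, rewards, pi_approx)
    losses.append(V_star - V_pi)   # per-state loss of using the approximate policy

losses = np.array(losses)
print("expected loss per state:", losses.mean(axis=0))
```

In this sketch, states whose estimated expected loss is largest are the natural candidates for receiving extra resources (more exploration, or finer discretization); the paper's contribution is to estimate how much each uncertain parameter's pdf contributes to that loss.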
[1] Richard Dearden, Nir Friedman, and David Andre. Model-based Bayesian exploration. Proceedings of the Conference on Uncertainty in Artificial Intelligence, 1999.
[2] Rémi Munos. Decision-making under uncertainty: Efficiently estimating where extra resources are needed. Technical report, Ecole Polytechnique, 2002.
[3] Rémi Munos and Andrew Moore. Influence and variance of a Markov chain: Application to adaptive discretizations in optimal control. Proceedings of the 38th IEEE Conference on Decision and Control, 1999.
[4] Rémi Munos and Andrew W. Moore. Rates of convergence for variable resolution schemes in optimal control. International Conference on Machine Learning, 2000.
[5] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, 1994.
[6] John Rust. Using Randomization to Break the Curse of Dimensionality. Computational Economics, 1997.