
144 nips-2008-Multi-resolution Exploration in Continuous Spaces


Source: pdf

Author: Ali Nouri, Michael L. Littman

Abstract: The essence of exploration is acting to try to decrease uncertainty. We propose a new methodology for representing uncertainty in continuous-state control problems. Our approach, multi-resolution exploration (MRE), uses a hierarchical mapping to identify regions of the state space that would benefit from additional samples. We demonstrate MRE’s broad utility by using it to speed up learning in a prototypical model-based and value-based reinforcement-learning method. Empirical results show that MRE improves upon state-of-the-art exploration approaches.

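The abstract's "hierarchical mapping" can be pictured as a variable-resolution tree over the state space: cells that accumulate samples are split into finer cells, and the size and fill of the cell containing a state give a rough "knownness" signal that an exploration strategy can use. The sketch below is only a minimal illustration of that idea, not the paper's algorithm; the class name RegionNode, the max_points split threshold, and the knownness heuristic are assumptions made for the example (state space assumed normalized to the unit box).

```python
import numpy as np

class RegionNode:
    """One hyper-rectangular cell of the state space, holding observed samples."""

    def __init__(self, lo, hi, max_points=10):
        self.lo = np.asarray(lo, dtype=float)   # lower corner of the cell
        self.hi = np.asarray(hi, dtype=float)   # upper corner of the cell
        self.max_points = max_points            # split threshold (assumed knob)
        self.points = []                        # samples stored in this leaf
        self.children = None                    # (low_child, high_child) after a split

    def insert(self, x):
        """Add a sample; refine the cell once it holds more than max_points samples."""
        if self.children is not None:
            self._child_for(x).insert(x)
            return
        self.points.append(np.asarray(x, dtype=float))
        if len(self.points) > self.max_points:
            self._split()

    def knownness(self, x):
        """Heuristic exploration signal in [0, 1]: small, well-sampled cells score high."""
        if self.children is not None:
            return self._child_for(x).knownness(x)
        size = float(np.max(self.hi - self.lo))        # cell diameter proxy (<= 1 in unit box)
        coverage = len(self.points) / self.max_points  # how full the leaf is
        return min(1.0, coverage * (1.0 - size))       # assumed combination rule

    def _split(self):
        """Halve the cell along its widest dimension and push samples down."""
        dim = int(np.argmax(self.hi - self.lo))
        mid = 0.5 * (self.lo[dim] + self.hi[dim])
        left_hi, right_lo = self.hi.copy(), self.lo.copy()
        left_hi[dim], right_lo[dim] = mid, mid
        self.children = (RegionNode(self.lo, left_hi, self.max_points),
                         RegionNode(right_lo, self.hi, self.max_points))
        for p in self.points:
            self._child_for(p).insert(p)
        self.points = []

    def _child_for(self, x):
        """Route a point to the child whose half of the split dimension contains it."""
        dim = int(np.argmax(self.hi - self.lo))
        mid = 0.5 * (self.lo[dim] + self.hi[dim])
        return self.children[0] if x[dim] <= mid else self.children[1]


# Example use: regions visited often become fine-grained and report high knownness,
# so a learner could direct exploration toward states where knownness is still low.
tree = RegionNode(lo=[0.0, 0.0], hi=[1.0, 1.0])
for s in np.random.rand(200, 2):
    tree.insert(s)
print(tree.knownness([0.5, 0.5]))
```

In an actual agent, a low knownness value for a state would translate into an optimistic value or an exploration bonus for actions reaching that region; the exact mechanism used by MRE is described in the paper itself.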

Reference text

Auer, P., & Ortner, R. (2006). Logarithmic online regret bounds for undiscounted reinforcement learning. Advances in Neural Information Processing Systems 20 (NIPS-06).
Brafman, R. I., & Tennenholtz, M. (2002). R-max, a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213–231.
Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6, 503–556.
Gordon, G. J. (1999). Approximate solutions to Markov decision processes. Doctoral dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
Jong, N. K., & Stone, P. (2007). Model-based function approximation for reinforcement learning. The Sixth International Joint Conference on Autonomous Agents and Multiagent Systems.
Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237–285.
Kakade, S., Kearns, M., & Langford, J. (2003). Exploration in metric state spaces. Proceedings of the 20th International Conference on Machine Learning (ICML-03).
Kearns, M. J., & Singh, S. P. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 209–232.
Lagoudakis, M. G., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107–1149.
Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291–323.
Munos, R., & Moore, A. W. (2000). Rates of convergence for variable resolution schemes in optimal control. Proceedings of the Seventeenth International Conference on Machine Learning (ICML-00) (pp. 647–654).
Preparata, F. P., & Shamos, M. I. (1985). Computational geometry: An introduction. Springer.
Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
Strehl, A., & Littman, M. (2007). Online linear regression and its application to model-based reinforcement learning. Advances in Neural Information Processing Systems 21 (NIPS-07).
Strehl, A. L., & Littman, M. L. (2005). A theoretical analysis of model-based interval estimation. ICML-05 (pp. 857–864).
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.