
149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity


Source: pdf

Author: Odalric-ambrym Maillard

Abstract: This paper aims to take a step toward making the term “intrinsic motivation” from reinforcement learning theoretically well founded, focusing on curiosity-driven learning. To that end, we consider the following setting: given a fixed partition P of a continuous space X and an unknown process ν defined on X, we must sequentially decide which cell of the partition to select, and where to sample ν within that cell, so as to minimize a loss function inspired by previous work on curiosity-driven learning. The loss on each cell consists of a term measuring a simple worst-case quadratic sampling error and a penalty term proportional to the range of the variance in that cell. This problem formulation extends the setting known as active learning for multi-armed bandits to the case where each arm is a continuous region, and we show how adaptations of recent algorithms for that problem, and of hierarchical optimistic sampling algorithms for optimization, can be used to solve it. The resulting procedure, called Hierarchical Optimistic Region SElection driven by Curiosity (HORSE.C), is provided together with a finite-time regret analysis.
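The selection mechanism described above can be illustrated with a minimal sketch: each cell keeps its own samples, and the learner optimistically picks the cell whose upper-bound loss proxy is largest. The scoring rule below (empirical variance over sample count, plus a variance penalty and a UCB-style exploration bonus) is an illustrative stand-in chosen for this sketch, not the exact loss or confidence bounds analysed in the paper.

```python
import math

def optimistic_cell_selection(samples_per_cell, penalty=0.1, c=1.0):
    """Pick the cell with the largest optimistic loss estimate.

    samples_per_cell maps a cell id to the observations drawn in that
    cell. Cells with fewer than two samples are selected first, since
    no variance estimate exists for them yet. The score is a hypothetical
    proxy for the per-cell loss, NOT the paper's definition.
    """
    total = sum(len(v) for v in samples_per_cell.values())
    best_cell, best_score = None, -math.inf
    for cell, xs in samples_per_cell.items():
        n = len(xs)
        if n < 2:
            return cell  # explore cells we know nothing about first
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)
        # worst-case sampling-error term + variance penalty + optimism bonus
        bonus = c * math.sqrt(math.log(max(total, 2)) / n)
        score = var / n + penalty * var + bonus
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell

cells = {
    "low_noise": [0.0, 0.1, 0.0, 0.1],
    "high_noise": [0.0, 5.0, -5.0, 5.0],
}
print(optimistic_cell_selection(cells))  # prints "high_noise"
```

With equal sample counts the exploration bonus cancels out, so the noisier cell wins the comparison and receives the next sample, which is the qualitative behaviour one expects from curiosity-driven region selection.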


reference text

András Antos, Varun Grover, and Csaba Szepesvári. Active learning in heteroscedastic noise. Theoretical Computer Science, 411(29-30):2712–2728, 2010.

A. Baranes and P.-Y. Oudeyer. R-IAC: Robust Intrinsically Motivated Exploration and Active Learning. IEEE Transactions on Autonomous Mental Development, 1(3):155–169, October 2009.

Sébastien Bubeck, Rémi Munos, Gilles Stoltz, and Csaba Szepesvári. X-armed bandits. Journal of Machine Learning Research, 12:1655–1695, 2011.

Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, and Peter Auer. Upper-confidence-bound algorithms for active learning in multi-armed bandits. In Jyrki Kivinen, Csaba Szepesvári, Esko Ukkonen, and Thomas Zeugmann, editors, Algorithmic Learning Theory, volume 6925 of Lecture Notes in Computer Science, pages 189–203. Springer Berlin / Heidelberg, 2011.

Vincent Graziano, Tobias Glasmachers, Tom Schaul, Leo Pape, Giuseppe Cuccu, J. Leitner, and J. Schmidhuber. Artificial Curiosity for Autonomous Space Exploration. Acta Futura (in press), (1), 2011.

Tobias Jung, Daniel Polani, and Peter Stone. Empowerment for continuous agent-environment systems. Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems, 19(1):16–39, 2011.

G.D. Konidaris. Autonomous robot skill acquisition. PhD thesis, University of Massachusetts Amherst, 2011.

Odalric-Ambrym Maillard. Hierarchical optimistic region selection driven by curiosity. HAL, 2012. URL http://hal.archives-ouvertes.fr/hal-00740418.

Georg Martius, J. Michael Herrmann, and Ralf Der. Guided self-organisation for autonomous robot development. In Proceedings of the 9th European conference on Advances in artificial life, ECAL’07, pages 766–775, Berlin, Heidelberg, 2007. Springer-Verlag.

Jonathan Mugan. Autonomous Qualitative Learning of Distinctions and Actions in a Developing Agent. PhD thesis, University of Texas at Austin, 2010.

Pierre-Yves Oudeyer and Frederic Kaplan. What is Intrinsic Motivation? A Typology of Computational Approaches. Frontiers in Neurorobotics, 1(November):6, January 2007.

J. Schmidhuber. Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3):230–247, 2010.