jmlr jmlr2012 jmlr2012-116 jmlr2012-116-reference knowledge-graph by maker-knowledge-mining

116 jmlr-2012-Transfer in Reinforcement Learning via Shared Features


Source: pdf

Author: George Konidaris, Ilya Scheidwasser, Andrew Barto

Abstract: We present a framework for transfer in reinforcement learning based on the idea that related tasks share some common features, and that transfer can be achieved via those shared features. The framework attempts to capture the notion of tasks that are related but distinct, and provides some insight into when transfer can be usefully applied to a problem sequence and when it cannot. We apply the framework to the knowledge transfer problem, and show that an agent can learn a portable shaping function from experience in a sequence of tasks to significantly improve performance in a later related task, even given a very brief training period. We also apply the framework to skill transfer, to show that agents can learn portable skills across a sequence of tasks that significantly improve performance on later related tasks, approaching the performance of agents given perfectly learned problem-specific skills.

Keywords: reinforcement learning, transfer, shaping, skills
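The mechanism the abstract describes (learning something portable over features shared across tasks and reusing it in a later, related task) can be made concrete with a small sketch. The following is an illustrative approximation, not the authors' implementation: it assumes a linear potential over a shared feature vector, fits it to returns observed in earlier tasks, and reuses it as a potential-based shaping reward in the style of Ng, Harada, and Russell (1999), cited in the reference text below. The class name, feature representation, and hyperparameters are assumptions made for this example.

# Minimal sketch (not the authors' code): a potential learned over shared
# features across a sequence of tasks, reused as a potential-based shaping
# reward in a new task. The linear form, learning rate, and feature map are
# illustrative assumptions.
import numpy as np

class SharedFeatureShaping:
    """Phi is a linear potential over shared features; the shaping bonus
    F(s, s') = gamma * Phi(s') - Phi(s) leaves optimal policies unchanged."""

    def __init__(self, n_features, gamma=0.99, lr=0.1):
        self.w = np.zeros(n_features)  # weights of the linear potential
        self.gamma = gamma
        self.lr = lr

    def potential(self, phi):
        # phi: 1-D array of shared (task-independent) features for a state.
        return float(self.w @ phi)

    def update(self, phi, observed_return):
        # Fit Phi to returns observed in earlier tasks in the sequence, so it
        # encodes which shared-feature configurations tended to be valuable.
        self.w += self.lr * (observed_return - self.potential(phi)) * phi

    def shaping_reward(self, phi, phi_next):
        # Added to the new task's reward at every transition.
        return self.gamma * self.potential(phi_next) - self.potential(phi)

In a later task, an agent would add shaping_reward(phi(s), phi(s')) to each environment reward while learning with any standard temporal-difference method; because the bonus is potential-based, the optimal policy of the new task is unchanged, while early learning is biased toward states whose shared features were valuable in the earlier tasks.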


reference text

P.E. Agre and D. Chapman. Pengi: An implementation of a theory of activity. In Proceedings of the Sixth National Conference on Artificial Intelligence, pages 268–272, 1987.

B. Banerjee and P. Stone. General game learning using knowledge transfer. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 672–677, 2007.

A.G. Barto and S. Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13:41–77, 2003. Special Issue on Reinforcement Learning.

D.S. Bernstein. Reusing old policies to accelerate learning on new MDPs. Technical Report UM-CS-1999-026, Department of Computer Science, University of Massachusetts at Amherst, April 1999.

J. Boyan and A.W. Moore. Learning evaluation functions to improve optimization by local search. Journal of Machine Learning Research, 1:77–112, 2000.

T. Croonenborghs, K. Driessens, and M. Bruynooghe. Learning relational options for inductive transfer in relational reinforcement learning. In Proceedings of the Seventeenth International Conference on Inductive Logic Programming, pages 88–97, 2007.

T.G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.

B.L. Digney. Learning hierarchical control structures for multiple tasks and changing environments. In R. Pfeifer, B. Blumberg, J. Meyer, and S.W. Wilson, editors, From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, Zurich, Switzerland, August 1998. MIT Press.

M. Dorigo and M. Colombetti. Robot Shaping: An Experiment in Behavior Engineering. MIT Press/Bradford Books, 1998.

K. Ferguson and S. Mahadevan. Proto-transfer learning in Markov Decision Processes using spectral methods. In Proceedings of the ICML Workshop on Structural Knowledge Transfer for Machine Learning, Pittsburgh, June 2006.

K. Ferguson and S. Mahadevan. Proto-transfer learning in Markov Decision Processes using spectral methods. Technical Report TR-08-23, University of Massachusetts Amherst, 2008.

F. Fernández and M. Veloso. Probabilistic policy reuse in a reinforcement learning agent. In Proceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 720–727, 2006.

E. Ferrante, A. Lazaric, and M. Restelli. Transfer of task representation in reinforcement learning using policy-based proto-value functions (short paper). In Proceedings of the Seventh International Conference on Autonomous Agents and Multiagent Systems, pages 1329–1332, 2008.

L. Frommberger. Learning to behave in space: A qualitative spatial representation for robot navigation with reinforcement learning. International Journal on Artificial Intelligence Tools, 17(3):465–482, 2008.

A. Guazzelli, F.J. Corbacho, M. Bota, and M.A. Arbib. Affordances, motivations, and the world graph theory. Adaptive Behavior, 6(3/4):433–471, 1998.

B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 243–250, 2002.

A. Jonsson and A.G. Barto. Automated state abstraction for options using the U-Tree algorithm. In Advances in Neural Information Processing Systems 13, pages 1054–1060, 2001.

A. Jonsson and A.G. Barto. A causal approach to hierarchical decomposition of factored MDPs. In Proceedings of the Twenty-Second International Conference on Machine Learning, 2005.

G.D. Konidaris and G.M. Hayes. Estimating future reward in reinforcement learning animats using associative learning. In From Animals to Animats 8: Proceedings of the 8th International Conference on the Simulation of Adaptive Behavior, pages 297–304, July 2004.

A. Laud and G. DeJong. The influence of reward on the speed of reinforcement learning: an analysis of shaping. In Proceedings of the Twentieth International Conference on Machine Learning, pages 440–447, 2003.

A. Lazaric, M. Restelli, and A. Bonarini. Transfer of samples in batch reinforcement learning. In Proceedings of the Twenty-Fifth International Conference on Machine Learning, pages 544–551, 2008.

S. Mannor, I. Menache, A. Hoze, and U. Klein. Dynamic abstraction in reinforcement learning via clustering. In Proceedings of the Twenty-First International Conference on Machine Learning, pages 560–567, 2004.

M.J. Matarić. Reinforcement learning in the multi-robot domain. Autonomous Robots, 4(1):73–83, 1997.

A. McGovern and A.G. Barto. Automatic discovery of subgoals in reinforcement learning using diverse density. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 361–368, 2001.

N. Mehta, S. Ray, P. Tadepalli, and T. Dietterich. Automatic discovery and transfer of MAXQ hierarchies. In Proceedings of the Twenty-Fifth International Conference on Machine Learning, 2008.

A.W. Moore and C.G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1):103–130, 1993.

A.Y. Ng, D. Harada, and S. Russell. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning, pages 278–287, 1999.

R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems 10, pages 1043–1049, 1997.

T.J. Perkins and D. Precup. Using options for knowledge transfer in reinforcement learning. Technical Report UM-CS-1999-034, Department of Computer Science, University of Massachusetts, Amherst, 1999.

M. Pickett and A.G. Barto. PolicyBlocks: An algorithm for creating useful macro-actions in reinforcement learning. In Proceedings of the Nineteenth International Conference on Machine Learning, pages 506–513, 2002.

D. Precup, R.S. Sutton, and S. Singh. Eligibility traces for off-policy policy evaluation. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 759–766, 2000.

M.L. Puterman. Markov Decision Processes. Wiley, 1994.

J. Randløv and P. Alstrøm. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning, pages 463–471, 1998.

B. Ravindran and A.G. Barto. Relativized options: Choosing the right transformation. In Proceedings of the Twentieth International Conference on Machine Learning, pages 608–615, 2003a.

B. Ravindran and A.G. Barto. SMDP homomorphisms: An algebraic approach to abstraction in semi-Markov decision processes. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pages 1011–1016, 2003b.

O. Selfridge, R.S. Sutton, and A.G. Barto. Training and tracking in robotics. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence, pages 670–672, 1985.

Ö. Şimşek and A.G. Barto. Using relative novelty to identify useful temporal abstractions in reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, pages 751–758, 2004.

Ö. Şimşek, A.P. Wolfe, and A.G. Barto. Identifying useful subgoals in reinforcement learning by local graph partitioning. In Proceedings of the Twenty-Second International Conference on Machine Learning, 2005.

S. Singh, A.G. Barto, and N. Chentanez. Intrinsically motivated reinforcement learning. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems, 2004.

B.F. Skinner. The Behavior of Organisms: An Experimental Analysis. Appleton-Century-Crofts, New York, 1938.

M. Snel and S. Whiteson. Multi-task evolutionary shaping without pre-specified representations. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1031–1038, 2010.

P. Stone, R.S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior, 13(3):165–188, 2005.

R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

R.S. Sutton, D. Precup, and S.P. Singh. Intra-option learning about temporally abstract actions. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 556–564, 1998.

R.S. Sutton, D. Precup, and S.P. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181–211, 1999.

M.E. Taylor and P. Stone. Value functions for RL-based behavior transfer: a comparative study. In Proceedings of the Twentieth National Conference on Artificial Intelligence, 2005.

M.E. Taylor, P. Stone, and Y. Liu. Transfer learning via inter-task mappings for temporal difference learning. Journal of Machine Learning Research, 8:2125–2167, 2007.

M.E. Taylor, G. Kuhlmann, and P. Stone. Autonomous transfer for reinforcement learning. In Proceedings of the Seventh International Conference on Autonomous Agents and Multiagent Systems, 2008.

S. Thrun and A. Schwartz. Finding structure in reinforcement learning. In Advances in Neural Information Processing Systems, volume 7, pages 385–392. The MIT Press, 1995.

L. Torrey, J. Shavlik, T. Walker, and R. Maclin. Skill acquisition via transfer learning and advice taking. In Proceedings of the Seventeenth European Conference on Machine Learning, pages 425–436, 2006.

J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, and E. Thelen. Autonomous mental development by robots and animals. Science, 291(5504):599–600, 2000.

E. Wiewiora. Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research, 19:205–208, 2003.

E. Wiewiora, G. Cottrell, and C. Elkan. Principled methods for advising reinforcement learning agents. In Proceedings of the Twentieth International Conference on Machine Learning, pages 792–799, 2003.

A. Wilson, A. Fern, S. Ray, and P. Tadepalli. Multi-task reinforcement learning: a hierarchical Bayesian approach. In Proceedings of the 24th International Conference on Machine Learning, pages 1015–1022, 2007.

W. Zhang and T.G. Dietterich. A reinforcement learning approach to job-shop scheduling. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, pages 1114–1120, 1995.