NIPS 2013, Paper 278
Abstract: We consider how to transfer knowledge from previous tasks (MDPs) to a current task in long-lived and bounded agents that must solve a sequence of tasks over a finite lifetime. A novel aspect of our transfer approach is that we reuse reward functions. While this may seem counterintuitive, we build on the insight of recent work on the optimal rewards problem that guiding an agent’s behavior with reward functions other than the task-specifying reward function can help overcome computational bounds of the agent. Specifically, we use good guidance reward functions learned on previous tasks in the sequence to incrementally train a reward mapping function that maps task-specifying reward functions into good initial guidance reward functions for subsequent tasks. We demonstrate that our approach can substantially improve the agent’s performance relative to other approaches, including an approach that transfers policies. 1
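To make the reward-mapping idea above concrete, the following is a minimal sketch in Python. It assumes the task-specifying and guidance reward functions can each be summarized by fixed-length parameter vectors and that the mapping is fit by ridge regression; the class name RewardMapping, the linear parameterization, and the within-task refinement stand-in are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class RewardMapping:
    """Illustrative sketch: learn a map from task-specifying reward parameters
    to good initial guidance-reward parameters, using pairs collected from
    previously solved tasks. Not the authors' implementation."""

    def __init__(self, dim_task, dim_guidance, ridge=1e-3):
        self.dim_task = dim_task
        self.dim_guidance = dim_guidance
        self.ridge = ridge
        self.X = []   # task-specifying reward parameters from solved tasks
        self.Y = []   # corresponding good guidance-reward parameters
        self.W = np.zeros((dim_task, dim_guidance))  # linear mapping

    def initial_guidance(self, task_reward_params):
        """Map a new task's reward parameters to an initial guidance reward."""
        if not self.X:
            # No previous tasks yet: start from a neutral (zero) guidance reward.
            return np.zeros(self.dim_guidance)
        return np.asarray(task_reward_params, dtype=float) @ self.W

    def update(self, task_reward_params, good_guidance_params):
        """After a task is solved, store the pair and refit the mapping by
        ridge-regularized least squares: W = (X'X + lambda*I)^-1 X'Y."""
        self.X.append(np.asarray(task_reward_params, dtype=float))
        self.Y.append(np.asarray(good_guidance_params, dtype=float))
        X, Y = np.vstack(self.X), np.vstack(self.Y)
        A = X.T @ X + self.ridge * np.eye(self.dim_task)
        self.W = np.linalg.solve(A, X.T @ Y)

# Usage sketch: after each task, refine the guidance reward within the task
# (e.g. by gradient-based reward optimization in the spirit of Sorg et al. [14])
# and feed the result back into the mapping for the next task.
mapper = RewardMapping(dim_task=4, dim_guidance=4)
theta_task = np.array([1.0, 0.0, -1.0, 0.5])           # hypothetical task reward
theta_guidance = mapper.initial_guidance(theta_task)    # initial guidance reward
theta_guidance = theta_guidance + 0.1                   # stand-in for refinement
mapper.update(theta_task, theta_guidance)               # incrementally train map
```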
[1] Christopher G. Atkeson and Juan Carlos Santamaria. A comparison of direct and model-based reinforcement learning. In International Conference on Robotics and Automation, pages 3557–3564, 1997.
[2] Peter L. Bartlett and Jonathan Baxter. Stochastic optimization of controlled partially observable Markov decision processes. In Proceedings of the 39th IEEE Conference on Decision and Control, volume 1, pages 124–129, 2000.
[3] Urszula Chajewska, Daphne Koller, and Ronald Parr. Making rational decisions using adaptive utility elicitation. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 363–369, 2000.
[4] Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In Machine Learning: ECML, pages 282–293. Springer, 2006.
[5] George Konidaris and Andrew Barto. Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 489–496, 2006.
[6] George Konidaris and Andrew G. Barto. Building portable options: Skill transfer in reinforcement learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, volume 2, pages 895–900, 2007.
[7] Alessandro Lazaric, Marcello Restelli, and Andrea Bonarini. Transfer of samples in batch reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning, pages 544–551, 2008.
[8] Yaxin Liu and Peter Stone. Value-function-based transfer for reinforcement learning using structure mapping. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, volume 21(1), page 415, 2006.
[9] Sriraam Natarajan and Prasad Tadepalli. Dynamic preferences in multi-criteria reinforcement learning. In Proceedings of the 22nd International Conference on Machine Learning, 2005.
[10] Andrew Y. Ng, Daishi Harada, and Stuart Russell. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 278–287, 1999.
[11] Andrew Y. Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 663–670, 2000.
[12] Theodore J. Perkins and Doina Precup. Using options for knowledge transfer in reinforcement learning. Technical report, University of Massachusetts, Amherst, MA, USA, 1999.
[13] Satinder Singh, Richard L. Lewis, Andrew G. Barto, and Jonathan Sorg. Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development, 2(2):70–82, 2010.
[14] Jonathan Sorg, Satinder Singh, and Richard L. Lewis. Reward design via online gradient ascent. In Advances in Neural Information Processing Systems 23, 2010.
[15] Jonathan Sorg, Satinder Singh, and Richard L. Lewis. Optimal rewards versus leaf-evaluation heuristics in planning agents. In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence, 2011.
[16] Fumihide Tanaka and Masayuki Yamamura. Multitask reinforcement learning on the distribution of MDPs. In Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, volume 3, pages 1108–1113, 2003.
[17] Matthew E. Taylor, Nicholas K. Jong, and Peter Stone. Transferring instances for model-based reinforcement learning. In Machine Learning and Knowledge Discovery in Databases, pages 488–505. Springer, 2008.
[18] Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. The Journal of Machine Learning Research, 10:1633–1685, 2009.
[19] Matthew E. Taylor, Shimon Whiteson, and Peter Stone. Transfer via inter-task mappings in policy search reinforcement learning. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, page 37, 2007.
[20] Lisa Torrey and Jude Shavlik. Policy transfer via Markov logic networks. In Inductive Logic Programming, pages 234–248. Springer, 2010.
[21] Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3–4):279–292, 1992.