
218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining


Source: pdf

Author: George Konidaris, Andrew G. Barto

Abstract: We introduce a skill discovery method for reinforcement learning in continuous domains that constructs chains of skills leading to an end-of-task reward. We demonstrate experimentally that it creates appropriate skills and achieves performance benefits in a challenging continuous domain.
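The abstract's one-sentence description of the method can be made concrete with a small sketch. The Python below is an illustrative reconstruction of the chaining idea only, not the paper's implementation: every name in it (Option, fit_initiation_set, learn_option_policy, skill_chain, collect_start_states) is a hypothetical placeholder, and it assumes the usual options formulation of a skill as an (initiation set, policy, termination condition) triple [2]. The idea it illustrates: the first skill terminates at the end-of-task reward region, and each subsequent skill terminates where the previous one can be initiated, so the skills form a chain leading back from the goal.

```python
from dataclasses import dataclass
from typing import Callable, List
import random

State = tuple  # e.g. a (position, velocity) pair in a continuous domain


@dataclass
class Option:
    """A temporally extended action: initiation set, policy, termination condition."""
    can_initiate: Callable[[State], bool]
    policy: Callable[[State], int]
    should_terminate: Callable[[State], bool]


def fit_initiation_set(success_states: List[State],
                       radius: float = 0.1) -> Callable[[State], bool]:
    """Placeholder classifier: a state may initiate the option if it lies near
    any recorded state from which the option's goal was reached."""
    def can_initiate(s: State) -> bool:
        return any(sum((a - b) ** 2 for a, b in zip(s, x)) ** 0.5 < radius
                   for x in success_states)
    return can_initiate


def learn_option_policy(goal_test: Callable[[State], bool]) -> Callable[[State], int]:
    """Placeholder for an option-policy learner (e.g. Q-learning toward goal_test)."""
    def policy(s: State) -> int:
        return random.choice([0, 1, 2])  # stand-in for the learned greedy action
    return policy


def skill_chain(task_goal: Callable[[State], bool],
                collect_start_states: Callable[[Callable[[State], bool]], List[State]],
                chain_length: int = 3) -> List[Option]:
    """Grow a backward chain of options whose terminations lead to task_goal."""
    options: List[Option] = []
    target = task_goal  # the first skill terminates at the end-of-task reward region
    for _ in range(chain_length):
        policy = learn_option_policy(target)
        starts = collect_start_states(target)  # states observed to reach the target
        init = fit_initiation_set(starts)
        options.append(Option(can_initiate=init, policy=policy,
                              should_terminate=target))
        target = init  # the next skill's goal is this skill's initiation set
    return options
```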


Reference text

[1] A.G. Barto and S. Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13:41–77, 2003. Special Issue on Reinforcement Learning.

[2] R.S. Sutton, D. Precup, and S.P. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2):181–211, 1999.

[3] S. Singh, A.G. Barto, and N. Chentanez. Intrinsically motivated reinforcement learning. In Proceedings of the 18th Annual Conference on Neural Information Processing Systems, 2004.

[4] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

[5] B.L. Digney. Learning hierarchical control structures for multiple tasks and changing environments. In From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior. MIT Press, 1998.

[6] A. McGovern and A.G. Barto. Automatic discovery of subgoals in reinforcement learning using diverse density. In Proceedings of the 18th International Conference on Machine Learning, pages 361–368, 2001.

[7] Ö. Şimşek and A.G. Barto. Using relative novelty to identify useful temporal abstractions in reinforcement learning. In Proceedings of the 21st International Conference on Machine Learning, pages 751–758, 2004.

[8] Ö. Şimşek and A.G. Barto. Skill characterization based on betweenness. In Advances in Neural Information Processing Systems 22, 2009.

[9] I. Menache, S. Mannor, and N. Shimkin. Q-cut—dynamic discovery of sub-goals in reinforcement learning. In Proceedings of the 13th European Conference on Machine Learning, pages 295–306, 2002.

[10] S. Mannor, I. Menache, A. Hoze, and U. Klein. Dynamic abstraction in reinforcement learning via clustering. In Proceedings of the 21st International Conference on Machine Learning, pages 560–567, 2004.

[11] Ö. Şimşek, A.P. Wolfe, and A.G. Barto. Identifying useful subgoals in reinforcement learning by local graph partitioning. In Proceedings of the 22nd International Conference on Machine Learning, 2005.

[12] B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. In Proceedings of the 19th International Conference on Machine Learning, pages 243–250, 2002.

[13] A. Jonsson and A.G. Barto. A causal approach to hierarchical decomposition of factored MDPs. In Proceedings of the 22nd International Conference on Machine Learning, 2005.

[14] S. Thrun and A. Schwartz. Finding structure in reinforcement learning. In Advances in Neural Information Processing Systems, volume 7, pages 385–392. The MIT Press, 1995.

[15] D.S. Bernstein. Reusing old policies to accelerate learning on new MDPs. Technical Report UM-CS-1999-026, Department of Computer Science, University of Massachusetts at Amherst, April 1999.

[16] T.J. Perkins and D. Precup. Using options for knowledge transfer in reinforcement learning. Technical Report UM-CS-1999-034, Department of Computer Science, University of Massachusetts Amherst, 1999.

[17] M. Pickett and A.G. Barto. PolicyBlocks: An algorithm for creating useful macro-actions in reinforcement learning. In Proceedings of the 19th International Conference on Machine Learning, pages 506–513, 2002.

[18] J. Mugan and B. Kuipers. Autonomously learning an action hierarchy using a learned qualitative state representation. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009.

[19] G. Neumann, W. Maass, and J. Peters. Learning complex motions by sequencing simpler motion templates. In Proceedings of the 26th International Conference on Machine Learning, 2009.

[20] R.R. Burridge, A.A. Rizzi, and D.E. Koditschek. Sequential composition of dynamically dextrous robot behaviors. International Journal of Robotics Research, 18(6):534–555, 1999.

[21] R. Tedrake. LQR-Trees: Feedback motion planning on sparse randomized trees. In Proceedings of Robotics: Science and Systems, 2009.

[22] G.D. Konidaris and A.G. Barto. Building portable options: Skill transfer in reinforcement learning. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007.

[23] G.D. Konidaris and S. Osentoski. Value function approximation in reinforcement learning using the Fourier basis. Technical Report UM-CS-2008-19, Department of Computer Science, University of Massachusetts Amherst, June 2008.

[24] S. Mahadevan. Learning representation and control in Markov Decision Processes: New frontiers. Foundations and Trends in Machine Learning, 1(4):403–565, 2009.

[25] G.D. Konidaris and A.G. Barto. Sensorimotor abstraction selection for efficient, autonomous robot skill acquisition. In Proceedings of the 7th IEEE International Conference on Development and Learning, 2008.

[26] G.D. Konidaris and A.G. Barto. Efficient skill learning using abstraction selection. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, July 2009.