Paper: nips2011-48
Source: pdf
Author: Thomas J. Walsh, Daniel K. Hewlett, Clayton T. Morrison
Abstract: We present theoretical and empirical results for a framework that combines the benefits of apprenticeship and autonomous reinforcement learning. Our approach modifies an existing apprenticeship learning framework that relies on teacher demonstrations and does not necessarily explore the environment. The first change replaces the previously used Mistake Bound model learners with a recently proposed framework that melds the KWIK and Mistake Bound supervised learning protocols. The second change introduces communication of expected utility from the student to the teacher. The resulting system uses teacher traces only when the agent needs to learn concepts it cannot efficiently learn on its own.
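The abstract describes an interaction protocol in which the student announces the expected utility of its current policy and the teacher supplies a demonstration only when it can do better. The Python sketch below illustrates that loop under stated assumptions: the Student class, its stubbed utility estimate, the teach driver, and the trace format are all hypothetical names invented for illustration, and the KWIK/Mistake Bound model learner the paper relies on is left as a placeholder rather than implemented.

    # A minimal sketch of the student-teacher protocol from the abstract,
    # not the paper's actual implementation. All names are assumptions.

    class Student:
        """Autonomous learner that announces the expected utility of its policy."""

        def __init__(self):
            self.model = {}  # placeholder for a KWIK/Mistake Bound model learner

        def expected_utility(self):
            # In the framework this would be the value of the student's current
            # policy under its learned model; a fixed stub stands in here.
            return 0.0

        def run_episode(self):
            # Autonomous exploration: act in the environment, update the model.
            pass

        def incorporate_trace(self, trace):
            # Fold a teacher demonstration into the model learner.
            pass

    def teach(student, teacher_value, teacher_trace, episodes=10, tol=1e-3):
        """Give a demonstration only when the student's announced expected
        utility falls short of what the teacher's policy achieves."""
        for _ in range(episodes):
            if student.expected_utility() + tol < teacher_value:
                # Student cannot yet match the teacher: demonstrate.
                student.incorporate_trace(teacher_trace)
            else:
                # Announced utility is adequate: let the student act on its own.
                student.run_episode()

    if __name__ == "__main__":
        teach(Student(), teacher_value=1.0, teacher_trace=[("s0", "a0", "s1")])

The comparison against teacher_value stands in for the check the framework makes between announced and achievable utility; in the actual system both quantities would be derived from learned models rather than supplied as constants.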
[1] Pieter Abbeel and Andrew Y. Ng. Exploration and apprenticeship learning in reinforcement learning. In ICML, 2005.
[2] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, March 1998.
[3] Thomas J. Walsh, Kaushik Subramanian, Michael L. Littman, and Carlos Diuk. Generalizing apprenticeship learning across hypothesis classes. In ICML, 2010.
[4] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In ICML, 2004.
[5] Nathan Ratliff, David Silver, and J. Andrew Bagnell. Learning to search: Functional gradient techniques for imitation learning. Autonomous Robots, 27:25–53, 2009.
[6] Amin Sayedi, Morteza Zadimoghaddam, and Avrim Blum. Trading off mistakes and don't-know predictions. In NIPS, 2010.
[7] Lihong Li, Michael L. Littman, Thomas J. Walsh, and Alexander L. Strehl. Knows what it knows: A framework for self-aware learning. Machine Learning, 82(3):399–443, 2011.
[8] Alexander L. Strehl, Lihong Li, and Michael L. Littman. Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10:2413–2444, 2009.
[9] Nick Littlestone. Learning quickly when irrelevant attributes abound. Machine Learning, 2:285–318, 1988.
[10] Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1988.
[11] Lev Vygotsky. Interaction between learning and development. In Mind in Society. Harvard University Press, Cambridge, MA, 1978.
[12] Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. Approximate planning in large POMDPs via reusable trajectories. In NIPS, 1999.
[13] Kshitij Judah, Saikat Roy, Alan Fern, and Thomas G. Dietterich. Reinforcement learning via practice and critique advice. In AAAI, 2010.
[14] W. Bradley Knox and Peter Stone. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In AAMAS, 2010.
[15] Andrea Lockerd Thomaz and Cynthia Breazeal. Teachable robots: Understanding human teaching behavior to build more effective robot learners. Artificial Intelligence, 172(6-7):716–737, 2008.
[16] William D. Smart and Leslie Pack Kaelbling. Effective reinforcement learning for mobile robots. In ICRA, 2002.
[17] Sonia Chernova and Manuela Veloso. Interactive policy learning through confidence-based autonomy. Journal of Artificial Intelligence Research, 34(1):1–25, 2009.