Paper: nips2011-48
Source: pdf
Author: Thomas J. Walsh, Daniel K. Hewlett, Clayton T. Morrison
Abstract: We present theoretical and empirical results for a framework that combines the benefits of apprenticeship and autonomous reinforcement learning. Our approach modifies an existing apprenticeship learning framework that relies on teacher demonstrations and does not necessarily explore the environment. The first change replaces the previously used Mistake Bound model learners with a recently proposed framework that melds the KWIK and Mistake Bound supervised learning protocols. The second change introduces communication of expected utility from the student to the teacher. The resulting system uses teacher traces only when the agent needs to learn concepts it cannot efficiently learn on its own.
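The abstract describes an interaction protocol in which the student announces the expected utility of its current policy and the teacher supplies a demonstration only when it can do better. The Python sketch below illustrates that loop under stated assumptions: the Student class, its stubbed utility estimate, the teach driver, and the trace format are all hypothetical names invented for illustration, and the KWIK/Mistake Bound model learner the paper relies on is left as a placeholder rather than implemented.

    # A minimal sketch of the student-teacher protocol from the abstract,
    # not the paper's actual implementation. All names are assumptions.

    class Student:
        """Autonomous learner that announces the expected utility of its policy."""

        def __init__(self):
            self.model = {}  # placeholder for a KWIK/Mistake Bound model learner

        def expected_utility(self):
            # In the framework this would be the value of the student's current
            # policy under its learned model; a fixed stub stands in here.
            return 0.0

        def run_episode(self):
            # Autonomous exploration: act in the environment, update the model.
            pass

        def incorporate_trace(self, trace):
            # Fold a teacher demonstration into the model learner.
            pass

    def teach(student, teacher_value, teacher_trace, episodes=10, tol=1e-3):
        """Give a demonstration only when the student's announced expected
        utility falls short of what the teacher's policy achieves."""
        for _ in range(episodes):
            if student.expected_utility() + tol < teacher_value:
                # Student cannot yet match the teacher: demonstrate.
                student.incorporate_trace(teacher_trace)
            else:
                # Announced utility is adequate: let the student act on its own.
                student.run_episode()

    if __name__ == "__main__":
        teach(Student(), teacher_value=1.0, teacher_trace=[("s0", "a0", "s1")])

The comparison against teacher_value stands in for the check the framework makes between announced and achievable utility; in the actual system both quantities would be derived from learned models rather than supplied as constants.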
[1] Pieter Abbeel and Andrew Y. Ng. Exploration and apprenticeship learning in reinforcement learning. In ICML, 2005.
[2] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, March 1998.
[3] Thomas J. Walsh, Kaushik Subramanian, Michael L. Littman, and Carlos Diuk. Generalizing apprenticeship learning across hypothesis classes. In ICML, 2010.
[4] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In ICML, 2004.
[5] Nathan Ratliff, David Silver, and J. Andrew Bagnell. Learning to search: Functional gradient techniques for imitation learning. Autonomous Robots, 27:25–53, 2009.
[6] Amin Sayedi, Morteza Zadimoghaddam, and Avrim Blum. Trading off mistakes and don't-know predictions. In NIPS, 2010.
[7] Lihong Li, Michael L. Littman, Thomas J. Walsh, and Alexander L. Strehl. Knows what it knows: A framework for self-aware learning. Machine Learning, 82(3):399–443, 2011.
[8] Alexander L. Strehl, Lihong Li, and Michael L. Littman. Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10:2413–2444, 2009.
[9] Nick Littlestone. Learning quickly when irrelevant attributes abound. Machine Learning, 2:285–318, 1988.
[10] Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1988.
[11] Lev Vygotsky. Interaction between learning and development. In Mind in Society. Harvard University Press, Cambridge, MA, 1978.
[12] Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. Approximate planning in large POMDPs via reusable trajectories. In NIPS, 1999.
[13] Kshitij Judah, Saikat Roy, Alan Fern, and Thomas G. Dietterich. Reinforcement learning via practice and critique advice. In AAAI, 2010.
[14] W. Bradley Knox and Peter Stone. Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In AAMAS, 2010.
[15] Andrea Lockerd Thomaz and Cynthia Breazeal. Teachable robots: Understanding human teaching behavior to build more effective robot learners. Artificial Intelligence, 172(6-7):716–737, 2008.
[16] William D. Smart and Leslie Pack Kaelbling. Effective reinforcement learning for mobile robots. In ICRA, 2002.
[17] Sonia Chernova and Manuela Veloso. Interactive policy learning through confidence-based autonomy. Journal of Artificial Intelligence Research, 34(1):1–25, 2009.