
51 nips-2001-Cobot: A Social Reinforcement Learning Agent

Source: pdf

Author: Charles Lee Isbell Jr., Christian R. Shelton

Abstract: We report on the use of reinforcement learning with Cobot, a software agent residing in the well-known online community LambdaMOO. Our initial work on Cobot (Isbell et al., 2000) provided him with the ability to collect social statistics and report them to users. Here we describe an application of RL allowing Cobot to take proactive actions in this complex social environment, and adapt his behavior from multiple sources of human reward. After 5 months of training, and 3171 reward and punishment events from 254 different LambdaMOO users, Cobot learned nontrivial preferences for a number of users, modifying his behavior based on his current state. Here we describe LambdaMOO and the state and action spaces of Cobot, and report the statistical results of the learning experiment.

reference text

Foner, L. (1997). Entertaining Agents: a Sociological Case Study. In Proceedings of the First International Conference on Autonomous Agents.

Isbell, C. L., Kearns, M., Kormann, D., Singh, S., and Stone, P. (2000). Cobot in LambdaMOO: A Social Statistics Agent. To appear in Proceedings of AAAI-2000.

Mauldin, M. (1994). Chatterbots, TinyMUDs, and the Turing Test: Entering the Loebner Prize Competition. In Proceedings of the Twelfth National Conference on Artificial Intelligence.

Shelton, C. R. (2000). Balancing Multiple Sources of Reward in Reinforcement Learning. Submitted for publication in Neural Information Processing Systems-2000.

Singh, S., Kearns, M., Littman, D., and Walker, M. (2000). Empirical Evaluation of a Reinforcement Learning Dialogue System. To appear in Proceedings of AAAI-2000.

Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.

Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems 1999.