
33 nips-2004-Brain Inspired Reinforcement Learning


Source: pdf

Author: François Rivest, Yoshua Bengio, John Kalaska

Abstract: Successful application of reinforcement learning algorithms often involves considerable hand-crafting of the necessary non-linear features to reduce the complexity of the value functions and hence to promote convergence of the algorithm. In contrast, the human brain readily and autonomously finds complex features when provided with sufficient training. Recent work in machine learning and neurophysiology has demonstrated the role of the basal ganglia and the frontal cortex in mammalian reinforcement learning. This paper develops and explores new reinforcement learning algorithms, inspired by neurological evidence, that provide potential new approaches to the feature construction problem. The algorithms are compared and evaluated on the Acrobot task.
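The feature-construction burden the abstract describes can be illustrated with a minimal sketch (not the paper's algorithm): linear TD(0) value estimation on a standard 5-state random walk, where the hand-crafted feature map `features` determines what value functions the learner can represent at all. The state space, rewards, and one-hot basis here are illustrative assumptions, not taken from the paper.

```python
import random

N_STATES = 5          # states 0..4; episodes start in the middle
ALPHA = 0.1           # learning rate
GAMMA = 1.0           # undiscounted episodic task

def features(s):
    """Hand-crafted features: a one-hot basis (a perfect basis here;
    on harder tasks like Acrobot, choosing this map is the hard part)."""
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]

def value(w, s):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, features(s)))

def td0(episodes=2000, seed=0):
    rng = random.Random(seed)
    w = [0.0] * N_STATES
    for _ in range(episodes):
        s = N_STATES // 2
        while True:
            s2 = s + rng.choice([-1, 1])   # random walk left/right
            if s2 < 0 or s2 >= N_STATES:
                # terminal: reward 1 for exiting on the right, else 0
                r, v2 = (1.0 if s2 >= N_STATES else 0.0), 0.0
            else:
                r, v2 = 0.0, value(w, s2)
            delta = r + GAMMA * v2 - value(w, s)   # TD error
            x = features(s)
            for i in range(N_STATES):
                w[i] += ALPHA * delta * x[i]       # gradient step
            if s2 < 0 or s2 >= N_STATES:
                break
            s = s2
    return w

if __name__ == "__main__":
    print([round(v, 2) for v in td0()])
```

With a one-hot basis the true values (1/6 through 5/6) are representable exactly; swapping in a poorer feature map caps how close TD can get, which is the problem the brain-inspired algorithms in the paper aim to sidestep.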


reference text

[1] Foster, D. & Dayan, P. (2002) Structure in the space of value functions. Machine Learning 49(2):325-346.

[2] Tsitsiklis, J.N. & Van Roy, B. (1996) Feature-based methods for large scale dynamic programming. Machine Learning 22:59-94.

[3] Sutton, R.S., McAllester, D., Singh, S. & Mansour, Y. (2000) Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems 12, pp. 1057-1063. MIT Press.

[4] Barto, A.G. (1995) Adaptive critics and the basal ganglia. In Models of Information Processing in the Basal Ganglia, pp. 215-232. Cambridge, MA: MIT Press.

[5] Suri, R.E. & Schultz, W. (1999) A neural network model with dopamine-like reinforcement signal that learns a spatial delayed response task. Neuroscience 91(3):871-890.

[6] Suri, R.E. & Schultz, W. (2001) Temporal difference model reproduces anticipatory neural activity. Neural Computation 13:841-862.

[7] Doi, E., Inui, T., Lee, T.-W., Wachtler, T. & Sejnowski, T.J. (2003) Spatiochromatic receptive field properties derived from information-theoretic analysis of cone mosaic responses to natural scenes. Neural Computation 15:397-417.

[8] Sutton, R.S. & Barto, A.G. (1998) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

[9] Doya, K. (1999) What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural Networks 12:961-974.

[10] Foster, D.J., Morris, R.G.M., & Dayan, P. (2000) A model of hippocampally dependent navigation, using the temporal difference learning rule. Hippocampus 10:1-16.

[11] Wickens, J. & Kötter, R. (1995) Cellular models of reinforcement. In Models of Information Processing in the Basal Ganglia, pp. 187-214. Cambridge, MA: MIT Press.

[12] Whiteson, S. & Stone, P. (2003) Concurrent layered learning. In Proceedings of the 2nd International Joint Conference on Autonomous Agents & Multi-agent Systems.

[13] Amari, S.-I. (1999) Natural gradient learning for over- and under-complete bases in ICA. Neural Computation 11:1875-1883.