
126 nips-2001-Motivated Reinforcement Learning



Author: Peter Dayan

Abstract: The standard reinforcement learning view of the involvement of neuromodulatory systems in instrumental conditioning includes a rather straightforward conception of motivation as the prediction of summed future reward. Competition between actions is based on the motivating characteristics of their consequent states in this sense. Substantial, careful experiments into the neurobiology and psychology of motivation, reviewed by Dickinson & Balleine [12,13], show that this view is incomplete. In many cases, animals are faced with the choice not between many different actions at a given state, but rather whether a single response is worth executing at all. Evidence suggests that the motivational process underlying this choice has different psychological and neural properties from that underlying action choice. We describe and model these motivational systems, and consider the way they interact.


reference text

[1] Adams, CD (1982) Variations in the sensitivity of instrumental responding to reinforcer devaluation. QJEP 34B:77-98.

[2] Baird, LC (1993) Advantage Updating. Technical report WL-TR-93-1146, Wright-Patterson Air Force Base.

[3] Balkenius, C (1995) Natural Intelligence in Artificial Creatures. PhD Thesis, Department of Cognitive Science, Lund University, Sweden.

[4] Balleine, BW & Dickinson, A (1998) Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37:407-419.

[5] Balleine, BW, Garner, C, Gonzalez, F & Dickinson, A (1995) Motivational control of heterogeneous instrumental chains. Journal of Experimental Psychology: Animal Behavior Processes 21:203-217.

[6] Barto, AG, Sutton, RS & Anderson, CW (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE SMC 13:834-846.

[7] Berridge, KC (2000) Reward learning: Reinforcement, incentives, and expectations. In DL Medin, editor, The Psychology of Learning and Motivation 40:223-278.

[8] Berridge, KC & Schulkin, J (1989) Palatability shift of a salt-associated incentive during sodium depletion. Quarterly Journal of Experimental Psychology: Comparative & Physiological Psychology 41:121-138.

[9] Braver, TS, Barch, DM & Cohen, JD (1999) Cognition and control in schizophrenia: A computational model of dopamine and prefrontal function. Biological Psychiatry 46:312-328.

[10] Corbit, LH, Muir, JL & Balleine, BW (2001) The role of the nucleus accumbens in instrumental conditioning: Evidence of a functional dissociation between accumbens core and shell. Journal of Neuroscience 21:3251-3260.

[11] Dayan, P (1993) Improving generalisation for temporal difference learning: The successor representation. Neural Computation 5:613-624.

[12] Dickinson, A & Balleine, B (1994) Motivational control of goal-directed action. Animal Learning & Behavior 22:1-18.

[13] Dickinson, A & Balleine, B (2001) The role of learning in motivation. In CR Gallistel, editor, Learning, Motivation and Emotion, Volume 3 of Stevens' Handbook of Experimental Psychology, Third Edition. New York, NY: Wiley.

[14] Dickinson, A, Balleine, B, Watt, A, Gonzalez, F & Boakes, RA (1995) Motivational control after extended instrumental training. Animal Learning & Behavior 23:197-206.

[15] Estes, WK (1943). Discriminative conditioning. I. A discriminative property of conditioned anticipation. JEP 32:150-155.

[16] Holland, PC & Gallagher, M (1999) Amygdala circuitry in attentional and representational processes. Trends in Cognitive Sciences 3:65-73.

[17] Holland, PC & Rescorla, RA (1975) The effect of two ways of devaluing the unconditioned stimulus after first- and second-order appetitive conditioning. Journal of Experimental Psychology: Animal Behavior Processes 1:355-363.

[18] Konorski, J (1948) Conditioned Reflexes and Neuron Organization. Cambridge, England: Cambridge University Press.

[19] Konorski, J (1967) Integrative Activity of the Brain: An Interdisciplinary Approach. Chicago, IL: University of Chicago Press.

[20] Mackintosh, NJ (1974) The Psychology of Animal Learning. New York, NY: Academic Press.

[21] Montague, PR, Dayan, P, Person, C & Sejnowski, TJ (1995) Bee foraging in uncertain environments using predictive Hebbian learning. Nature 377:725-728.

[22] Montague, PR, Dayan, P & Sejnowski, TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience 16:1936-1947.

[23] Schoenbaum, G, Chiba, AA & Gallagher, M (1999) Neural encoding in orbitofrontal cortex and basolateral amygdala during olfactory discrimination learning. Journal of Neuroscience 19:1876-1884.

[24] Schultz, W, Dayan, P & Montague, PR (1997) A neural substrate of prediction and reward. Science 275:1593-1599.

[25] Spier, E (1997) From Reactive Behaviour to Adaptive Behaviour. PhD Thesis, Balliol College, Oxford.

[26] Sutton, RS (1995) TD models: Modeling the world at a mixture of time scales. In A Prieditis & S Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, 531-539.

[27] Sutton, RS & Barto, AG (1981) An adaptive network that constructs and uses an internal model of its world. Cognition and Brain Theory 4:217-246.

[28] Sutton, RS & Barto, AG (1998) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

[29] Sutton, RS & Pinette, B (1985) The learning of world models by connectionist networks. Proceedings of the Seventh Annual Conference of the Cognitive Science Society. Irvine, CA: Lawrence Erlbaum, 54-64.

[30] Tolman, EC (1938) The determiners of behavior at a choice point. Psychological Review 45:1-41.

[31] Watkins, CJCH (1989) Learning from Delayed Rewards. PhD Thesis, University of Cambridge, Cambridge, UK.