
99 nips-2009-Functional network reorganization in motor cortex can be explained by reward-modulated Hebbian learning



Author: Steven Chase, Andrew Schwartz, Wolfgang Maass, Robert A. Legenstein

Abstract: The control of neuroprosthetic devices from the activity of motor cortex neurons benefits from learning effects where the function of these neurons is adapted to the control task. It was recently shown that tuning properties of neurons in monkey motor cortex are adapted selectively in order to compensate for an erroneous interpretation of their activity. In particular, it was shown that the tuning curves of those neurons whose preferred directions had been misinterpreted changed more than those of other neurons. In this article, we show that the experimentally observed self-tuning properties of the system can be explained on the basis of a simple learning rule. This learning rule utilizes neuronal noise for exploration and performs Hebbian weight updates that are modulated by a global reward signal. In contrast to most previously proposed reward-modulated Hebbian learning rules, this rule does not require extraneous knowledge about what is noise and what is signal. The learning rule is able to optimize the performance of the model system within biologically realistic periods of time and under high noise levels. When the neuronal noise is fitted to experimental data, the model produces learning effects similar to those found in monkey experiments.
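The abstract describes the rule only in words: neuronal noise drives exploration, and Hebbian weight updates are gated by a global reward signal, with no separate access to the noise term. A minimal sketch of one way such a rule could look is given below. The update form dw_j = eta * x_j * (a - a_bar) * (R - R_bar), the use of exponential running averages for a_bar and R_bar, the toy single-unit task, and every function and parameter name are assumptions made for illustration; none of them is quoted from the abstract.

```python
import random


def train_reward_modulated_hebbian(x, target, steps=3000, eta=0.2,
                                   noise_std=0.3, decay=0.8, seed=0):
    """Train one noisy linear unit on a fixed input pattern.

    Assumed update (illustrative, not taken from the abstract):
        dw_j = eta * x_j * (a - a_bar) * (R - R_bar)
    where a_bar and R_bar are running averages of the activation and
    the reward. The rule never reads the noise sample directly; it only
    sees how far the activation is from its own recent average.
    """
    rng = random.Random(seed)
    w = [0.0] * len(x)
    a_bar = 0.0  # running average of the activation
    r_bar = 0.0  # running average of the reward
    for _ in range(steps):
        # Noisy activation: weighted input plus exploratory noise.
        a = sum(wj * xj for wj, xj in zip(w, x)) + rng.gauss(0.0, noise_std)
        # Global scalar reward: higher when the output is near the target.
        reward = -(a - target) ** 2
        # Deviations from the averages computed on previous steps.
        da = a - a_bar
        dr = reward - r_bar
        # Hebbian term (input times activation deviation) gated by reward.
        w = [wj + eta * xj * da * dr for wj, xj in zip(w, x)]
        # Update the running averages after the weight change.
        a_bar = decay * a_bar + (1.0 - decay) * a
        r_bar = decay * r_bar + (1.0 - decay) * reward
    return w


def noise_free_error(w, x, target):
    """Absolute output error of the unit with the noise switched off."""
    return abs(sum(wj * xj for wj, xj in zip(w, x)) - target)


if __name__ == "__main__":
    pattern = [1.0, 0.5, -0.5]
    weights = train_reward_modulated_hebbian(pattern, target=1.0)
    print("weights:", weights)
    print("noise-free error:", noise_free_error(weights, pattern, 1.0))
```

In this toy setting the product of the two deviations is, on average, proportional to the negative gradient of the squared error, so the weights drift toward the target without the rule being told which part of the activation was noise. The sketch is not a model of the motor cortex experiments.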


reference text

[1] B. Jarosiewicz, S. M. Chase, G. W. Fraser, M. Velliste, R. E. Kass, and A. B. Schwartz. Functional network reorganization during learning in a brain-computer interface paradigm. Proc. Nat. Acad. Sci. USA, 105(49):19486–91, 2008.

[2] A. P. Georgopoulos, R. E. Kettner, and A. B. Schwartz. Primate motor cortex and free arm movements to visual targets in three-dimensional space. II. Coding of the direction of movement by a neuronal population. J. Neurosci., 8:2928–2937, 1988.

[3] A. B. Schwartz. Useful signals from motor cortex. J. Physiology, 579:581–601, 2007.

[4] Y. Loewenstein and H. S. Seung. Operant matching is a generic outcome of synaptic plasticity based on the covariance between reward and neural activity. Proc. Nat. Acad. Sci. USA, 103(41):15224–15229, 2006.

[5] A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern., SMC-13(5):834–846, 1983.

[6] P. Mazzoni, R. A. Andersen, and M. I. Jordan. A more biologically plausible learning rule for neural networks. Proc. Nat. Acad. Sci. USA, 88(10):4433–4437, 1991.

[7] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.

[8] J. Baxter and P. L. Bartlett. Direct gradient-based reinforcement learning: I. Gradient estimation algorithms. Technical report, Research School of Information Sciences and Engineering, Australian National University, 1999.

[9] X. Xie and H. S. Seung. Learning in neural networks by reinforcement of irregular spiking. Phys. Rev. E, 69(041909), 2004.

[10] I. R. Fiete and H. S. Seung. Gradient learning in spiking neural networks by dynamic perturbation of conductances. Phys. Rev. Lett., 97(4):048104–1 to 048104–4, 2006.

[11] J.-P. Pfister, T. Toyoizumi, D. Barber, and W. Gerstner. Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning. Neural Computation, 18(6):1318–1348, 2006.

[12] E. M. Izhikevich. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cerebral Cortex, 17:2443–2452, 2007.

[13] D. Baras and R. Meir. Reinforcement learning, spike-time-dependent plasticity, and the BCM rule. Neural Computation, 19(8):2245–2279, 2007.

[14] R. V. Florian. Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation, 19(6):1468–1502, 2007.

[15] M. A. Farries and A. L. Fairhall. Reinforcement learning with modulated spike timing-dependent synaptic plasticity. J. Neurophys., 98:3648–3665, 2007.

[16] R. Legenstein, D. Pecevski, and W. Maass. A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Computational Biology, 4(10):1–27, 2008.

[17] C. H. Bailey, M. Giustetto, Y.-Y. Huang, R. D. Hawkins, and E. R. Kandel. Is heterosynaptic modulation essential for stabilizing Hebbian plasticity and memory? Nat. Rev. Neurosci., 1:11–20, 2000.

[18] Q. Gu. Neuromodulatory transmitter systems in the cortex and their role in cortical plasticity. Neuroscience, 111(4):815–835, 2002.

[19] S. J. Sober, M. J. Wohlgemuth, and M. S. Brainard. Central contributions to acoustic variation in birdsong. J. Neurosci., 28(41):10370–9, 2008.

[20] E. C. Tumer and M. S. Brainard. Performance variability enables adaptive plasticity of ‘crystallized’ adult birdsong. Nature, 450(7173):1240–1244, 2007.

[21] A. P. Georgopoulos, A. B. Schwartz, and R. E. Kettner. Neuronal population coding of movement direction. Science, 233:1416–1419, 1986.

[22] J. Baxter and P. L. Bartlett. Infinite-horizon policy-gradient estimation. J. Artif. Intell. Res., 15:319–350, 2001.

[23] R. Legenstein, S. M. Chase, A. B. Schwartz, and W. Maass. A reward-modulated Hebbian learning rule can explain experimentally observed network reorganization in a brain control task. Submitted for publication, 2009.

[24] U. Rokni, A. G. Richardson, E. Bizzi, and H. S. Seung. Motor learning with unstable neural representations. Neuron, 54:653–666, 2007.