nips2009-52: reference knowledge graph (maker-knowledge-mining)
Source: pdf
Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner
Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three example codes (a spike count code, a spike timing code, and the most general "full spike train" code) and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems.
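The following is a minimal sketch of the kind of Williams (1992) REINFORCE-style policy-gradient update the abstract describes for a spike count code. It is not the authors' implementation: it assumes a single neuron whose spike count is Poisson with rate lam = exp(w . x), a scalar per-trial reward, and illustrative names (eta, reward_fn, target) that do not appear in the paper.

```python
import numpy as np

# Hedged sketch of a REINFORCE (Williams, 1992) update for a spike-count code.
# Assumptions (not from the paper): exponential rate function lam = exp(w . x),
# Poisson-distributed spike count n, and a scalar reward R after each trial.
rng = np.random.default_rng(0)

n_inputs = 10
w = rng.normal(scale=0.1, size=n_inputs)    # synaptic weights
eta = 0.01                                  # learning rate

def reward_fn(n_spikes, target=5):
    """Toy reward: largest when the spike count matches a target count."""
    return -abs(n_spikes - target)

for trial in range(2000):
    x = rng.random(n_inputs)                # presynaptic activity for this trial
    lam = np.exp(w @ x)                     # expected spike count in the trial window
    n = rng.poisson(lam)                    # sampled spike count (the "action")
    R = reward_fn(n)

    # Score function of the Poisson count: d/dw log p(n | lam) = (n - lam) * x,
    # using dlam/dw = lam * x so that the 1/lam factors cancel.
    eligibility = (n - lam) * x

    # Policy-gradient update: correlate the reward with the eligibility trace.
    w += eta * R * eligibility
```

Under these assumptions the update has the reward-modulated, rate-predictive form (spike count minus expected count) that the abstract associates with spike-count learning rules.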
[1] Baxter, J. and Bartlett, P. (2001). Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15(4):319–350.
[2] Bienenstock, E., Cooper, L., and Munro, P. (1982). Theory of the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2:32–48. Reprinted in Anderson and Rosenfeld, 1990.
[3] Florian, R. V. (2007). Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation, 19:1468–1502.
[4] Greensmith, E., Bartlett, P., and Baxter, J. (2004). Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5:1471–1530.
[5] Pfister, J.-P., Toyoizumi, T., Barber, D., and Gerstner, W. (2006). Optimal spike-timing dependent plasticity for precise action potential firing in supervised learning. Neural Computation, 18:1309–1339.
[6] Rao, R. P. and Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1):79–87.
[7] Schultz, W., Dayan, P., and Montague, R. (1997). A neural substrate for prediction and reward. Science, 275:1593–1599.
[8] Schwartz, G., Harris, R., Shrom, D., and Berry II, M. J. (2007). Detection and prediction of periodic patterns by the retina. Nature Neuroscience, 10:552–554.
[9] Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
[10] Triesch, J. (2007). Synergies between intrinsic and synaptic plasticity mechanisms. Neural Computation, 19:885–909.
[11] Urbanczik, R. and Senn, W. (2009). Reinforcement learning in populations of spiking neurons. Nature Neuroscience, 12(3):250–252.
[12] Williams, R. (1992). Simple statistical gradient-following methods for connectionist reinforcement learning. Machine Learning, 8:229–256.
[13] Xie, X. and Seung, H. (2004). Learning in neural networks by reinforcement of irregular spiking. Physical Review E, 69(4):041909.