
128 nips-2002-Learning a Forward Model of a Reflex


Source: pdf

Author: Bernd Porr, Florentin Wörgötter

Abstract: We develop a systems-theoretic treatment of a behavioural system that interacts with its environment in a closed-loop situation, such that its motor actions influence its sensor inputs. The simplest form of feedback is a reflex. Reflexes always occur “too late”; i.e., only after an (unpleasant, painful, dangerous) reflex-eliciting sensor event has occurred. This defines an objective problem, which can be solved if another sensor input exists that predicts the primary reflex and can generate an earlier reaction. In contrast to previous approaches, our linear learning algorithm allows for an analytical proof that the system learns to apply feed-forward control, with the result that slow feedback loops are replaced by their equivalent feed-forward controller, creating a forward model. In other words, learning turns the reactive system into a pro-active system. By means of a robot implementation we demonstrate the applicability of the theoretical results, which can be used in a variety of areas in physics and engineering.
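The closed-loop mechanism summarised above can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it assumes a generic differential Hebbian update (the predictive weight grows with the correlation between the filtered early input and the derivative of the unit's output), a first-order low-pass filter in place of the paper's filter banks, and a hypothetical threshold THETA at which the learned reaction is strong enough to pre-empt the reflex-eliciting event. All names and parameter values are illustrative.

import numpy as np

# Toy closed loop (illustrative, not the paper's formulation): a
# predictive cue x1 precedes the reflex-eliciting event x0 by DELAY
# steps. A linear unit v = w0*u0 + w1*u1 drives the reaction; the
# plastic weight w1 follows a differential Hebbian rule,
# dw1 ~ mu * u1 * dv/dt. Once w1 exceeds the hypothetical threshold
# THETA, the early reaction pre-empts the event, x0 stays silent,
# and the weight change vanishes on its own.

def lowpass(x, tau=10.0):
    # First-order low-pass, standing in for the paper's filter banks.
    y = np.zeros_like(x)
    for t in range(1, len(x)):
        y[t] = y[t - 1] + (x[t] - y[t - 1]) / tau
    return y

T, DELAY, TRIALS = 200, 30, 40
mu, THETA = 100.0, 0.2      # learning rate, "event avoided" threshold
w0, w1 = 1.0, 0.0           # w0 fixed (innate reflex), w1 learned

for trial in range(TRIALS):
    x1 = np.zeros(T)
    x1[50] = 1.0                      # early predictive cue
    x0 = np.zeros(T)
    if w1 < THETA:                    # event occurs only while the
        x0[50 + DELAY] = 1.0          # learned reaction is still weak
    u0, u1 = lowpass(x0), lowpass(x1)
    v = w0 * u0 + w1 * u1             # unit output
    w1 += mu * np.sum(u1 * np.gradient(v))   # differential Hebbian step

print(f"learned predictive weight w1 = {w1:.3f}")

Because the temporal order (cue before event) makes the correlation between u1 and dv/dt positive, w1 grows until the slow reflex pathway is no longer triggered; this is the reactive-to-pro-active transition the abstract describes.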


reference text

[1] Daniel M. Wolpert and Zoubin Ghahramani. Computational principles of movement neuroscience. Nature Neuroscience supplement, 3:1212–1217, 2000.

[2] P. Read Montague, Peter Dayan, and Terrence J. Sejnowski. Bee foraging in uncertain environments using predictive Hebbian learning. Nature, 377:725–728, 1995.

[3] W.E. Sollecito and S.G. Reque. Stability. In Jerry Fitzgerald, editor, Fundamentals of System Analysis, chapter 21. Wiley, New York, 1981.

[4] R.S. Sutton and A.G. Barto. Toward a modern theory of adaptive networks: expectation and prediction. Psychological Review, 88:135–170, 1981.

[5] John L. Stewart. Fundamentals of signal theory. McGraw-Hill, New York, 1960.

[6] Gordon M. Shepherd, editor. The synaptic organization of the brain. Oxford University Press, New York, 1990.

[7] Stephen Grossberg. A spectral network model of pitch perception. J. Acoust. Soc. Am., 98(2):862–879, 1995.

[8] P.F.M.J. Verschure and T. Voegtlin. A bottom-up approach towards the acquisition, retention, and expression of sequential representations: Distributed adaptive control III. Neural Networks, 11:1531–1549, 1998.

[9] William J. Palm. Modeling, Analysis and Control of Dynamic Systems. Wiley, New York, 2000.

[10] R.S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44, 1988.

[11] R.S. Sutton and A.G. Barto. Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element. Behav. Brain Res., 4(3):221–235, 1982.

[12] R.A. Rescorla and A.R. Wagner. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A.H. Black and W.F. Prokasy, editors, Classical Conditioning II: Current Research and Theory, pages 64–99. Appleton-Century-Crofts, New York, 1972.

[13] A. Harry Klopf. A drive-reinforcement model of single neuron function. In John S. Denker, editor, Neural Networks for Computing, volume 151 of AIP Conference Proceedings, New York, 1986. American Institute of Physics.

[14] Christopher J.C.H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8:279–292, 1992.