nips nips2002 nips2002-128 knowledge-graph by maker-knowledge-mining

128 nips-2002-Learning a Forward Model of a Reflex


Source: pdf

Author: Bernd Porr, Florentin Wörgötter

Abstract: We develop a systems theoretical treatment of a behavioural system that interacts with its environment in a closed loop situation such that its motor actions influence its sensor inputs. The simplest form of feedback is a reflex. Reflexes always occur “too late”; i.e., only after a (unpleasant, painful, dangerous) reflex-eliciting sensor event has occurred. This defines an objective problem which can be solved if another sensor input exists which can predict the primary reflex and can generate an earlier reaction. In contrast to previous approaches, our linear learning algorithm allows for an analytical proof that this system learns to apply feedforward control, with the result that slow feedback loops are replaced by their equivalent feed-forward controller, creating a forward model. In other words, learning turns the reactive system into a pro-active system. By means of a robot implementation we demonstrate the applicability of the theoretical results, which can be used in a variety of different areas in physics and engineering.

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Abstract We develop a systems theoretical treatment of a behavioural system that interacts with its environment in a closed loop situation such that its motor actions influence its sensor inputs. [sent-4, score-0.604]

2 i.e., only after a (unpleasant, painful, dangerous) reflex-eliciting sensor event has occurred. [sent-8, score-0.116]

3 This defines an objective problem which can be solved if another sensor input exists which can predict the primary reflex and can generate an earlier reaction. [sent-9, score-0.195]

4 In contrast to previous approaches, our linear learning algorithm allows for an analytical proof that this system learns to apply feedforward control, with the result that slow feedback loops are replaced by their equivalent feed-forward controller, creating a forward model. [sent-10, score-0.597]

5 In other words, learning turns the reactive system into a pro-active system. [sent-11, score-0.101]

6 By means of a robot implementation we demonstrate the applicability of the theoretical results which can be used in a variety of different areas in physics and engineering. [sent-12, score-0.205]

7 1 Introduction Feedback loops are prevalent in animal behaviour, where they are normally called “reflexes”. [sent-13, score-0.164]

8 This can be done by an anticipatory (feedforward) action; for example, when retracting a limb in response to heat radiation without actually having to touch the hot surface, which would elicit a pain-induced reflex. [sent-16, score-0.41]

9 While this has been interpreted as successful forward control [1], the question arises how such a behavioural system can be robustly generated. [sent-17, score-0.236]

10 In this article we introduce a linear algorithm for temporal sequence learning between two sensor events and provide an analytical proof that this process turns a pre-wired reflex loop into its equivalent feed-forward controller. [sent-18, score-0.542]

11 After learning, the system will respond with an anticipatory action, thereby avoiding the reflex. [sent-19, score-0.306]

12 Figure 1: Diagram of the system in its environment (in Laplace notation). [sent-20, score-0.148]

13 The input signal (“disturbance” D) reaches both sensor inputs at different times, as indicated by the temporal delay T. [sent-21, score-0.341]

14 The transfer functions are linear; the filtered inputs converge with weights onto the output neuron. [sent-23, score-0.342]

15 2 The learning rule and its environment Fig. [sent-24, score-0.194]

16 1 shows the general situation which arises when temporal sequence learning takes place in a system which interacts with its environment [2]. [sent-25, score-0.44]

17 We distinguish two loops: the inner loop represents the reflex, which has fixed, unchanging properties. [sent-26, score-0.382]

18 Sequence learning requires causally related input events at both sensors (e.g. [sent-28, score-0.146]

19 heat radiation and pain), where T denotes the time delay between both inputs. [sent-30, score-0.197]

20 The delayed and un-delayed signals are processed by a linear transform (e.g. [sent-32, score-0.127]

21 The output of the neuron is in the Laplace domain given by: V(s) = ρ0 H0(s) X0(s) + ρ1 H1(s) X1(s) (1). [sent-38, score-0.094]
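
To make the signal flow concrete, here is a minimal discrete-time sketch of this weighted sum of filtered inputs. The sampling step, the damped-sine filter shapes, and the names (h0, h1, rho0, rho1) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def filter_signal(x, h, dt):
    """Convolve a sensor signal with a filter impulse response (truncated)."""
    return np.convolve(x, h)[: len(x)] * dt

dt = 0.01
t = np.arange(0.0, 2.0, dt)

# Assumed damped-sine impulse responses for the two linear filters.
h0 = np.exp(-5.0 * t) * np.sin(2.0 * np.pi * 2.0 * t)  # reflex pathway
h1 = np.exp(-5.0 * t) * np.sin(2.0 * np.pi * 1.0 * t)  # predictive pathway

# The disturbance reaches the predictive input x1 first; the reflex input
# x0 receives it only after the delay T.
T = 0.3
shift = int(round(T / dt))
d = np.zeros_like(t)
d[10] = 1.0                     # pulse-like disturbance
x1 = d
x0 = np.zeros_like(d)
x0[shift:] = d[:len(d) - shift]

u0 = filter_signal(x0, h0, dt)  # filtered reflex input
u1 = filter_signal(x1, h1, dt)  # filtered predictive input

rho0, rho1 = 1.0, 0.0           # fixed reflex weight, plastic predictive weight
v = rho0 * u0 + rho1 * u1       # neuron output: weighted sum of filtered inputs
```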

22 1 denote how the environment influences the different signals. [sent-43, score-0.093]

23 The goal of sequence learning is that the outer loop should, after learning, functionally replace the inner loop such that the reflex will cease to be triggered. [sent-44, score-0.731]

24 This allows us to calculate the general requirements for the outer loop without having to specify the actual learning process. [sent-46, score-0.37]

25 X0(s) = e^{-sT} P0(s) D(s), X1(s) = P1(s) D(s) (2), where e^{-sT} represents the delay in Laplace notation. [sent-48, score-0.135]

26 The signal on the anticipatory (outer) pathway has the representation: V1(s) = F(s) P1(s) D(s) (3)

27 where F(s) is the learned transfer function which generates the anticipatory response triggered by the input x1. [sent-52, score-0.315]

28 We want to express F in terms of the environmental transfer functions P0 and P1. [sent-53, score-0.086]

29 Following standard control theory [3], we neglect the denominator because it does not add additional poles to the transfer function. [sent-59, score-0.341]

30 A transfer function which predicts into the future (e^{sT}), however, is meaningless because it violates temporal causality. [sent-61, score-0.298]

31 The learning goal (a silent reflex input, x0 → 0) requires compensating the disturbance D. [sent-64, score-0.233]

32 The disturbance, however, enters the system only after having been filtered by the environmental transfer function P0. [sent-65, score-0.335]

33 Thus, compensation of D requires reversing this filtering by a term 1/P0, which is the inverse environmental transfer function (hence “inverse controller”). [sent-66, score-0.358]
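
To illustrate this inverse-controller argument, the sketch below pushes a disturbance through an assumed first-order low-pass P0(s) = 1/(tau*s + 1) and then recovers it with the algebraic inverse of that filter; the filter choice and the Euler discretization are assumptions made for clarity, not the paper's setup.

```python
import numpy as np

dt, tau = 0.01, 0.2
t = np.arange(0.0, 3.0, dt)
d = np.exp(-((t - 1.0) / 0.1) ** 2)       # smooth disturbance

# Forward path: first-order low-pass P0(s) = 1 / (tau*s + 1), Euler-integrated.
x0 = np.zeros_like(d)
for k in range(1, len(t)):
    x0[k] = x0[k - 1] + dt * (d[k - 1] - x0[k - 1]) / tau

# Inverse controller: P0^{-1}(s) = tau*s + 1, i.e. d ~ tau * dx0/dt + x0.
d_reconstructed = tau * np.gradient(x0, dt) + x0

print("max reconstruction error:", float(np.max(np.abs(d_reconstructed - d))))
```

Inverting the filter turns its pole into a derivative term; this is one reason why the environmental transfer function is later assumed to be passive and “well-behaved”.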

34 5 compensates for the delay T between the two sensor signals originating from the disturbance D. [sent-68, score-0.473]

35 The learning rule and convergence to a given solution under this rule. [sent-75, score-0.186]

36 Implementation of the system in a (real-world) robot experiment; existence of (approximate) solutions. [sent-80, score-0.295]

37 We will now specify the learning rule by which the development of the weight values is controlled, and show that any deviation from the given solution is eliminated by learning. [sent-84, score-0.191]

38 In terms of the time-domain functions u and v, corresponding to U and V, our learning rule is given by: dρ1/dt = μ u1 dv/dt

39 Thus, the weight change depends on the correlation between u1 and the time derivative of v. [sent-87, score-0.216]
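
A minimal sketch of this correlation rule in discrete time, assuming (as stated below) a small learning rate; the signal names are placeholders consistent with the sketch above.

```python
def iso_learn(u0, u1, dt, mu=1e-3, rho0=1.0, rho1=0.0):
    """One ISO-learning pass: d(rho1)/dt = mu * u1 * dv/dt, v = rho0*u0 + rho1*u1.

    The reflex weight rho0 stays fixed; only the predictive weight rho1 adapts.
    """
    v_prev = rho0 * u0[0] + rho1 * u1[0]
    for k in range(1, len(u0)):
        v = rho0 * u0[k] + rho1 * u1[k]
        dv_dt = (v - v_prev) / dt
        rho1 += mu * u1[k] * dv_dt * dt   # correlate input with output derivative
        v_prev = v
    return rho1
```

With mu small, the weight moves on a much slower time scale than the signals, which matches the adiabatic assumption made further below.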

40 Since the structure of the system is completely isotropic (see Fig. [sent-88, score-0.109]

41 1) and learning can take place at any synapse, we shall call our learning algorithm isotropic sequence order learning (“ISO-learning”). [sent-89, score-0.234]

42 The positive constant μ is taken small enough such that all weight changes occur on a much longer time scale (i.e. [sent-90, score-0.101]

43 This rule is related to the one used in “temporal difference” learning [4]. [sent-93, score-0.101]

44 The total weight change can be calculated by [5]: Δρ1 = μ ∫ u1(t) (dv/dt) dt (7)

45 where sV represents the derivative of v in the Laplace domain. [sent-97, score-0.113]

46 We assume that the reflex pathway is unchanging, with a fixed weight ρ0 (negative feedback). [sent-98, score-0.217]

47 Note that its open-loop transfer characteristic must carry a low-pass component; otherwise the reflex loop would be unstable. [sent-99, score-0.662]

48 We will show that a perturbation of the weights will be compensated by applying the learning procedure. [sent-103, score-0.201]

49 Since we do not make any assumption as to the size of the perturbation, this is indicative of convergence in general. [sent-104, score-0.097]

50 Stability of the solution is expected if the weight change opposes the perturbation, thus if the weight change and the perturbation have opposite signs. [sent-106, score-0.197]

51 Here, however, we assume an ’adiabatic’ environment in which the system internally relaxes on a time scale much shorter than the time scale on which the disturbances occur. [sent-107, score-0.201]

52 In calculating the weight change (7) due to this disturbance signal, we disregard any subsequent disturbances as well as perturbations that occur near the steady-state condition. [sent-109, score-0.393]

53 We use superscripts to denote the arguments and calculate the weight change using Eq. [sent-132, score-0.153]

54 where we call one factor the autocorrelation function of the disturbance, given by an inverse transform (∗ denotes a convolution), and the other the temporal derivative of the impulse response of the inverse transform of the remaining second term in Eq. [sent-142, score-0.497]

55 Since we know that the reflex pathway must carry a low-pass component, we can in general state that the fraction represents a (non-standard) high-pass. [sent-144, score-0.089]

56 As an important special case we find that this especially holds if we assume a delta-pulse disturbance at t = 0, corresponding to D(s) = 1. [sent-148, score-0.187]

57 Here, we use a set of well-known functions (band-pass filters) and show explicitly that a solution which approximates the inverse controller (Eq. [sent-151, score-0.287]

58 The transfer functions of the band-pass filters which we use are specified in the Laplace domain as h(s) = 1 / ((s − p)(s − p*)), where p* represents the complex conjugate of the pole p. [sent-154, score-0.294]

59 Real and imaginary parts of the poles are given by the damping and by 2πf, where f is the frequency of the oscillation. [sent-155, score-0.088]
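
One common way to realize such a resonator digitally is through its damped-sine impulse response; the parameterization by a frequency f and a quality factor q below is a conventional choice and an assumption here, not necessarily the authors' exact one.

```python
import numpy as np

def resonator_impulse_response(f, q, dt, duration):
    """Band-pass resonator h(t) = exp(a*t) * sin(b*t) from the pole p = a + ib.

    The imaginary part b = 2*pi*f sets the oscillation frequency; the real
    part a = -b / (2*q) sets the decay (sharper resonance for larger q).
    """
    b = 2.0 * np.pi * f
    a = -b / (2.0 * q)
    t = np.arange(0.0, duration, dt)
    return np.exp(a * t) * np.sin(b * t)

# Example: a 1 Hz resonator sampled at 100 Hz.
h = resonator_impulse_response(f=1.0, q=0.6, dt=0.01, duration=5.0)
```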

60 In fact, only a small drift of the weights is observed, which could be compensated if required. [sent-160, score-0.099]

61 The use of resonators is also motivated by biology [6]: band-pass filtered response characteristics are prevalent in neuronal systems and have also been used in other neuro-theoretical approaches [7]. [sent-162, score-0.218]

62 Let us first assume that the environment does not filter the disturbance, thus P0 = P1 = 1. [sent-173, score-0.093]

63 For un-filtered throughput, this result shows that for every delay T there exists a resonator with a weight which approximates the required delay compensation to the second order. [sent-183, score-0.16]

64 In general this holds for any environmental transfer function which is passive and “well-behaved”. [sent-185, score-0.237]

65 Note that if one knew P0 in advance, one would already have reached the goal of designing the inverse controller, and learning would be obsolete. [sent-194, score-0.328]

66 Thus, normally a set of resonators must be predefined in a somewhat arbitrary way and their weights are then learned. [sent-195, score-0.246]

67 The uniqueness of the solution assured by orthogonality becomes secondary in practice because, without prior knowledge of the environment and the delay, one has to use an over-complete set of filters in order to make sure that a solution can be found. [sent-196, score-0.147]

68 Figure 2: Robot experiment: (a) The robot has 2 output neurons, one for speed and one for steering angle. [sent-199, score-0.39]

69 The retraction mechanism is implemented by 3 resonators ( Hz) which connect the collision sensors (CS) to the speed and steering-angle neurons with fixed weights (reflex). [sent-200, score-0.548]

70 Each range finder (RF) is fed into a filter bank of 10 resonators ( Hz), whose output converges with variable weights on both output neurons. [sent-201, score-0.277]

71 (c) Development of the weights from the left range finder sensor to the neuron. [sent-209, score-0.216]
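
The wiring described in this caption can be summarized as a small configuration sketch: fixed reflex weights from the collision sensors and plastic, ISO-learned weights from a filter bank per range finder. The counts follow the caption; the frequency values and all names are placeholders, since the exact Hz values are elided above.

```python
import numpy as np

N_REFLEX_FILTERS = 3   # resonators on the collision-sensor pathway (fixed weights)
N_BANK_FILTERS = 10    # resonators per range finder (learned weights)

# Hypothetical logarithmically spaced filter-bank frequencies in Hz.
bank_freqs_hz = np.logspace(np.log10(0.1), np.log10(5.0), N_BANK_FILTERS)

weights = {
    # Reflex: collision sensors (CS) -> speed and steering-angle neurons, fixed.
    "cs":       {"speed": np.ones(N_REFLEX_FILTERS),  "steer": np.ones(N_REFLEX_FILTERS)},
    # Predictive: each range finder (RF) -> both neurons; starts at zero and is
    # adapted by ISO-learning until collisions cease.
    "rf_left":  {"speed": np.zeros(N_BANK_FILTERS),   "steer": np.zeros(N_BANK_FILTERS)},
    "rf_right": {"speed": np.zeros(N_BANK_FILTERS),   "steer": np.zeros(N_BANK_FILTERS)},
}
```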

72 3 Implementation in a robot experiment. [sent-214, score-0.205]

73 In this section, we show a robot experiment where we apply a conventional filter bank approach, using rather few filters with constant Q and logarithmically spaced frequencies, and demonstrate that the algorithm still produces the desired behaviour. [sent-215, score-0.245]

74 The task in this robot experiment is collision avoidance [8]. [sent-216, score-0.441]

75 The built-in reflex behaviour is a retraction reaction after the robot has hit an obstacle, which represents the inner-loop feedback mechanism. [sent-217, score-0.818]

76 The robot has three collision sensors and two range finders, which produce the predictive signals. [sent-218, score-0.505]

77 When driving around, there is always a causal relation between the earlier-occurring range finder signals and the later-occurring collision, which drives the learning process. [sent-219, score-0.219]

78 2b shows that early during learning many collisions (circles) occur. [sent-221, score-0.126]

79 After a collision, a fast reflex-like retraction-and-turning reaction is elicited. [sent-222, score-0.289]

80 On the other hand, the robot movement trace is now free of collisions after successful learning of the temporal correlation between range finder and collision signals (Fig. [sent-223, score-0.806]

81 The robot always found a stable solution, but those were, as expected, not unique. [sent-226, score-0.205]

82 Possible solutions which we have observed are that the robot, after learning, simply stops in front of an obstacle, or that it slightly oscillates back and forth. [sent-228, score-0.343]

83 The more common solution is that the robot continuously drives around and mainly uses its steering to avoid obstacles. [sent-229, score-0.395]

84 2c shows that the weight change slows down after the last collision has happened (dotted line in c). [sent-232, score-0.389]

85 The remaining smaller weight change is due to the fact that, after functional silencing of the reflex (no more collisions), temporally correlated inputs still exist, namely between the left and right range finders. [sent-233, score-0.153]

86 4 Discussion Replacing a feedback loop with its equivalent feed-forward controller is of central relevance for efficient control, particularly in slow feedback systems where long loop delays exist. [sent-235, score-0.724]

87 On the other hand, it has been suggested earlier by studies of limb movement control that temporal sequence learning could be used to solve the inverse controller problem [1]. [sent-237, score-0.64]

88 a) shows the drive reinforcement model by Sutton and Barto [4], and c) the temporal difference (TD) learning by Sutton and Barto [10]. [sent-239, score-0.15]

89 Additionally, the circuit for the weight change (learning) is shown; the filters in the Sutton and Barto models (a, c) are first-order low-passes. [sent-243, score-0.204]

90 Widely used models of derivative-based temporal sequence learning are those by Sutton and Barto, which aim to model experiments of classical conditioning [4, 11, 10]. [sent-249, score-0.329]

91 All models strengthen the weight if the predictive input precedes the reflex-eliciting input (or the reward, respectively). [sent-252, score-0.101]

92 However, in the Sutton and Barto models these filtered input signals are only used as an input for the learning circuit (Fig. [sent-254, score-0.254]

93 Learning is therefore achieved by correlating the filtered input with the derivative of the (un-filtered) output-signal. [sent-256, score-0.099]

94 In contrast to the Sutton and Barto models, our model is completely isotropic and uses the filtered signals for both the learning circuit and the output, since the filtered signals are also responsible for an appropriate behaviour of the organism. [sent-258, score-0.448]

95 These different wirings reflect the different learning goals: in our model the weight stabilises when the input has become silent (the reflex has been avoided). [sent-259, score-0.25]

96 In the Sutton and Barto models the weight stabilises if the output has reached a specific condition. [sent-260, score-0.216]

97 In the case of TD-learning, learning stops if the prediction error between the reward and the output is zero, thus if the output optimally predicts the reward. [sent-263, score-0.141]
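
To make the contrast concrete, here is the textbook tabular TD(0) update: learning stops once the prediction error delta is zero, i.e. once the value output optimally predicts the reward. This is the standard formulation, shown only for comparison with ISO-learning's stopping condition (a silent reflex input).

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Tabular TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    delta = r + gamma * V[s_next] - V[s]   # prediction error; zero at convergence
    V[s] += alpha * delta
    return delta
```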

98 The current study demonstrates analytically the convergence of ISO-learning in a closed-loop paradigm, in conjunction with some rather general assumptions concerning the structure of such a system. [sent-265, score-0.312]

99 Thus, this type of learning is able to generate a model-free inverse controller of a reflex, which improves the performance of conventional feedback control, while the feedback still serves as a fall-back. [sent-266, score-0.422]

100 Simulation of anticipatory responses in classical conditioning by a neuron-like adaptive element. [sent-323, score-0.279]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ex', 0.265), ('collision', 0.236), ('loop', 0.234), ('anticipatory', 0.205), ('robot', 0.205), ('transfer', 0.194), ('sutton', 0.191), ('disturbance', 0.187), ('ltered', 0.178), ('controller', 0.165), ('re', 0.153), ('resonators', 0.135), ('feedback', 0.133), ('nder', 0.117), ('sensor', 0.116), ('barto', 0.107), ('temporal', 0.104), ('practise', 0.101), ('steering', 0.101), ('vstr', 0.101), ('weight', 0.101), ('environment', 0.093), ('outer', 0.09), ('lters', 0.088), ('poles', 0.088), ('environmental', 0.086), ('signals', 0.085), ('delay', 0.085), ('collisions', 0.08), ('behaviour', 0.079), ('inverse', 0.078), ('conditioning', 0.074), ('stabilises', 0.067), ('bs', 0.064), ('loops', 0.064), ('sensors', 0.064), ('derivative', 0.063), ('control', 0.059), ('retraction', 0.059), ('orthogonality', 0.059), ('heat', 0.059), ('aip', 0.059), ('interacts', 0.059), ('resonator', 0.059), ('stirling', 0.059), ('unchanging', 0.059), ('vtr', 0.059), ('normally', 0.057), ('pathway', 0.057), ('perturbation', 0.056), ('rule', 0.055), ('system', 0.055), ('isotropic', 0.054), ('weights', 0.054), ('reaction', 0.053), ('limb', 0.053), ('radiation', 0.053), ('disturbances', 0.053), ('lter', 0.053), ('change', 0.052), ('circuit', 0.051), ('vs', 0.051), ('represents', 0.05), ('autocorrelation', 0.05), ('pole', 0.05), ('fundamentals', 0.05), ('movement', 0.05), ('york', 0.048), ('output', 0.048), ('stops', 0.047), ('behavioural', 0.047), ('learning', 0.046), ('neuron', 0.046), ('integral', 0.045), ('compensated', 0.045), ('drives', 0.045), ('obstacle', 0.045), ('solution', 0.044), ('earlier', 0.043), ('denominator', 0.043), ('passive', 0.043), ('prevalent', 0.043), ('sequence', 0.042), ('transform', 0.042), ('feedforward', 0.041), ('arises', 0.041), ('convergence', 0.041), ('response', 0.04), ('bank', 0.04), ('know', 0.039), ('inner', 0.039), ('concerning', 0.037), ('get', 0.036), ('angle', 0.036), ('accordingly', 0.036), ('input', 0.036), ('solutions', 0.035), ('hz', 0.035), ('forward', 0.034), ('triggered', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.000001 128 nips-2002-Learning a Forward Model of a Reflex

Author: Bernd Porr, Florentin Wörgötter

Abstract: We develop a systems theoretical treatment of a behavioural system that interacts with its environment in a closed loop situation such that its motor actions influence its sensor inputs. The simplest form of feedback is a reflex. Reflexes always occur “too late”; i.e., only after a (unpleasant, painful, dangerous) reflex-eliciting sensor event has occurred. This defines an objective problem which can be solved if another sensor input exists which can predict the primary reflex and can generate an earlier reaction. In contrast to previous approaches, our linear learning algorithm allows for an analytical proof that this system learns to apply feedforward control, with the result that slow feedback loops are replaced by their equivalent feed-forward controller, creating a forward model. In other words, learning turns the reactive system into a pro-active system. By means of a robot implementation we demonstrate the applicability of the theoretical results, which can be used in a variety of different areas in physics and engineering.

2 0.14161661 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach

Author: Christopher G. Atkeson, Jun Morimoto

Abstract: A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.

3 0.12973644 9 nips-2002-A Minimal Intervention Principle for Coordinated Movement

Author: Emanuel Todorov, Michael I. Jordan

Abstract: Behavioral goals are achieved reliably and repeatedly with movements rarely reproducible in their detail. Here we offer an explanation: we show that not only are variability and goal achievement compatible, but indeed that allowing variability in redundant dimensions is the optimal control strategy in the face of uncertainty. The optimal feedback control laws for typical motor tasks obey a “minimal intervention” principle: deviations from the average trajectory are only corrected when they interfere with the task goals. The resulting behavior exhibits task-constrained variability, as well as synergetic coupling among actuators—which is another unexplained empirical phenomenon.

4 0.11846624 169 nips-2002-Real-Time Particle Filters

Author: Cody Kwok, Dieter Fox, Marina Meila

Abstract: Particle filters estimate the state of dynamical systems from sensor information. In many real time applications of particle filters, however, sensor information arrives at a significantly higher rate than the update rate of the filter. The prevalent approach to dealing with such situations is to update the particle filter as often as possible and to discard sensor information that cannot be processed in time. In this paper we present real-time particle filters, which make use of all sensor information even when the filter update rate is below the update rate of the sensors. This is achieved by representing posteriors as mixtures of sample sets, where each mixture component integrates one observation arriving during a filter update. The weights of the mixture components are set so as to minimize the approximation error introduced by the mixture representation. Thereby, our approach focuses computational resources (samples) on valuable sensor information. Experiments using data collected with a mobile robot show that our approach yields strong improvements over other approaches.

5 0.11000842 144 nips-2002-Minimax Differential Dynamic Programming: An Application to Robust Biped Walking

Author: Jun Morimoto, Christopher G. Atkeson

Abstract: We developed a robust control policy design method in high-dimensional state space by using differential dynamic programming with a minimax criterion. As an example, we applied our method to a simulated five link biped robot. The results show lower joint torques from the optimal control policy compared to a hand-tuned PD servo controller. Results also show that the simulated biped robot can successfully walk with unknown disturbances that cause controllers generated by standard differential dynamic programming and the hand-tuned PD servo to fail. Learning to compensate for modeling error and previously unknown disturbances in conjunction with robust control design is also demonstrated.

6 0.1009988 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives

7 0.098591551 199 nips-2002-Timing and Partial Observability in the Dopamine System

8 0.095995709 171 nips-2002-Reconstructing Stimulus-Driven Neural Networks from Spike Times

9 0.094137102 136 nips-2002-Linear Combinations of Optic Flow Vectors for Estimating Self-Motion - a Real-World Test of a Neural Model

10 0.087100744 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach

11 0.085321352 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

12 0.085278727 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs

13 0.078416735 108 nips-2002-Improving Transfer Rates in Brain Computer Interfacing: A Case Study

14 0.078248322 137 nips-2002-Location Estimation with a Differential Update Network

15 0.074283004 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy

16 0.073046163 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems

17 0.070878431 50 nips-2002-Circuit Model of Short-Term Synaptic Dynamics

18 0.068470083 193 nips-2002-Temporal Coherence, Natural Image Sequences, and the Visual Cortex

19 0.06836421 160 nips-2002-Optoelectronic Implementation of a FitzHugh-Nagumo Neural Model

20 0.06711366 153 nips-2002-Neural Decoding of Cursor Motion Using a Kalman Filter


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.213), (1, 0.1), (2, -0.114), (3, -0.016), (4, 0.017), (5, 0.055), (6, -0.009), (7, 0.081), (8, 0.139), (9, 0.115), (10, -0.1), (11, 0.065), (12, 0.003), (13, -0.045), (14, -0.006), (15, 0.002), (16, 0.017), (17, -0.082), (18, -0.117), (19, -0.081), (20, 0.071), (21, 0.016), (22, 0.105), (23, 0.091), (24, 0.187), (25, 0.011), (26, 0.004), (27, 0.046), (28, 0.03), (29, -0.099), (30, -0.04), (31, 0.04), (32, 0.021), (33, -0.027), (34, 0.022), (35, 0.018), (36, 0.056), (37, 0.148), (38, 0.061), (39, -0.016), (40, -0.104), (41, -0.021), (42, -0.001), (43, 0.128), (44, 0.002), (45, 0.042), (46, 0.135), (47, -0.007), (48, 0.038), (49, -0.13)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95484054 128 nips-2002-Learning a Forward Model of a Reflex

Author: Bernd Porr, Florentin Wörgötter

Abstract: We develop a systems theoretical treatment of a behavioural system that interacts with its environment in a closed loop situation such that its motor actions influence its sensor inputs. The simplest form of feedback is a reflex. Reflexes always occur “too late”; i.e., only after a (unpleasant, painful, dangerous) reflex-eliciting sensor event has occurred. This defines an objective problem which can be solved if another sensor input exists which can predict the primary reflex and can generate an earlier reaction. In contrast to previous approaches, our linear learning algorithm allows for an analytical proof that this system learns to apply feedforward control, with the result that slow feedback loops are replaced by their equivalent feed-forward controller, creating a forward model. In other words, learning turns the reactive system into a pro-active system. By means of a robot implementation we demonstrate the applicability of the theoretical results, which can be used in a variety of different areas in physics and engineering.

2 0.61348385 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives

Author: Auke J. Ijspeert, Jun Nakanishi, Stefan Schaal

Abstract: Many control problems take place in continuous state-action spaces, e.g., as in manipulator robotics, where the control objective is often defined as finding a desired trajectory that reaches a particular goal state. While reinforcement learning offers a theoretical framework to learn such control policies from scratch, its applicability to higher dimensional continuous state-action spaces remains rather limited to date. Instead of learning from scratch, in this paper we suggest to learn a desired complex control policy by transforming an existing simple canonical control policy. For this purpose, we represent canonical policies in terms of differential equations with well-defined attractor properties. By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system. We demonstrate our techniques in the context of learning a set of movement skills for a humanoid robot from demonstrations of a human teacher. Policies are acquired rapidly, and, due to the properties of well formulated differential equations, can be re-used and modified on-line under dynamic changes of the environment. The linear parameterization of nonparametric regression moreover lends itself to recognize and classify previously learned movement skills. Evaluations in simulations and on an actual 30 degree-of-freedom humanoid robot exemplify the feasibility and robustness of our approach.

3 0.58117634 144 nips-2002-Minimax Differential Dynamic Programming: An Application to Robust Biped Walking

Author: Jun Morimoto, Christopher G. Atkeson

Abstract: We developed a robust control policy design method in high-dimensional state space by using differential dynamic programming with a minimax criterion. As an example, we applied our method to a simulated five link biped robot. The results show lower joint torques from the optimal control policy compared to a hand-tuned PD servo controller. Results also show that the simulated biped robot can successfully walk with unknown disturbances that cause controllers generated by standard differential dynamic programming and the hand-tuned PD servo to fail. Learning to compensate for modeling error and previously unknown disturbances in conjunction with robust control design is also demonstrated.

4 0.54181314 169 nips-2002-Real-Time Particle Filters

Author: Cody Kwok, Dieter Fox, Marina Meila

Abstract: Particle filters estimate the state of dynamical systems from sensor information. In many real time applications of particle filters, however, sensor information arrives at a significantly higher rate than the update rate of the filter. The prevalent approach to dealing with such situations is to update the particle filter as often as possible and to discard sensor information that cannot be processed in time. In this paper we present real-time particle filters, which make use of all sensor information even when the filter update rate is below the update rate of the sensors. This is achieved by representing posteriors as mixtures of sample sets, where each mixture component integrates one observation arriving during a filter update. The weights of the mixture components are set so as to minimize the approximation error introduced by the mixture representation. Thereby, our approach focuses computational resources (samples) on valuable sensor information. Experiments using data collected with a mobile robot show that our approach yields strong improvements over other approaches.

5 0.53921211 160 nips-2002-Optoelectronic Implementation of a FitzHugh-Nagumo Neural Model

Author: Alexandre R. Romariz, Kelvin Wagner

Abstract: An optoelectronic implementation of a spiking neuron model based on the FitzHugh-Nagumo equations is presented. A tunable semiconductor laser source and a spectral filter provide a nonlinear mapping from driver voltage to detected signal. Linear electronic feedback completes the implementation, which allows either electronic or optical input signals. Experimental results for a single system and numeric results of model interaction confirm that important features of spiking neural models can be implemented through this approach.

6 0.52076262 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach

7 0.51287675 9 nips-2002-A Minimal Intervention Principle for Coordinated Movement

8 0.44196019 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design

9 0.4394924 71 nips-2002-Dopamine Induced Bistability Enhances Signal Processing in Spiny Neurons

10 0.41224593 136 nips-2002-Linear Combinations of Optic Flow Vectors for Estimating Self-Motion - a Real-World Test of a Neural Model

11 0.40645939 199 nips-2002-Timing and Partial Observability in the Dopamine System

12 0.38856077 22 nips-2002-Adaptive Nonlinear System Identification with Echo State Networks

13 0.38493448 168 nips-2002-Real-Time Monitoring of Complex Industrial Processes with Particle Filters

14 0.37938002 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex

15 0.36781245 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach

16 0.36748838 65 nips-2002-Derivative Observations in Gaussian Process Models of Dynamic Systems

17 0.36049128 153 nips-2002-Neural Decoding of Cursor Motion Using a Kalman Filter

18 0.35933349 18 nips-2002-Adaptation and Unsupervised Learning

19 0.35217336 180 nips-2002-Selectivity and Metaplasticity in a Unified Calcium-Dependent Model

20 0.35018027 137 nips-2002-Location Estimation with a Differential Update Network


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.298), (11, 0.018), (23, 0.033), (42, 0.07), (54, 0.105), (55, 0.057), (57, 0.013), (64, 0.014), (67, 0.027), (68, 0.063), (74, 0.074), (92, 0.037), (98, 0.102)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.80086946 128 nips-2002-Learning a Forward Model of a Reflex

Author: Bernd Porr, Florentin Wörgötter

Abstract: We develop a systems theoretical treatment of a behavioural system that interacts with its environment in a closed loop situation such that its motor actions influence its sensor inputs. The simplest form of feedback is a reflex. Reflexes always occur “too late”; i.e., only after a (unpleasant, painful, dangerous) reflex-eliciting sensor event has occurred. This defines an objective problem which can be solved if another sensor input exists which can predict the primary reflex and can generate an earlier reaction. In contrast to previous approaches, our linear learning algorithm allows for an analytical proof that this system learns to apply feedforward control, with the result that slow feedback loops are replaced by their equivalent feed-forward controller, creating a forward model. In other words, learning turns the reactive system into a pro-active system. By means of a robot implementation we demonstrate the applicability of the theoretical results, which can be used in a variety of different areas in physics and engineering.

2 0.54274571 11 nips-2002-A Model for Real-Time Computation in Generic Neural Microcircuits

Author: Wolfgang Maass, Thomas Natschläger, Henry Markram

Abstract: A key challenge for neural modeling is to explain how a continuous stream of multi-modal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real-time. We propose a new computational model that is based on principles of high dimensional dynamical systems in combination with statistical learning theory. It can be implemented on generic evolved or found recurrent circuitry.

3 0.54219151 10 nips-2002-A Model for Learning Variance Components of Natural Images

Author: Yan Karklin, Michael S. Lewicki

Abstract: We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a non-linear generalization of independent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higher-order image structure and provides a way to learn coarse-coded, sparse-distributed representations of abstract image properties such as object location, scale, and texture.

4 0.54003686 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design

Author: Alex Holub, Gilles Laurent, Pietro Perona

Abstract: Re-mapping patterns in order to equalize their distribution may greatly simplify both the structure and the training of classifiers. Here, the properties of one such map obtained by running a few steps of a discrete-time dynamical system are explored. The system is called 'Digital Antennal Lobe' (DAL) because it is inspired by recent studies of the antennal lobe, a structure in the olfactory system of the grasshopper. The pattern-spreading properties of the DAL as well as its average behavior as a function of its (few) design parameters are analyzed by extending previous results of Van Vreeswijk and Sompolinsky. Furthermore, a technique for adapting the parameters of the initial design in order to obtain opportune noise-rejection behavior is suggested. Our results are demonstrated with a number of simulations.

5 0.53701419 76 nips-2002-Dynamical Constraints on Computing with Spike Timing in the Cortex

Author: Arunava Banerjee, Alexandre Pouget

Abstract: If the cortex uses spike timing to compute, the timing of the spikes must be robust to perturbations. Based on a recent framework that provides a simple criterion to determine whether a spike sequence produced by a generic network is sensitive to initial conditions, and numerical simulations of a variety of network architectures, we argue, within the limits set by our model of the neuron, that it is unlikely that precise sequences of spike timings are used for computation under conditions typically found in the cortex.

6 0.53643733 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions

7 0.53610253 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals

8 0.53287184 28 nips-2002-An Information Theoretic Approach to the Functional Classification of Neurons

9 0.5321182 21 nips-2002-Adaptive Classification by Variational Kalman Filtering

10 0.53193069 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks

11 0.53149498 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems

12 0.53127712 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs

13 0.53093553 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation

14 0.53089744 193 nips-2002-Temporal Coherence, Natural Image Sequences, and the Visual Cortex

15 0.53083611 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

16 0.52908272 3 nips-2002-A Convergent Form of Approximate Policy Iteration

17 0.52787566 2 nips-2002-A Bilinear Model for Sparse Coding

18 0.5275116 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives

19 0.52720296 52 nips-2002-Cluster Kernels for Semi-Supervised Learning

20 0.52668649 44 nips-2002-Binary Tuning is Optimal for Neural Rate Coding with High Temporal Resolution