nips nips2002 nips2002-123 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Auke J. Ijspeert, Jun Nakanishi, Stefan Schaal
Abstract: Many control problems take place in continuous state-action spaces, e.g., as in manipulator robotics, where the control objective is often defined as finding a desired trajectory that reaches a particular goal state. While reinforcement learning offers a theoretical framework to learn such control policies from scratch, its applicability to higher dimensional continuous state-action spaces remains rather limited to date. Instead of learning from scratch, in this paper we suggest to learn a desired complex control policy by transforming an existing simple canonical control policy. For this purpose, we represent canonical policies in terms of differential equations with well-defined attractor properties. By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system. We demonstrate our techniques in the context of learning a set of movement skills for a humanoid robot from demonstrations of a human teacher. Policies are acquired rapidly, and, due to the properties of well formulated differential equations, can be re-used and modified on-line under dynamic changes of the environment. The linear parameterization of nonparametric regression moreover lends itself to recognize and classify previously learned movement skills. Evaluations in simulations and on an actual 30 degree-of-freedom humanoid robot exemplify the feasibility and robustness of our approach. 1
Reference: text
sentIndex sentText sentNum sentScore
1-2 Abstract: Many control problems take place in continuous state-action spaces, e.g., as in manipulator robotics, where the control objective is often defined as finding a desired trajectory that reaches a particular goal state. [sent-6, score-0.115] [sent-8, score-0.397]
3 While reinforcement learning offers a theoretical framework to learn such control policies from scratch, its applicability to higher dimensional continuous state-action spaces remains rather limited to date. [sent-9, score-0.327]
4 Instead of learning from scratch, in this paper we suggest to learn a desired complex control policy by transforming an existing simple canonical control policy. [sent-10, score-0.527]
5 For this purpose, we represent canonical policies in terms of differential equations with well-defined attractor properties. [sent-11, score-0.503]
6 By nonlinearly transforming the canonical attractor dynamics using techniques from nonparametric regression, almost arbitrary new nonlinear policies can be generated without losing the stability properties of the canonical system. [sent-12, score-0.681]
7 We demonstrate our techniques in the context of learning a set of movement skills for a humanoid robot from demonstrations of a human teacher. [sent-13, score-0.751]
8 The linear parameterization of nonparametric regression moreover lends itself to recognize and classify previously learned movement skills. [sent-15, score-0.395]
9 Evaluations in simulations and on an actual 30 degree-of-freedom humanoid robot exemplify the feasibility and robustness of our approach. [sent-16, score-0.346]
10 1 Introduction Learning control is formulated in one of the most general forms as learning a control policy u = π(x, t, w) that maps a state x, possibly in a time t dependent way, to an action u; the vector w denotes the adjustable parameters that can be used to optimize the policy. [sent-17, score-0.43]
11 Since learning control policies (CPs) based on atomic state-action representations is rather time consuming and faces problems in higher dimensional and/or continuous state-action spaces, a current topic in learning control is to use higher level representations. [sent-18, score-0.431]
12 In this paper we suggest a novel encoding for such higher level representations based on the analogy between CPs and differential equations: both formulations suggest a change of state given the current state of the system, and both usually encode a desired goal in form of an attractor state. [sent-21, score-0.395]
13 If such a representation can keep the policy linear in the parameters w, rapid learning can be accomplished, and, moreover, the parameter vector may serve to classify a particular policy. [sent-23, score-0.21]
14 In the following sections, we will first develop our learning approach of shaping attractor landscapes by means of statistical learning building on preliminary previous work [3, 4]. [sent-24, score-0.433]
15 Second, we will present a particular form of canonical CPs suitable for manipulator robotics, and finally, we will demonstrate how our methods can be used to classify movement and equip an actual humanoid robot with a variety of movement skills through imitation learning. [sent-25, score-1.154]
16 2 Learning Attractor Landscapes We consider a learning scenario where the goal of control is to attain a particular attractor state, either formulated as a point attractor (for discrete movements) or as a limit cycle (for rhythmic movements). [sent-26, score-1.212]
17 For point attractors, we require that the CP will reach the goal state with a particular trajectory shape, irrespective of the initial conditions — a tennis swing toward a ball would be a typical example of such a movement. [sent-27, score-0.415]
18 For limit cycles, the goal is given as the trajectory shape of the limit cycle and needs to be realized from any start state, as for example, in a complex drumming beat hitting multiple drums during one period. [sent-28, score-0.457]
19 Using these samples, an asymptotically stable CP is to be generated, prescribing a desired velocity given a particular state (see footnote 2). [sent-30, score-0.127]
20 Various methods have been suggested to solve such control problems in the literature. [sent-31, score-0.115]
21 As the simplest approach, one could just use one of the demonstrated trajectories and track it as a desired trajectory. [sent-32, score-0.327]
22 Recurrent neural networks were suggested as a possible alternative that can avoid explicit time indexing — the complexity of training these networks to obtain stable attractor landscapes, however, has prevented a widespread application so far. [sent-35, score-0.26]
23 Finally, it is also possible to prime a reinforcement learning system with sample trajectories and pursue one of the established continuous state-action learning algorithms; investigations of such an approach, however, demonstrated rather limited efficiency [7]. [sent-36, score-0.4]
24 In the next sections, we present an alternative and surprisingly simple solution to learning the control problem above. [sent-37, score-0.146]
25 Footnote 2: Note that we restrict our approach to purely kinematic CPs, assuming that the movement system is equipped with an appropriate feedback and feedforward controller that can accurately track the kinematic plans generated by our policies. [sent-40, score-0.414]
26 αz , βz , αv , βv , αz , βz , µ, σi and ci are positive constants. [sent-42, score-0.087]
27 x0 is the start state of the discrete system in order to allow nonzero initial conditions. [sent-43, score-0.154]
28 The design parameters of the discrete system are τ , the temporal scaling factor, and g, the goal position. [sent-44, score-0.163]
29 The design parameters of the rhythmic system are ym , the baseline of the oscillation, τ , the period divided by 2π, and r0 , the amplitude of oscillations. [sent-45, score-0.635]
30 The parameters wi are fitted to a demonstrated trajectory using Locally Weighted Learning. [sent-46, score-0.322]
31 For appropriate parameter settings and f = 0, these equations form a globally stable linear dynamical system with g as a unique point attractor. [sent-49, score-0.204]
32 Could a nonlinear function f be introduced into Eq. 1 to change the rather trivial exponential convergence of y to allow more complex trajectories on the way to the goal? [sent-51, score-0.175]
33 As soon as Eq. 1 enters the domain of nonlinear dynamics, an arbitrary complexity of the resulting equations can be expected. [sent-53, score-0.11]
34 To the best of our knowledge, this has prevented research from employing generic learning in nonlinear dynamical systems so far. [sent-54, score-0.236]
35 However, the introduction of an additional canonical dynamical system (x, v), τ v̇ = αv (βv (g − x) − v), τ ẋ = v (2), with basis functions Ψi = exp(−hi (x/g − ci)²) (3), and the nonlinear function f(x, v, g) = (Σ_{i=1}^{N} Ψi wi v) / (Σ_{i=1}^{N} Ψi) can alleviate this problem. [sent-55, score-0.545]
36-37 Eq. 2 is a second order dynamical system similar to Eq. 1; however, it is linear and not modulated by a nonlinear function, and, thus, its monotonic global convergence to g can be guaranteed with a proper choice of αv and βv. [sent-57, score-0.165] [sent-58, score-0.104]
38 Given the nonlinear function f in Eq. 3, it can be shown that the combined dynamical system (Eqs. 1-3) asymptotically converges to the unique point attractor g. [sent-61, score-0.165]
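To make the combined system concrete, the following is a minimal Euler-integration sketch of one degree of freedom of the discrete CP in Python. It is an illustration under assumptions, not the paper's implementation: the gains, kernel centers ci, widths hi, and step size are placeholder values, and the forcing term f is placed in the ẏ equation so that it is consistent with the learning target ftarget = τ ẏdemo − zdemo quoted below.

```python
import numpy as np

def integrate_discrete_cp(w, g, tau=1.0, y0=0.0, dt=0.001, n_steps=1000,
                          alpha_z=8.0, beta_z=2.0, alpha_v=8.0, beta_v=2.0):
    """Euler integration of one DOF of the discrete CP (a sketch, not the paper's code)."""
    N = len(w)
    c = np.linspace(1.0, 0.0, N)        # kernel centers in x/g (assumed placement)
    h = np.full(N, 2.0 * N ** 2)        # kernel widths h_i (assumed)
    y, z = y0, 0.0                      # transformation-system state (Eq. 1)
    x, v = y0, 0.0                      # canonical-system state (Eq. 2)
    ys = np.empty(n_steps)
    for k in range(n_steps):
        psi = np.exp(-h * (x / g - c) ** 2)           # Eq. 3 (assumes g != 0)
        f = np.dot(psi, w) * v / (psi.sum() + 1e-10)  # nonlinear forcing term
        z_dot = alpha_z * (beta_z * (g - y) - z) / tau
        y_dot = (z + f) / tau            # f placed here so that f_target = tau*ydot - z
        v_dot = alpha_v * (beta_v * (g - x) - v) / tau
        x_dot = v / tau
        y, z = y + y_dot * dt, z + z_dot * dt
        x, v = x + x_dot * dt, v + v_dot * dt
        ys[k] = y
    return ys

# With all weights zero (f = 0) the trajectory converges exponentially to g.
trajectory = integrate_discrete_cp(w=np.zeros(10), g=1.0)
```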
39 Figure 1: Examples of time evolution of the discrete CPs (left) and rhythmic CPs (right). [sent-93, score-0.464]
40 The parameters wi have been adjusted to fit ẏdemo(t) = 10 sin(2πt) exp(−t²) for the discrete CPs and ẏdemo(t) = 2π cos(2πt) − 6π sin(6πt) for the rhythmic CPs. [sent-94, score-0.854]
41 For learning from a given sample trajectory, characterized by a trajectory ydemo(t), ẏdemo(t) and duration T, a supervised learning problem can be formulated with the target trajectory ftarget = τ ẏdemo − zdemo for Eq. 1. [sent-96, score-0.865]
42-43 The corresponding goal state is g = ydemo(T) − ydemo(t = 0), i.e., the sample trajectory was translated to start at y = 0. [sent-99, score-0.397] [sent-101, score-0.139]
44 Moreover, as will be explained later, the parameters wi learned by LWL are also independent of the number of basis functions, such that they can be used robustly for categorization of different learned CPs. [sent-107, score-0.237]
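A corresponding sketch of this supervised fitting step is given below. It assumes, as an illustration rather than the paper's exact procedure, that zdemo and the canonical states (x, v) are obtained by rolling out the unforced parts of Eqs. 1 and 2 along the demonstration, that g ≠ 0, and that each weight wi is obtained from its own locally weighted cost, consistent with the per-kernel independence discussed later; gains, kernel centers, and widths are again placeholders matching the integration sketch above.

```python
import numpy as np

def fit_weights(y_demo, dt, tau=1.0, N=10, alpha_z=8.0, beta_z=2.0,
                alpha_v=8.0, beta_v=2.0):
    """Fit the weights w_i of the discrete CP to one demonstrated DOF (a sketch)."""
    T = len(y_demo)
    g = float(y_demo[-1] - y_demo[0])          # goal after translating the start to y = 0
    y = np.asarray(y_demo, dtype=float) - y_demo[0]
    ydot = np.gradient(y, dt)

    # roll out z (Eq. 1 without f) and the canonical system (Eq. 2) along the demo
    z = np.zeros(T); x = np.zeros(T); v = np.zeros(T)
    for t in range(T - 1):
        z[t + 1] = z[t] + dt * alpha_z * (beta_z * (g - y[t]) - z[t]) / tau
        v[t + 1] = v[t] + dt * alpha_v * (beta_v * (g - x[t]) - v[t]) / tau
        x[t + 1] = x[t] + dt * v[t] / tau

    f_target = tau * ydot - z                  # target trajectory for supervised learning

    c = np.linspace(1.0, 0.0, N)               # kernel centers in x/g (assumed placement)
    h = np.full(N, 2.0 * N ** 2)               # kernel widths h_i (assumed)
    w = np.zeros(N)
    for i in range(N):                          # one separate weighted cost per kernel
        psi = np.exp(-h[i] * (x / g - c[i]) ** 2)
        w[i] = np.sum(psi * v * f_target) / (np.sum(psi * v * v) + 1e-10)
    return w, g
```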
45 Table 1 summarizes the proposed discrete and rhythmic CPs, and Figure 1 shows exemplary time evolutions of the complete systems. [sent-111, score-0.517]
46 3 Special Properties of Control Policies based on Dynamical Systems Spatial and Temporal Invariance An interesting property of both discrete and rhythmic CPs is that they are spatially and temporally invariant. [sent-113, score-0.464]
47 Scaling of the goal g for the discrete CP and of the amplitude r0 for the rhythmic CP does not affect the topology of the attractor landscape. [sent-114, score-0.794]
48 Similarly, the period (for the rhythmic system) and duration (for the discrete system) of the trajectory y are directly determined by the parameter τ. [sent-115, score-0.645]
49 This means that the amplitude and durations/periods of learned patterns can be independently modified without affecting the qualitative shape of trajectory y. [sent-116, score-0.256]
50 In section 3, we will exploit these properties to reuse a learned movement (such as a tennis swing, for instance) in novel conditions (e.g., toward a new goal position); a usage sketch follows below. [sent-117, score-0.338]
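The following short usage sketch illustrates this invariance. It reuses the hypothetical helpers integrate_discrete_cp and fit_weights from the sketches above, so it is not self-contained on its own; only g and τ change between calls, while the fitted weights stay the same.

```python
import numpy as np

# integrate_discrete_cp and fit_weights are the hypothetical helpers sketched above.
t = np.arange(0.0, 1.0, 0.001)
y_demo = 10 * t**3 - 15 * t**4 + 6 * t**5            # a smooth 0 -> 1 reaching movement

w, g = fit_weights(y_demo, dt=0.001, tau=1.0)

y_same   = integrate_discrete_cp(w, g=g,       tau=1.0)                # reproduce the demonstration
y_scaled = integrate_discrete_cp(w, g=2.0 * g, tau=1.0)                # new goal, same trajectory shape
y_slower = integrate_discrete_cp(w, g=g,       tau=2.0, n_steps=2000)  # same shape, twice the duration
```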
51 An obstacle can, for instance, block the trajectory of the robot, in which case large discrepancies between desired positions generated by the control policy and actual positions of the robot will occur. [sent-122, score-0.719]
52 As outlined in [3], the dynamical system formulation allows feeding back an error term between actual and desired positions into the CPs, such that the time evolution of the policy is smoothly paused during a perturbation. [sent-123, score-0.406]
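As a rough illustration of this error coupling (the exact coupling of [3] is not reproduced here, so the form below is an assumption), the integration of the CP can be slowed by a factor that approaches zero as the tracking error grows:

```python
def time_scaling(y_desired, y_actual, alpha_err=10.0):
    """Slow-down factor in (0, 1]: close to 1 when tracking is good, near 0 under large error."""
    err = y_actual - y_desired
    return 1.0 / (1.0 + alpha_err * err * err)

# Inside the Euler loop of the CP one would then scale every state update, e.g.
#   s = time_scaling(y, y_measured)
#   x += s * x_dot * dt;  v += s * v_dot * dt
#   y += s * y_dot * dt;  z += s * z_dot * dt
# so that the policy effectively pauses while the robot cannot follow it.
```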
53-54 Movement Recognition: Given the temporal and spatial invariance of our policy representation, trajectories that are topologically similar tend to be fit by similar parameters wi, i.e., similar trajectories at different speeds and/or different amplitudes will result in similar wi. [sent-129, score-0.407] [sent-131, score-0.3]
55 In Section 3, we will use this property to demonstrate the potential of using the CPs for movement recognition. [sent-133, score-0.229]
56 3.1 Learning of Rhythmic Control Policies by Imitation: We tested the proposed CPs in a learning by demonstration task with a humanoid robot. [sent-135, score-0.225]
57 The humanoid is a 1.9-meter tall, 30-DOF hydraulic anthropomorphic robot with legs, arms, a jointed torso, and a head [9]. [sent-137, score-0.152]
58 We recorded trajectories performed by a human subject using a joint-angle recording system, the Sarcos Sensuit (see Figure 2, top). [sent-138, score-0.3]
59 The joint-angle trajectories are fitted by the CPs, with one CP per degree of freedom (DOF). [sent-139, score-0.175]
60 The CPs are then used to replay the movement in the humanoid robot, using an inverse dynamics controller to track the desired trajectories generated by the CPs. [sent-140, score-0.776]
61 The actual positions ỹ of each DOF are fed back into the CPs in order to take perturbations into account. [sent-141, score-0.121]
62 Using the joint-angle recording system, we recorded a set of rhythmic movements such as tracing a figure 8 in the air, or a drumming sequence on a bongo. [sent-142, score-0.667]
63 An exemplary movement and its replication by the robot are demonstrated in Figure 2 (top). [sent-146, score-0.492]
64 Figure 2 (left) shows the joint trajectories over one period of an exemplary drumming beat. [sent-147, score-0.391]
65 For the learning, the base frequency was extracted by hand such as to provide the parameter τ to the rhythmic CP. [sent-149, score-0.405]
66 Once a rhythmic movement has been learned by the CP, it can be modulated in several ways. [sent-150, score-0.723]
67 Figure 2: Top: Humanoid robot learning a figure-8 movement from a human demonstration. [sent-212, score-0.476]
68 Left: Recorded drumming movement performed with both arms (6 DOFs per arm). [sent-213, score-0.398]
69 The dotted lines and continuous lines correspond to one period of the demonstrated and learned trajectories, respectively. [sent-214, score-0.156]
70 Right: Modification of the learned rhythmic pattern (flexion/extension of the right elbow, R EB). [sent-215, score-0.461]
71 A: trajectory learned by the rhythmic CP; B: temporary modification with r̃0 = 2 r0; C: τ̃ = τ/2; D: ỹm = ym + 1 (dotted line), where r̃0, τ̃, and ỹm correspond to modified parameters between t = 3 s and t = 7 s. [sent-216, score-0.664]
72 Movies of the human subject and the humanoid robot can be found at http://lslwww. [sent-217, score-0.41]
73 The parameters r0 and τ can be modified for all DOFs simultaneously in order to modulate the amplitude and period of all DOFs, while keeping the same phase relation between DOFs. [sent-221, score-0.14]
74 This might be particularly useful for a drumming task in order to replay the same beat pattern at different speeds and/or amplitudes. [sent-222, score-0.204]
75 Alternatively, the r0 and τ parameters can be modulated independently for the DOFs of each arm, in order to be able to change the beat pattern (doubling the frequency of one arm, for instance). [sent-223, score-0.081]
76 Figure 2 (right) illustrates different modulations which can be generated by the rhythmic CPs. [sent-224, score-0.405]
77 The rhythmic CP can smoothly modulate the amplitude, frequency, and baseline of the oscillations. [sent-226, score-0.405]
78 3.2 Learning of Discrete Control Policies by Imitation: In this experiment, the task for the robot was to learn tennis forehand and backhand swings demonstrated by a human wearing the joint-angle recording system. [sent-228, score-0.398]
79 Once a particular swing has been learned, the robot is able to repeat the swing motion to different Cartesian targets by providing new goal positions g to the CPs for the different DOFs. [sent-229, score-0.457]
80 Using a system of two cameras, the position of the ball is given to an inverse kinematics algorithm which computes these new goals in joint space. [sent-230, score-0.162]
81 When the new ball positions are not too distant from the original Cartesian target, the modified trajectories reach the ball with swing motions very similar to those used for the demonstration. [sent-231, score-0.451]
82 An interesting aspect of locally weighted regression is that the regression parameters wi of each kernel i do not depend on the other kernels, since regression is based on a separate cost function for each kernel. [sent-235, score-0.276]
83 This means that kernel functions can be added or removed without affecting the parameters wi of the other kernels. [sent-236, score-0.125]
84 We here use this feature to perform movement recognition within a large variety of trajectories, based on a small subset of kernels at fixed locations ci in phase space. [sent-237, score-0.338]
85 The independence of the parameters wi from the other kernels generated by LWL makes them well-suited for comparing qualitative trajectory shapes. [sent-242, score-0.173]
86-87 To illustrate the possibility of using the CPs for movement recognition (i.e., recognition of spatiotemporal patterns, not just spatial patterns as in traditional character recognition), we carried out a simple task of fitting trajectories performed by a human user when drawing two-dimensional single-stroke patterns. [sent-243, score-0.267] [sent-245, score-0.277]
88 These characters are drawn in a single stroke, and are fed as a two-dimensional trajectory (x(t), y(t)) to be fitted by our system. [sent-247, score-0.193]
89 Fixed sets of five kernels per DOF were set aside for movement recognition. [sent-249, score-0.263]
90 The correlation waᵀ wb / (|wa| |wb|) between the parameter vectors wa and wb of characters a and b can be used to classify movements with similar velocity profiles (Figure 4, right). [sent-250, score-0.19]
91 These similarities in weight space can therefore serve as a basis for recognizing demonstrated movements by fitting them and comparing the fitted parameters wi with those of previously learned policies in memory. [sent-252, score-0.499]
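A small sketch of this recognition step: the fitted weight vector of a new demonstration is compared against a library of previously learned policies using the normalized correlation waᵀwb / (|wa||wb|) described above. The library contents below are made-up values for illustration only.

```python
import numpy as np

def correlation(w_a, w_b):
    """Normalized correlation between two weight vectors."""
    return float(np.dot(w_a, w_b) / (np.linalg.norm(w_a) * np.linalg.norm(w_b) + 1e-10))

def recognize(w_new, library):
    """Return the label of the stored policy whose weights correlate best with w_new."""
    return max(library, key=lambda label: correlation(w_new, library[label]))

library = {"N": np.array([0.9, -0.2, 0.4]),       # made-up weight vectors for illustration
           "P": np.array([-0.1, 0.8, 0.3])}
print(recognize(np.array([0.85, -0.25, 0.5]), library))   # -> N
```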
92 Further studies are required to evaluate the quality of recognition in larger training and test sets — what we wanted to demonstrate is the ability for recognition without any specific system tuning or sophisticated classification algorithm. [sent-257, score-0.139]
93 4 Conclusion Based on the analogy between autonomous differential equations and control policies, we presented a novel approach to learn control policies of basic movement skills by shaping the attractor landscape of nonlinear differential equations with statistical learning techniques. [sent-258, score-1.193]
94 Figure 4: Left: Examples of two-dimensional trajectories fitted by the CPs. [sent-295, score-0.175]
95 The demonstrated and fitted trajectories are shown with dotted and continuous lines, respectively. [sent-296, score-0.233]
96 The approach can guarantee basic stability and convergence properties of the learned nonlinear systems. [sent-300, score-0.127]
97 We demonstrated the applicability of the suggested techniques by learning various movement skills for a complex humanoid robot by imitation learning, and illustrated the usefulness of the learned parameterization for recognition and classification of movement skills. [sent-301, score-1.212]
98 Future work will consider (1) learning of multidimensional control policies without assuming independence between the individual dimensions, and (2) the suitability of the linear parameterization of the control policies for reinforcement learning. [sent-302, score-0.624]
99 Nonlinear force fields: a distributed system of control primitives for representing and learning movements. [sent-313, score-0.209]
100 Movement imitation with nonlinear dynamical systems in humanoid robots. [sent-321, score-0.468]
wordName wordTfidf (topN-words)
[('rhythmic', 0.405), ('cps', 0.364), ('movement', 0.229), ('attractor', 0.228), ('humanoid', 0.194), ('trajectories', 0.175), ('ydemo', 0.162), ('robot', 0.152), ('policies', 0.139), ('trajectory', 0.139), ('wi', 0.125), ('dofs', 0.121), ('drumming', 0.121), ('erential', 0.12), ('control', 0.115), ('cp', 0.111), ('policy', 0.107), ('dynamical', 0.102), ('imitation', 0.101), ('landscapes', 0.101), ('lwl', 0.101), ('di', 0.101), ('canonical', 0.097), ('swing', 0.096), ('ci', 0.087), ('dof', 0.081), ('skills', 0.081), ('movements', 0.08), ('positions', 0.072), ('tted', 0.071), ('nonlinear', 0.071), ('erent', 0.071), ('landscape', 0.064), ('ym', 0.064), ('human', 0.064), ('system', 0.063), ('desired', 0.062), ('sin', 0.062), ('amplitude', 0.061), ('ijspeert', 0.061), ('cos', 0.06), ('discrete', 0.059), ('hi', 0.058), ('demonstrated', 0.058), ('learned', 0.056), ('robotics', 0.054), ('ball', 0.054), ('characters', 0.054), ('exemplary', 0.053), ('schaal', 0.053), ('scratch', 0.053), ('tennis', 0.053), ('perturbations', 0.049), ('dynamics', 0.049), ('beat', 0.048), ('arms', 0.048), ('modi', 0.047), ('correlation', 0.046), ('kinematic', 0.045), ('parameterization', 0.043), ('locally', 0.043), ('cycle', 0.042), ('reinforcement', 0.042), ('shaping', 0.042), ('mod', 0.042), ('period', 0.042), ('goal', 0.041), ('serve', 0.041), ('forehand', 0.04), ('gra', 0.04), ('manipulator', 0.04), ('nakanishi', 0.04), ('zdemo', 0.04), ('arm', 0.04), ('equations', 0.039), ('robots', 0.039), ('recognition', 0.038), ('phase', 0.037), ('regression', 0.036), ('letter', 0.036), ('elbow', 0.035), ('replay', 0.035), ('automation', 0.034), ('kernels', 0.034), ('velocity', 0.033), ('modulated', 0.033), ('alphabet', 0.033), ('limit', 0.033), ('track', 0.032), ('state', 0.032), ('kawato', 0.032), ('prevented', 0.032), ('atr', 0.032), ('jun', 0.032), ('instance', 0.032), ('classify', 0.031), ('learning', 0.031), ('recording', 0.031), ('recorded', 0.03), ('formulated', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives
2 0.26188472 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
Author: Christopher G. Atkeson, Jun Morimoto
Abstract: A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.
3 0.14960641 9 nips-2002-A Minimal Intervention Principle for Coordinated Movement
Author: Emanuel Todorov, Michael I. Jordan
Abstract: Behavioral goals are achieved reliably and repeatedly with movements rarely reproducible in their detail. Here we offer an explanation: we show that not only are variability and goal achievement compatible, but indeed that allowing variability in redundant dimensions is the optimal control strategy in the face of uncertainty. The optimal feedback control laws for typical motor tasks obey a “minimal intervention” principle: deviations from the average trajectory are only corrected when they interfere with the task goals. The resulting behavior exhibits task-constrained variability, as well as synergetic coupling among actuators—which is another unexplained empirical phenomenon.
4 0.1270064 187 nips-2002-Spikernels: Embedding Spiking Neurons in Inner-Product Spaces
Author: Lavi Shpigelman, Yoram Singer, Rony Paz, Eilon Vaadia
Abstract: Inner-product operators, often referred to as kernels in statistical learning, define a mapping from some input space into a feature space. The focus of this paper is the construction of biologically-motivated kernels for cortical activities. The kernels we derive, termed Spikernels, map spike count sequences into an abstract vector space in which we can perform various prediction tasks. We discuss in detail the derivation of Spikernels and describe an efficient algorithm for computing their value on any two sequences of neural population spike counts. We demonstrate the merits of our modeling approach using the Spikernel and various standard kernels for the task of predicting hand movement velocities from cortical recordings. In all of our experiments all the kernels we tested outperform the standard scalar product used in regression with the Spikernel consistently achieving the best performance. 1
5 0.12471371 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design
Author: Alex Holub, Gilles Laurent, Pietro Perona
Abstract: Re-mapping patterns in order to equalize their distribution may greatly simplify both the structure and the training of classifiers. Here, the properties of one such map obtained by running a few steps of discrete-time dynamical system are explored. The system is called 'Digital Antennal Lobe' (DAL) because it is inspired by recent studies of the antennallobe, a structure in the olfactory system of the grasshopper. The pattern-spreading properties of the DAL as well as its average behavior as a function of its (few) design parameters are analyzed by extending previous results of Van Vreeswijk and Sompolinsky. Furthermore, a technique for adapting the parameters of the initial design in order to obtain opportune noise-rejection behavior is suggested. Our results are demonstrated with a number of simulations. 1
6 0.11545468 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
7 0.1126721 3 nips-2002-A Convergent Form of Approximate Policy Iteration
8 0.10500836 144 nips-2002-Minimax Differential Dynamic Programming: An Application to Robust Biped Walking
9 0.1009988 128 nips-2002-Learning a Forward Model of a Reflex
10 0.098911427 153 nips-2002-Neural Decoding of Cursor Motion Using a Kalman Filter
11 0.082978368 137 nips-2002-Location Estimation with a Differential Update Network
12 0.078674562 20 nips-2002-Adaptive Caching by Refetching
13 0.074283056 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach
14 0.070911832 169 nips-2002-Real-Time Particle Filters
15 0.068777598 134 nips-2002-Learning to Take Concurrent Actions
16 0.068698421 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics
17 0.067519687 76 nips-2002-Dynamical Constraints on Computing with Spike Timing in the Cortex
18 0.06605351 55 nips-2002-Combining Features for BCI
19 0.063577987 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
20 0.061570369 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming
topicId topicWeight
[(0, -0.191), (1, 0.032), (2, -0.196), (3, -0.062), (4, 0.016), (5, -0.009), (6, 0.081), (7, 0.034), (8, 0.193), (9, 0.184), (10, -0.165), (11, 0.075), (12, 0.055), (13, -0.07), (14, -0.114), (15, -0.026), (16, 0.016), (17, -0.081), (18, 0.028), (19, 0.043), (20, -0.026), (21, 0.019), (22, -0.0), (23, -0.069), (24, 0.142), (25, 0.03), (26, 0.081), (27, 0.065), (28, 0.077), (29, 0.045), (30, -0.118), (31, 0.027), (32, -0.049), (33, 0.012), (34, -0.021), (35, 0.019), (36, 0.05), (37, -0.014), (38, -0.023), (39, 0.04), (40, 0.014), (41, 0.021), (42, -0.024), (43, 0.059), (44, 0.007), (45, -0.04), (46, -0.021), (47, -0.01), (48, 0.001), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.95707268 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives
2 0.82776356 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
Author: Christopher G. Atkeson, Jun Morimoto
Abstract: A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.
3 0.74244475 144 nips-2002-Minimax Differential Dynamic Programming: An Application to Robust Biped Walking
Author: Jun Morimoto, Christopher G. Atkeson
Abstract: We developed a robust control policy design method in high-dimensional state space by using differential dynamic programming with a minimax criterion. As an example, we applied our method to a simulated five link biped robot. The results show lower joint torques from the optimal control policy compared to a hand-tuned PD servo controller. Results also show that the simulated biped robot can successfully walk with unknown disturbances that cause controllers generated by standard differential dynamic programming and the hand-tuned PD servo to fail. Learning to compensate for modeling error and previously unknown disturbances in conjunction with robust control design is also demonstrated.
4 0.69700944 9 nips-2002-A Minimal Intervention Principle for Coordinated Movement
Author: Emanuel Todorov, Michael I. Jordan
Abstract: Behavioral goals are achieved reliably and repeatedly with movements rarely reproducible in their detail. Here we offer an explanation: we show that not only are variability and goal achievement compatible, but indeed that allowing variability in redundant dimensions is the optimal control strategy in the face of uncertainty. The optimal feedback control laws for typical motor tasks obey a “minimal intervention” principle: deviations from the average trajectory are only corrected when they interfere with the task goals. The resulting behavior exhibits task-constrained variability, as well as synergetic coupling among actuators—which is another unexplained empirical phenomenon.
5 0.59850317 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design
Author: Alex Holub, Gilles Laurent, Pietro Perona
Abstract: Re-mapping patterns in order to equalize their distribution may greatly simplify both the structure and the training of classifiers. Here, the properties of one such map obtained by running a few steps of discrete-time dynamical system are explored. The system is called 'Digital Antennal Lobe' (DAL) because it is inspired by recent studies of the antennallobe, a structure in the olfactory system of the grasshopper. The pattern-spreading properties of the DAL as well as its average behavior as a function of its (few) design parameters are analyzed by extending previous results of Van Vreeswijk and Sompolinsky. Furthermore, a technique for adapting the parameters of the initial design in order to obtain opportune noise-rejection behavior is suggested. Our results are demonstrated with a number of simulations. 1
6 0.57152325 128 nips-2002-Learning a Forward Model of a Reflex
7 0.49246225 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming
8 0.4668242 153 nips-2002-Neural Decoding of Cursor Motion Using a Kalman Filter
9 0.43307382 187 nips-2002-Spikernels: Embedding Spiking Neurons in Inner-Product Spaces
10 0.41897053 160 nips-2002-Optoelectronic Implementation of a FitzHugh-Nagumo Neural Model
11 0.40340179 20 nips-2002-Adaptive Caching by Refetching
12 0.39998078 3 nips-2002-A Convergent Form of Approximate Policy Iteration
13 0.39433527 137 nips-2002-Location Estimation with a Differential Update Network
14 0.36188906 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach
15 0.3586764 134 nips-2002-Learning to Take Concurrent Actions
16 0.35019332 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics
17 0.32905066 194 nips-2002-The Decision List Machine
19 0.31773666 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
20 0.31301785 47 nips-2002-Branching Law for Axons
topicId topicWeight
[(11, 0.015), (23, 0.065), (34, 0.29), (42, 0.055), (54, 0.1), (55, 0.051), (57, 0.025), (64, 0.012), (67, 0.02), (68, 0.064), (74, 0.084), (86, 0.01), (87, 0.012), (92, 0.025), (98, 0.084)]
simIndex simValue paperId paperTitle
same-paper 1 0.7753166 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives
2 0.6322965 27 nips-2002-An Impossibility Theorem for Clustering
Author: Jon M. Kleinberg
Abstract: Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop a unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median. 1
3 0.53137833 5 nips-2002-A Digital Antennal Lobe for Pattern Equalization: Analysis and Design
Author: Alex Holub, Gilles Laurent, Pietro Perona
Abstract: Re-mapping patterns in order to equalize their distribution may greatly simplify both the structure and the training of classifiers. Here, the properties of one such map obtained by running a few steps of discrete-time dynamical system are explored. The system is called 'Digital Antennal Lobe' (DAL) because it is inspired by recent studies of the antennallobe, a structure in the olfactory system of the grasshopper. The pattern-spreading properties of the DAL as well as its average behavior as a function of its (few) design parameters are analyzed by extending previous results of Van Vreeswijk and Sompolinsky. Furthermore, a technique for adapting the parameters of the initial design in order to obtain opportune noise-rejection behavior is suggested. Our results are demonstrated with a number of simulations. 1
4 0.52118093 11 nips-2002-A Model for Real-Time Computation in Generic Neural Microcircuits
Author: Wolfgang Maass, Thomas Natschläger, Henry Markram
Abstract: A key challenge for neural modeling is to explain how a continuous stream of multi-modal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real-time. We propose a new computational model that is based on principles of high dimensional dynamical systems in combination with statistical learning theory. It can be implemented on generic evolved or found recurrent circuitry.
5 0.52036399 141 nips-2002-Maximally Informative Dimensions: Analyzing Neural Responses to Natural Signals
Author: Tatyana Sharpee, Nicole C. Rust, William Bialek
Abstract: unkown-abstract
6 0.51960158 76 nips-2002-Dynamical Constraints on Computing with Spike Timing in the Cortex
7 0.5170064 10 nips-2002-A Model for Learning Variance Components of Natural Images
8 0.51680756 44 nips-2002-Binary Tuning is Optimal for Neural Rate Coding with High Temporal Resolution
9 0.51615548 28 nips-2002-An Information Theoretic Approach to the Functional Classification of Neurons
10 0.51530218 148 nips-2002-Morton-Style Factorial Coding of Color in Primary Visual Cortex
11 0.51215911 3 nips-2002-A Convergent Form of Approximate Policy Iteration
12 0.51039785 62 nips-2002-Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
13 0.5102669 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
14 0.50969338 55 nips-2002-Combining Features for BCI
15 0.5094344 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
16 0.5089097 2 nips-2002-A Bilinear Model for Sparse Coding
17 0.50818133 187 nips-2002-Spikernels: Embedding Spiking Neurons in Inner-Product Spaces
18 0.50811416 169 nips-2002-Real-Time Particle Filters
19 0.50761408 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
20 0.50755835 137 nips-2002-Location Estimation with a Differential Update Network