nips nips2013 nips2013-255 knowledge-graph by maker-knowledge-mining

255 nips-2013-Probabilistic Movement Primitives


Source: pdf

Author: Alexandros Paraschos, Christian Daniel, Jan Peters, Gerhard Neumann

Abstract: Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high-dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this effect, an MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. [sent-4, score-0.75]

2 Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high-dimensional robot movements. [sent-5, score-0.556]

3 A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. [sent-6, score-0.43]

4 To this effect, an MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. [sent-7, score-0.167]

5 In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. [sent-10, score-1.466]

6 We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios. [sent-11, score-0.278]

7 The aim of MPs is to allow for composing complex robot skills out of elemental movements with a modular control architecture. [sent-19, score-0.603]

8 Hence, we require an MP architecture that supports parallel activation and smooth blending of MPs for composing complex movements of sequentially [9] and simultaneously [10] activated primitives. [sent-20, score-0.358]

9 Moreover, adaptation to a new task or a new situation requires modulation of the MP to an altered desired target position, target velocity or via-points [3]. [sent-21, score-0.263]

10 Additionally, the execution speed of the movement needs to be adjustable to change the speed of, for example, a ball-hitting movement. [sent-22, score-0.507]

11 As we want to learn the movement from data, another crucial requirement is that the parameters of the MPs should be straightforward to learn from demonstrations as well as through trial and error for reinforcement learning approaches. [sent-23, score-0.593]

12 However, this approach heavily depends on the quality of the planner used, and the movement cannot be temporally scaled. [sent-27, score-0.396]

13 The approaches in [12, 16] use a combination of primitives; yet, the control policy of the MP is based on heuristics, and it is unclear how the combination of MPs affects the resulting movements. [sent-30, score-0.177]

14 In this paper, we introduce the concept of probabilistic movement primitives (ProMPs) as a general probabilistic framework for representing and learning MPs. [sent-31, score-0.702]

15 For example, modulation of a movement to a novel target can be realized by conditioning on the desired target positions or velocities. [sent-34, score-0.734]

16 Similarly, consistent parallel activation of two elementary behaviors can be accomplished by a product of two independent trajectory probability distributions. [sent-35, score-0.307]

17 Moreover, a trajectory distribution can also encode the variance of the movement, and, hence, a ProMP can often directly encode optimal behavior in stochastic systems [17]. [sent-36, score-0.291]

18 Finally, a probabilistic framework allows us to model the covariance between trajectories of different degrees of freedom, which can be used to couple the joints of the robot. [sent-37, score-0.197]

19 Such properties of trajectory distributions have so far not been properly exploited for representing and learning MPs. [sent-38, score-0.306]

20 The main reason for the absence of such an approach has been the difficulty of extracting a policy for controlling the robot from a trajectory distribution. [sent-39, score-0.562]

21 We show how this step can be accomplished and derive a control policy that exactly reproduces a given trajectory distribution. [sent-40, score-0.42]

22 2 Probabilistic Movement Primitives (ProMPs). A movement primitive representation should exhibit several desirable properties, such as co-activation, adaptability, and optimality, in order to be a powerful MP representation. (Table 1 lists these desirable properties and their implementation in the ProMP.) [sent-44, score-0.493]

23 As a crucial part of our objective, we will introduce conditioning and a product of ProMPs as new operations that can be applied to ProMPs due to the probabilistic formulation. [sent-50, score-0.173]

24 Finally, we show how to derive a controller which follows a given trajectory distribution. [sent-51, score-0.399]

25 2.1 Probabilistic Trajectory Representation. We model a single movement execution as a trajectory τ = {q_t}_{t=0,...,T}. [sent-53, score-0.693]

26 Our movement primitive representation models the time-varying variance of the trajectories to be able to capture multiple demonstrations with high variability. [sent-59, score-0.723]

27 Representing the variance information is crucial as it reflects the importance of single time points for the movement execution, and it is often a requirement for representing optimal behavior in stochastic systems [17]. [sent-60, score-0.519]

28 The trajectory distribution p(τ; θ) can now be computed by marginalizing out the weight vector w, i.e., p(τ; θ) = ∫ p(τ|w) p(w; θ) dw. [sent-68, score-0.248]
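
Since both p(τ|w) and p(w; θ) are Gaussian, this marginal is available in closed form. Below is a minimal numpy sketch for one degree of freedom; the names (Psi, mu_w, Sigma_w) and the isotropic observation noise sigma_y are illustrative assumptions, not the paper's code.

```python
import numpy as np

def trajectory_marginal(Psi, mu_w, Sigma_w, sigma_y=1e-4):
    """Marginal of y_t = Psi_t^T w + eps_y with w ~ N(mu_w, Sigma_w).

    Psi: (n_basis, T) basis activations, mu_w: (n_basis,), Sigma_w: (n_basis, n_basis).
    Returns the mean (T,) and covariance (T, T) of p(tau; theta).
    """
    mu_tau = Psi.T @ mu_w
    Sigma_tau = Psi.T @ Sigma_w @ Psi + sigma_y * np.eye(Psi.shape[1])
    return mu_tau, Sigma_tau
```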

29 We introduce a phase variable z to decouple the movement from the time signal, as in previous non-probabilistic approaches [18]. [sent-74, score-0.458]

30 Without loss of generality, we define the phase as z0 = 0 at the beginning of the movement and as zT = 1 at the end. [sent-77, score-0.458]

31 The choice of the basis functions depends on the type of movement, which can be either rhythmic or stroke-based. [sent-80, score-0.203]

32 For stroke-based movements, we use Gaussian basis functions b_i^G, while for rhythmic movements we use Von Mises basis functions b_i^VM to model the periodicity in the phase variable z. [sent-81, score-0.439]
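
As one concrete reading of these choices, the sketch below evaluates both basis types over a phase vector z; the per-time-step normalization and the default widths are assumptions (a common convention), as are all names.

```python
import numpy as np

def gaussian_basis(z, centers, width=0.05):
    # Stroke-based movements: Gaussian bumps over the phase z in [0, 1].
    b = np.exp(-0.5 * ((z[None, :] - centers[:, None]) / width) ** 2)
    return b / b.sum(axis=0)  # normalize activations at each time step

def von_mises_basis(z, centers, k=4.0):
    # Rhythmic movements: Von Mises bumps, periodic in the phase with period 1.
    b = np.exp(k * np.cos(2.0 * np.pi * (z[None, :] - centers[:, None])))
    return b / b.sum(axis=0)

z = np.linspace(0.0, 1.0, 100)                      # phase from z_0 = 0 to z_T = 1
Psi = gaussian_basis(z, np.linspace(0.0, 1.0, 10))  # (n_basis, T) feature matrix
```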

33 However, for many tasks we have to coordinate the movement of the joints. [sent-87, score-0.396]

34 A common way to implement such coordination is via the phase variable z_t, which couples the mean of the trajectory distribution [18]. [sent-88, score-0.375]

35 As a ProMP represents multiple ways to execute an elemental movement, we also need multiple demonstrations to learn p(w; θ). [sent-119, score-0.192]

36 The parameters θ = {µ_w, Σ_w} can be learned from multiple demonstrations by maximum likelihood estimation, for example, by using the expectation-maximization algorithm for hierarchical Bayesian models (HBMs) with Gaussian distributions [19]. [sent-120, score-0.21]
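
The paper uses maximum likelihood for the hierarchical model; the sketch below takes the simpler two-step shortcut of fitting one weight vector per demonstration by ridge regression and then forming the empirical mean and covariance over demonstrations. The regularizer lam and all names are assumptions.

```python
import numpy as np

def learn_promp(demos, Psi, lam=1e-6):
    # demos: list of 1-DoF trajectories, each of shape (T,); Psi: (n_basis, T).
    W = []
    for q in demos:
        A = Psi.T  # (T, n_basis) design matrix
        w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ q)
        W.append(w)
    W = np.stack(W)                                  # (n_demos, n_basis)
    return W.mean(axis=0), np.cov(W, rowvar=False)   # mu_w, Sigma_w
```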

37 We introduce two new probabilistic operations, i.e., conditioning for modulating the trajectory and a product of distributions for co-activating MPs. [sent-124, score-0.416]

38 In our probabilistic formulation, such operations can be described by conditioning the MP to reach a certain state y* at time t. [sent-128, score-0.21]

39 For example, by specifying a desired position q_t^1 for the first joint, the trajectory distribution will automatically infer the most probable joint positions for the other joints. [sent-134, score-0.479]

40 For Gaussian trajectory distributions, the conditional distribution p(w | x_t^*) of w is Gaussian with mean and covariance
µ_w^[new] = µ_w + Σ_w Ψ_t (Σ_y^* + Ψ_t^T Σ_w Ψ_t)^{-1} (y^* − Ψ_t^T µ_w),
Σ_w^[new] = Σ_w − Σ_w Ψ_t (Σ_y^* + Ψ_t^T Σ_w Ψ_t)^{-1} Ψ_t^T Σ_w. [sent-135, score-0.318]
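
The update is standard Gaussian conditioning, so it transcribes directly; in the sketch below, Psi_t is the basis matrix at time t, y_star the desired observation, and Sigma_y the desired accuracy Σ_y^* (all names assumed).

```python
import numpy as np

def condition(mu_w, Sigma_w, Psi_t, y_star, Sigma_y):
    # Condition the weight distribution on observing y* at one time step.
    S = Sigma_y + Psi_t.T @ Sigma_w @ Psi_t   # innovation covariance
    K = Sigma_w @ Psi_t @ np.linalg.inv(S)    # gain, as in a Kalman update
    mu_new = mu_w + K @ (y_star - Psi_t.T @ mu_w)
    Sigma_new = Sigma_w - K @ Psi_t.T @ Sigma_w
    return mu_new, Sigma_new
```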

41 We can see that, despite the modulation of the ProMP by conditioning, the ProMP stays within the original distribution, and, hence, the modulation is also learned from the original demonstrations. [sent-137, score-0.222]

42 Co-activating MPs corresponds to taking the product of their trajectory distributions, i.e., the part of the trajectory space where all MPs have high probability mass. [sent-147, score-0.248]

43 However, we also want to be able to modulate the activations of the primitives, for example, to continuously blend the movement execution from one primitive to the next. [sent-148, score-0.667]

44 Hence, we decompose the trajectory into single time steps and use time-varying activation functions α_t^[i]. [sent-149, score-0.307]
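
One way to realize this per-time-step combination is an activation-weighted product of Gaussians, with the activations acting as exponents on the densities so that a primitive fades out smoothly as its activation goes to zero. The exponent-weighted form below is an assumption about the exact construction.

```python
import numpy as np

def blend_step(means, covs, alphas):
    # p*(y_t) proportional to prod_i N(y_t | mu_i, Sigma_i)^{alpha_i}:
    # activations weight the precisions of the co-activated primitives.
    precision = sum(a * np.linalg.inv(S) for a, S in zip(alphas, covs))
    Sigma = np.linalg.inv(precision)
    mu = Sigma @ sum(a * np.linalg.solve(S, m)
                     for a, m, S in zip(alphas, means, covs))
    return mu, Sigma
```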

45 The blue shaded area represents the learned trajectory distribution. [sent-165, score-0.341]

46 The trajectory distributions are indicated by the blue and red shaded areas. [sent-169, score-0.342]

47 Both primitives have to reach via-points at different points in time, indicated by the ‘x’-markers. [sent-170, score-0.204]

48 We co-activate both primitives with the same activation factor. [sent-171, score-0.226]

49 The trajectory distribution generated by the resulting feedback controller now goes through all four via-points. [sent-172, score-0.483]

50 We smoothly blend from the red primitive to the blue primitive. [sent-174, score-0.184]

51 The resulting movement (green) first follows the red primitive and, subsequently, switches to following the blue primitive. [sent-176, score-0.523]

52 2.3 Using Trajectory Distributions for Robot Control. In order to fully exploit the properties of trajectory distributions, a policy for controlling the robot is needed that reproduces these distributions. [sent-178, score-0.623]

53 To this effect, we analytically derive a stochastic feedback controller that can accurately reproduce the mean vectors µ_t and the variances Σ_t for all t of a given trajectory distribution. [sent-179, score-0.483]

54 We assume that a stochastic linear feedback controller with time-varying feedback gains generates the control actions, i.e., u = K_t y_t + k_t + ε_u. [sent-182, score-0.42]

55 From Eq. (9), we rewrite the next state of the system as
y_{t+dt} = (I + (A_t + B_t K_t) dt) y_t + B_t dt (k_t + ε_u) + c dt = F_t y_t + f_t + B_t dt ε_u,
with F_t = I + (A_t + B_t K_t) dt and f_t = B_t k_t dt + c dt. [sent-188, score-0.744]

56 As we multiply the noise by B dt, we need to divide the covariance Σ_u of the control noise ε_u by dt to obtain the desired behavior. [sent-193, score-0.517]
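
To make this scaling concrete, a minimal Euler step of the closed-loop system is sketched below: the control noise is drawn with covariance Σ_u/dt, so the state noise B dt ε_u has covariance B Σ_u B^T dt = Σ_s dt regardless of the step size. All names are illustrative assumptions.

```python
import numpy as np

def step(y, A, B, K, k, c, Sigma_u, dt, rng):
    # One Euler step of y_{t+dt} = F y_t + f + B dt eps_u, eps_u ~ N(0, Sigma_u/dt).
    eps = rng.multivariate_normal(np.zeros(Sigma_u.shape[0]), Sigma_u / dt)
    F = np.eye(len(y)) + (A + B @ K) * dt
    f = B @ k * dt + c * dt
    return F @ y + f + B @ eps * dt
```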

57 Both sides of Eq. (12) are Gaussian distributions, where the left-hand side can also be computed from our desired trajectory distribution p(τ; θ). [sent-197, score-0.315]

58 By rearranging terms, the covariance constraint becomes
Σ_{t+dt} − Σ_t = Σ_s dt + (A + BK) Σ_t dt + Σ_t (A + BK)^T dt + O(dt^2), (14)
where O(dt^2) denotes all second-order terms in dt. [sent-204, score-1.091]

59 After dividing by dt and taking the limit dt → 0, the second-order terms vanish and we obtain the time derivative of the covariance,
Σ̇_t = lim_{dt→0} (Σ_{t+dt} − Σ_t)/dt = (A + BK) Σ_t + Σ_t (A + BK)^T + Σ_s. (15) [sent-205, score-0.714]

60 The matrix Σ̇_t can also be obtained from the trajectory distribution, Σ̇_t = Ψ̇_t^T Σ_w Ψ_t + Ψ_t^T Σ_w Ψ̇_t, which we substitute into Eq. (15) and solve for the gain matrix K. [sent-206, score-0.587]

61 Similarly, we obtain the feed-forward control signal k by matching the mean of the trajectory distribution µ_{t+dt} with the mean computed with the forward model. [sent-211, score-0.323]

62 After rearranging terms, dividing by dt, and taking the limit dt → 0, we arrive at the continuous-time constraint for the vector k,
µ̇_t = (A + BK) µ_t + Bk + c. (18) [sent-212, score-0.716]

63 We can again use the trajectory distribution p(τ; θ) to obtain µ_t = Ψ_t^T µ_w and µ̇_t = Ψ̇_t^T µ_w, and solve Eq. (18) for k. [sent-213, score-0.248]
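
Putting the two constraints together, a sketch of the gain computation is given below. Resolving the under-determined covariance constraint by the symmetric choice M = (Σ̇_t − Σ_s)/2 = (A + BK)Σ_t is an assumption about how a unique solution is picked; B^† denotes the pseudo-inverse.

```python
import numpy as np

def controller(A, B, c, mu_t, dmu_t, Sigma_t, dSigma_t, Sigma_s):
    B_pinv = np.linalg.pinv(B)
    # Covariance constraint: dSigma = (A+BK)Sigma + Sigma(A+BK)^T + Sigma_s.
    M = 0.5 * (dSigma_t - Sigma_s)                 # symmetric choice for (A+BK)Sigma
    K = B_pinv @ (M @ np.linalg.inv(Sigma_t) - A)
    # Mean constraint: dmu = (A+BK)mu + Bk + c.
    k = B_pinv @ (dmu_t - (A + B @ K) @ mu_t - c)
    return K, k
```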

64 In order to match a trajectory distribution, we also need to match the control noise matrix Σ_u which has been applied to generate the distribution. [sent-215, score-0.375]

65 We first compute the system noise covariance Σ_s = B Σ_u B^T by examining the cross-correlation between time steps of the trajectory distribution. [sent-216, score-0.316]

66 The joint distribution of y_t and y_{t+dt} is obtained from our system dynamics as
p(y_t, y_{t+dt}) = N(y_t | µ_t, Σ_t) N(y_{t+dt} | F y_t + f, Σ_s dt), (20)
which yields
p(y_t, y_{t+dt}) = N( [y_t; y_{t+dt}] | [µ_t; F µ_t + f], [Σ_t, Σ_t F^T; F Σ_t, F Σ_t F^T + Σ_s dt] ). (21) [sent-219, score-0.401]

67 From Eqs. (20) and (21),
Σ_s dt = Σ_{t+dt} − F Σ_t F^T = Σ_{t+dt} − F Σ_t Σ_t^{-1} Σ_t F^T = Σ_{t+dt} − C_t^T Σ_t^{-1} C_t, (22)
where C_t = Σ_t F^T is the cross-covariance between time steps. The variance Σ_u of the control noise is then given by Σ_u = B^† Σ_s B^{†T}. [sent-221, score-0.457]
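
A direct sketch of Eq. (22), with C_t = Σ_t F^T taken from the joint distribution above; the pseudo-inverse maps the recovered system noise back to control space (all names assumed).

```python
import numpy as np

def control_noise(Sigma_t, Sigma_tdt, C_t, B, dt):
    # Sigma_s dt = Sigma_{t+dt} - C_t^T Sigma_t^{-1} C_t   (Eq. 22)
    Sigma_s = (Sigma_tdt - C_t.T @ np.linalg.solve(Sigma_t, C_t)) / dt
    B_pinv = np.linalg.pinv(B)
    return B_pinv @ Sigma_s @ B_pinv.T   # Sigma_u, mapped back to control space
```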

68 From Eq. (22), the variance of our stochastic feedback controller does not depend on the controller gains and can be pre-computed before estimating them. [sent-223, score-0.606]

69 Figure 2: A 7-link planar robot has to reach a target position at T = 1.0 s; x- and y-axes in meters. [sent-228, score-0.412]

70 The plot shows the mean posture of the robot at different time steps in black and samples generated by the ProMP in gray. [sent-232, score-0.278]

71 The resulting movement reached both via-points with high accuracy. [sent-235, score-0.396]

72 We demonstrate ten straight shots for varying distances and ten shots for varying angles. [sent-238, score-0.367]

73 The pictures show samples from the ProMP model for straight shots (b) and angled shots (c). [sent-239, score-0.311]

74 Multiplying the individual models leads to a model that only reproduces shots where both models had probability mass, in the center at medium distance (e). [sent-241, score-0.2]

75 3 Experiments. We evaluated our approach on two different real robot tasks, one stroke-based movement and one rhythmic movement. [sent-243, score-0.859]

76 For all real robot experiments we use a seven-degree-of-freedom KUKA lightweight robot arm. [sent-245, score-0.584]

77 In this task, a seven-link planar robot has to reach a target position in end-effector space. [sent-248, score-0.44]

78 We generated the demonstrations for learning the MPs with an optimal control law [22]. [sent-250, score-0.232]

79 In the first set of demonstrations, the robot has to reach the via-point at t1 = 0. [sent-251, score-0.315]

80 We learned the coupling of all seven joints with one ProMP. [sent-254, score-0.172]

81 Moreover, the ProMP could also reproduce the coupling of the joints from the optimal control law, as can be seen from the small variance of the end-effector in comparison to the rather large variance of the single joints at the via-points. [sent-256, score-0.356]

82 We combined the ProMPs learned from both demonstrations, which resulted in the movement illustrated in Figure 2(bottom). [sent-260, score-0.45]

83 By modulating the speed of the phase signal z_t, the speed of the movement can be adapted. [sent-284, score-0.633]
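
Temporal modulation touches only the phase, not the learned weights: re-timing the phase signal and re-evaluating the bases rescales the movement in time. A minimal sketch with a constant phase velocity follows; the factor speed is illustrative, and gaussian_basis refers to the earlier basis sketch.

```python
import numpy as np

T, dt, speed = 100, 0.01, 2.0   # speed = 2 makes the movement finish in half the time
z = np.clip(np.cumsum(np.full(T, speed * dt)), 0.0, 1.0)    # phase signal z_t
Psi = gaussian_basis(z, np.linspace(0.0, 1.0, 10))          # bases at the re-timed phase
```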

84 The plot shows the desired distribution in blue and the generated distribution from the feedback controller in green. [sent-285, score-0.332]

85 (c) Blending between two rhythmic movements (blue and red shaded areas) for playing maracas. [sent-287, score-0.322]

86 In the hockey task, the robot has to shoot a hockey puck in different directions and at different distances. [sent-290, score-0.405]

87 We record two different sets of demonstrations: one that contains straight shots with varying distances, while the second contains shots with a varying shooting angle. [sent-292, score-0.311]

88 Sampling from the two models generated by the different data sets yields shots that exhibit the demonstrated variance in either angle or distance, as shown in Figures 3(b) and 3(c). [sent-294, score-0.213]

89 Demonstrating fast movements can be difficult on the robot arm, due to the inertia of the arm. [sent-302, score-0.406]

90 Instead, we demonstrate a slower movement of ten periods to learn the motion. [sent-303, score-0.424]

91 We use this slow demonstration and, after learning the model, change the phase speed to achieve a shaking movement fast enough to generate the desired sound of the instrument. [sent-304, score-0.687]

92 We show an example movement of the robot in Figure 4(a). [sent-306, score-0.674]

93 The desired trajectory distribution of the rhythmic movement and the resulting distribution generated from the feedback controller are shown in Figure 4(b). [sent-307, score-1.103]

94 We also demonstrated a second type of rhythmic shaking movement, which we use to continuously blend between the two movements and produce different sounds. [sent-309, score-0.816]

95 4 Conclusion. Probabilistic movement primitives are a promising approach for learning, modulating, and re-using movements in a modular control architecture. [sent-311, score-0.811]

96 To effectively take advantage of such a control architecture, ProMPs support simultaneous activation, match the quality of the encoded behavior from the demonstrations, are able to adapt to different desired target positions, and efficiently learn by imitation. [sent-312, score-0.203]

97 We parametrize the desired trajectory distribution of the primitive by a hierarchical Bayesian model with Gaussian distributions. [sent-313, score-0.412]

98 The trajectory distribution can be easily obtained from demonstrations. [sent-314, score-0.248]

99 Our probabilistic formulation allows for new operations for movement primitives, including conditioning and combination of primitives. [sent-315, score-0.602]

100 Future work will focus on using the ProMPs in a modular control architecture and improving upon imitation learning by reinforcement learning. [sent-316, score-0.192]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('movement', 0.396), ('dt', 0.339), ('robot', 0.278), ('promps', 0.261), ('trajectory', 0.248), ('mps', 0.23), ('promp', 0.226), ('primitives', 0.167), ('demonstrations', 0.157), ('rhythmic', 0.157), ('controller', 0.151), ('mp', 0.15), ('blending', 0.139), ('shots', 0.139), ('movements', 0.128), ('modulation', 0.098), ('primitive', 0.097), ('conditioning', 0.093), ('bk', 0.092), ('demonstration', 0.085), ('feedback', 0.084), ('joints', 0.077), ('control', 0.075), ('desired', 0.067), ('zt', 0.065), ('phase', 0.062), ('reproduces', 0.061), ('activation', 0.059), ('robotics', 0.057), ('blend', 0.057), ('motor', 0.054), ('qt', 0.054), ('probabilistic', 0.054), ('maracas', 0.052), ('rad', 0.051), ('execution', 0.049), ('neumann', 0.048), ('modulating', 0.048), ('basis', 0.046), ('hockey', 0.046), ('calinon', 0.046), ('shaking', 0.046), ('shoots', 0.046), ('positions', 0.045), ('modular', 0.045), ('variance', 0.043), ('skills', 0.042), ('coupling', 0.041), ('reinforcement', 0.04), ('robots', 0.039), ('rearranging', 0.038), ('reach', 0.037), ('pi', 0.037), ('shaded', 0.037), ('policy', 0.036), ('modulate', 0.036), ('covariance', 0.036), ('ct', 0.035), ('target', 0.035), ('bvm', 0.035), ('darmstadt', 0.035), ('elemental', 0.035), ('kormushev', 0.035), ('paraschos', 0.035), ('puck', 0.035), ('rozo', 0.035), ('kt', 0.034), ('combination', 0.033), ('straight', 0.033), ('planar', 0.033), ('system', 0.032), ('velocities', 0.032), ('architecture', 0.032), ('continuously', 0.032), ('speed', 0.031), ('representing', 0.031), ('nakanishi', 0.031), ('angle', 0.031), ('blue', 0.03), ('periodic', 0.03), ('trajectories', 0.03), ('joint', 0.03), ('angles', 0.029), ('peters', 0.029), ('position', 0.029), ('toussaint', 0.028), ('bg', 0.028), ('stroke', 0.028), ('altered', 0.028), ('konidaris', 0.028), ('illustrated', 0.028), ('seven', 0.028), ('ten', 0.028), ('distributions', 0.027), ('gains', 0.026), ('learned', 0.026), ('operations', 0.026), ('match', 0.026), ('reproduced', 0.025), ('kober', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 255 nips-2013-Probabilistic Movement Primitives

Author: Alexandros Paraschos, Christian Daniel, Jan Peters, Gerhard Neumann

Abstract: Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high-dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this effect, an MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios.

2 0.26970196 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement

Author: Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena

Abstract: We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of its users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, which are often challenging and non-intuitive to provide on high degrees of freedom manipulators. Nevertheless, theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We demonstrate the generalizability of our algorithm on a variety of grocery checkout tasks, for whom, the preferences were not only influenced by the object being manipulated but also by the surrounding environment.1 1

3 0.12030232 165 nips-2013-Learning from Limited Demonstrations

Author: Beomjoon Kim, Amir massoud Farahmand, Joelle Pineau, Doina Precup

Abstract: We propose a Learning from Demonstration (LfD) algorithm which leverages expert data, even if they are very few or inaccurate. We achieve this by using both expert data, as well as reinforcement signals gathered through trial-and-error interactions with the environment. The key idea of our approach, Approximate Policy Iteration with Demonstration (APID), is that expert’s suggestions are used to define linear constraints which guide the optimization performed by Approximate Policy Iteration. We prove an upper bound on the Bellman error of the estimate computed by APID at each iteration. Moreover, we show empirically that APID outperforms pure Approximate Policy Iteration, a state-of-the-art LfD algorithm, and supervised learning in a variety of scenarios, including when very few and/or suboptimal demonstrations are available. Our experiments include simulations as well as a real robot path-finding task. 1

4 0.10851076 348 nips-2013-Variational Policy Search via Trajectory Optimization

Author: Sergey Levine, Vladlen Koltun

Abstract: In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search. A variational decomposition of a maximum likelihood policy objective allows us to use standard trajectory optimization algorithms such as differential dynamic programming, interleaved with standard supervised learning for the policy itself. We demonstrate that the resulting algorithm can outperform prior methods on two challenging locomotion tasks.

5 0.090739958 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

Author: Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin

Abstract: This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a lowvariance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. 1

6 0.087817125 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths

7 0.074841179 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning

8 0.074794136 257 nips-2013-Projected Natural Actor-Critic

9 0.069444887 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning

10 0.06699647 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

11 0.064309195 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search

12 0.063883014 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

13 0.059614178 28 nips-2013-Adaptive Step-Size for Policy Gradient Methods

14 0.057219602 226 nips-2013-One-shot learning by inverting a compositional causal process

15 0.057144154 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations

16 0.052320763 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces

17 0.051746447 298 nips-2013-Small-Variance Asymptotics for Hidden Markov Models

18 0.045058142 69 nips-2013-Context-sensitive active sensing in humans

19 0.044214856 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

20 0.043453015 267 nips-2013-Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.134), (1, -0.053), (2, -0.055), (3, 0.0), (4, -0.037), (5, 0.001), (6, 0.0), (7, 0.053), (8, 0.008), (9, -0.042), (10, -0.038), (11, -0.058), (12, -0.016), (13, 0.022), (14, -0.1), (15, 0.064), (16, -0.073), (17, -0.005), (18, 0.012), (19, 0.029), (20, -0.021), (21, -0.085), (22, 0.019), (23, 0.026), (24, -0.058), (25, -0.04), (26, 0.0), (27, 0.057), (28, -0.002), (29, -0.055), (30, -0.003), (31, 0.067), (32, 0.114), (33, -0.07), (34, -0.044), (35, -0.037), (36, -0.097), (37, -0.062), (38, -0.029), (39, -0.144), (40, -0.104), (41, 0.099), (42, -0.05), (43, -0.146), (44, 0.151), (45, -0.035), (46, -0.242), (47, -0.147), (48, 0.106), (49, -0.116)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95237523 255 nips-2013-Probabilistic Movement Primitives

Author: Alexandros Paraschos, Christian Daniel, Jan Peters, Gerhard Neumann

Abstract: Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high-dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this effect, an MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios.

2 0.87472296 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement

Author: Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena

Abstract: We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of its users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, which are often challenging and non-intuitive to provide on high degrees of freedom manipulators. Nevertheless, theoretical regret bounds of our algorithm match the asymptotic rates of optimal trajectory algorithms. We demonstrate the generalizability of our algorithm on a variety of grocery checkout tasks, for whom, the preferences were not only influenced by the object being manipulated but also by the surrounding environment.1 1

3 0.55552655 165 nips-2013-Learning from Limited Demonstrations

Author: Beomjoon Kim, Amir massoud Farahmand, Joelle Pineau, Doina Precup

Abstract: We propose a Learning from Demonstration (LfD) algorithm which leverages expert data, even if they are very few or inaccurate. We achieve this by using both expert data, as well as reinforcement signals gathered through trial-and-error interactions with the environment. The key idea of our approach, Approximate Policy Iteration with Demonstration (APID), is that expert’s suggestions are used to define linear constraints which guide the optimization performed by Approximate Policy Iteration. We prove an upper bound on the Bellman error of the estimate computed by APID at each iteration. Moreover, we show empirically that APID outperforms pure Approximate Policy Iteration, a state-of-the-art LfD algorithm, and supervised learning in a variety of scenarios, including when very few and/or suboptimal demonstrations are available. Our experiments include simulations as well as a real robot path-finding task. 1

4 0.39976159 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning

Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia

Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space. 1

5 0.39859608 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces

Author: Josh S. Merel, Roy Fox, Tony Jebara, Liam Paninski

Abstract: In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user’s neural response. Feedback to the user provides information which permits the neural tuning to also adapt. We present an approach to model this process of co-adaptation between the encoding model of the neural signal and the decoding algorithm as a multi-agent formulation of the linear quadratic Gaussian (LQG) control problem. In simulation we characterize how decoding performance improves as the neural encoding and adaptive decoder optimize, qualitatively resembling experimentally demonstrated closed-loop improvement. We then propose a novel, modified decoder update rule which is aware of the fact that the encoder is also changing and show it can improve simulated co-adaptation dynamics. Our modeling approach offers promise for gaining insights into co-adaptation as well as improving user learning of BCI control in practical settings.

6 0.38883731 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning

7 0.37061834 181 nips-2013-Machine Teaching for Bayesian Learners in the Exponential Family

8 0.33949575 348 nips-2013-Variational Policy Search via Trajectory Optimization

9 0.33108693 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search

10 0.31618655 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

11 0.31536421 69 nips-2013-Context-sensitive active sensing in humans

12 0.31208152 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

13 0.30467668 257 nips-2013-Projected Natural Actor-Critic

14 0.28746611 124 nips-2013-Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting

15 0.27718702 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC

16 0.27661979 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

17 0.26743245 134 nips-2013-Graphical Models for Inference with Missing Data

18 0.264036 264 nips-2013-Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively

19 0.25924537 168 nips-2013-Learning to Pass Expectation Propagation Messages

20 0.25715786 329 nips-2013-Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.018), (16, 0.034), (33, 0.085), (34, 0.126), (41, 0.053), (42, 0.287), (49, 0.067), (56, 0.078), (70, 0.081), (73, 0.01), (85, 0.026), (89, 0.013), (93, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76091462 255 nips-2013-Probabilistic Movement Primitives

Author: Alexandros Paraschos, Christian Daniel, Jan Peters, Gerhard Neumann

Abstract: Movement Primitives (MP) are a well-established approach for representing modular and re-usable robot movement generators. Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high-dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this effect, an MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios.

2 0.57543266 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval

Author: Cristina Savin, Peter Dayan, Mate Lengyel

Abstract: It has long been recognised that statistical dependencies in neuronal activity need to be taken into account when decoding stimuli encoded in a neural population. Less studied, though equally pernicious, is the need to take account of dependencies between synaptic weights when decoding patterns previously encoded in an auto-associative memory. We show that activity-dependent learning generically produces such correlations, and failing to take them into account in the dynamics of memory retrieval leads to catastrophically poor recall. We derive optimal network dynamics for recall in the face of synaptic correlations caused by a range of synaptic plasticity rules. These dynamics involve well-studied circuit motifs, such as forms of feedback inhibition and experimentally observed dendritic nonlinearities. We therefore show how addressing the problem of synaptic correlations leads to a novel functional account of key biophysical features of the neural substrate. 1

3 0.5616585 121 nips-2013-Firing rate predictions in optimal balanced networks

Author: David G. Barrett, Sophie Denève, Christian K. Machens

Abstract: How are firing rates in a spiking network related to neural input, connectivity and network function? This is an important problem because firing rates are a key measure of network activity, in both the study of neural computation and neural network dynamics. However, it is a difficult problem, because the spiking mechanism of individual neurons is highly non-linear, and these individual neurons interact strongly through connectivity. We develop a new technique for calculating firing rates in optimal balanced networks. These are particularly interesting networks because they provide an optimal spike-based signal representation while producing cortex-like spiking activity through a dynamic balance of excitation and inhibition. We can calculate firing rates by treating balanced network dynamics as an algorithm for optimising signal representation. We identify this algorithm and then calculate firing rates by finding the solution to the algorithm. Our firing rate calculation relates network firing rates directly to network input, connectivity and function. This allows us to explain the function and underlying mechanism of tuning curves in a variety of systems. 1

4 0.55664742 15 nips-2013-A memory frontier for complex synapses

Author: Subhaneil Lahiri, Surya Ganguli

Abstract: An incredible gulf separates theoretical models of synapses, often described solely by a single scalar value denoting the size of a postsynaptic potential, from the immense complexity of molecular signaling pathways underlying real synapses. To understand the functional contribution of such molecular complexity to learning and memory, it is essential to expand our theoretical conception of a synapse from a single scalar to an entire dynamical system with many internal molecular functional states. Moreover, theoretical considerations alone demand such an expansion; network models with scalar synapses assuming finite numbers of distinguishable synaptic strengths have strikingly limited memory capacity. This raises the fundamental question, how does synaptic complexity give rise to memory? To address this, we develop new mathematical theorems elucidating the relationship between the structural organization and memory properties of complex synapses that are themselves molecular networks. Moreover, in proving such theorems, we uncover a framework, based on first passage time theory, to impose an order on the internal states of complex synaptic models, thereby simplifying the relationship between synaptic structure and function. 1

5 0.55662602 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit

Author: Srini Turaga, Lars Buesing, Adam M. Packer, Henry Dalgleish, Noah Pettit, Michael Hausser, Jakob Macke

Abstract: Simultaneous recordings of the activity of large neural populations are extremely valuable as they can be used to infer the dynamics and interactions of neurons in a local circuit, shedding light on the computations performed. It is now possible to measure the activity of hundreds of neurons using 2-photon calcium imaging. However, many computations are thought to involve circuits consisting of thousands of neurons, such as cortical barrels in rodent somatosensory cortex. Here we contribute a statistical method for “stitching” together sequentially imaged sets of neurons into one model by phrasing the problem as fitting a latent dynamical system with missing observations. This method allows us to substantially expand the population-sizes for which population dynamics can be characterized—beyond the number of simultaneously imaged neurons. In particular, we demonstrate using recordings in mouse somatosensory cortex that this method makes it possible to predict noise correlations between non-simultaneously recorded neuron pairs. 1

6 0.55570441 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning

7 0.55138129 157 nips-2013-Learning Multi-level Sparse Representations

8 0.54795361 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average

9 0.54388517 86 nips-2013-Demixing odors - fast inference in olfaction

10 0.53585422 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

11 0.53447998 148 nips-2013-Latent Maximum Margin Clustering

12 0.53419828 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization

13 0.53400582 64 nips-2013-Compete to Compute

14 0.53062105 136 nips-2013-Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream

15 0.53061962 5 nips-2013-A Deep Architecture for Matching Short Texts

16 0.5300808 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

17 0.52932101 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents

18 0.52860683 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding

19 0.52703619 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

20 0.52656037 173 nips-2013-Least Informative Dimensions