nips nips2013 nips2013-16 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space.
Reference: text
A message-passing algorithm for multi-agent trajectory planning
José Bento (jbento@disneyresearch.com)
This paper focuses on trajectory planning for multiple agents, an important problem in robotics [1, 2], computer animation, and crowd simulation [3]. Centralized planning for multiple agents is PSPACE-hard [4, 5]. To contend with this complexity, traditional multi-agent planning prioritizes agents and computes their trajectories sequentially [6], leading to suboptimal solutions. By contrast, our method plans for all agents simultaneously. Trajectory planning is also simplified if agents are non-distinct and can be dynamically assigned to a set of goal positions [1]. Due to the complexity of planning collision-free trajectories, real-time robot navigation is commonly decoupled into a global planner and a fast local planner that performs collision avoidance. Building on an extension of the velocity-obstacles approach, recent work on centralized collision avoidance [12] computes collision-free local motions for all agents while maximizing a joint utility, using either a computationally expensive MIQP or an efficient, though local, QP.
In this paper we formalize the global trajectory planning task as follows. Given p agents of different radii {r_i}_{i=1}^p with desired initial and final positions, {x_i(0)}_{i=1}^p and {x_i(T)}_{i=1}^p, along with a cost functional over trajectories, compute collision-free trajectories for all p agents that minimize the cost functional. That is, find a set of intermediate points {x_i(t)}_{i=1}^p, t ∈ (0, T), that satisfies the "hard" collision-free constraints ‖x_i(t) − x_j(t)‖ ≥ r_i + r_j for all i, j and all t, and that, insofar as possible, minimizes the cost functional. The method we propose searches for a solution within the space of piecewise-linear trajectories, wherein the trajectory of an agent is completely specified by its positions at a fixed set of time instants {t_s}, called break-points. All other intermediate points of the trajectories are computed by assuming that each agent moves with constant velocity between break-points: if t1 and t2 > t1 are consecutive break-points, then x_i(t) = (1/(t2 − t1))((t2 − t)x_i(t1) + (t − t1)x_i(t2)) for t ∈ [t1, t2].
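This constant-velocity interpolation between break-points is a one-liner in code; the sketch below is a minimal illustration (the function name is ours, not from the paper):

```python
import numpy as np

def interpolate(x1, x2, t1, t2, t):
    """Constant-velocity position between consecutive break-points.

    x1, x2: agent positions at break-point times t1 and t2 (t2 > t1).
    Returns x(t) = ((t2 - t) * x1 + (t - t1) * x2) / (t2 - t1).
    """
    alpha = (t - t1) / (t2 - t1)
    return (1.0 - alpha) * np.asarray(x1, float) + alpha * np.asarray(x2, float)
```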
The main contributions of this paper are as follows: (i) we formulate the global trajectory planning task as a decomposable optimization problem. A particularly crucial minimizer ensures there are no inter-agent collisions, but we also derive other minimizers that allow for finding trajectories with minimal total energy, avoiding static obstacles, or imposing dynamic constraints such as maximum/minimum agent velocity. (iii) We show that our method can be specialized to perform local planning by solving the problem of joint optimization in velocity space [12]. Our work is among the few examples where the success of applying ADMM to find approximate solutions to a large non-convex problem can be judged with the naked eye, by the gracefulness of the trajectories found.
2 Minimizers in the TWA

In this section we provide a short description of the TWA [13] and, in particular, of the role of the minimizer building blocks that it needs in order to solve a particular optimization problem. The TWA solves this optimization problem iteratively by passing messages on a bipartite graph, in the form of a Forney factor graph [14]: one minimizer-node per function f_b, one equality-node per variable x_j, and an edge (b, j) connecting b and j if f_b depends on x_j (see Figure 1-left). (Figure 1, right: the input and output variables for each minimizer block.)

The equality-nodes receive these local messages and weights and produce consensus estimates for all variables by computing an average of the incoming messages, weighted by the incoming certainty weights. From these consensus estimates, correcting messages are computed and communicated back to the minimizers to help them reach consensus. For example, the minimizer g1 receives correcting messages n_{1,1} and n_{1,3} with corresponding certainty weights ρ_{1,1} and ρ_{1,3} (see Figure 1-right). The weights {ρ_{b,j}} that this minimizer outputs must be 0 (uncertain), ∞ (certain), or ρ0, set to some fixed value. The logic for setting weights from minimizer-nodes depends on the problem; as we shall see, in trajectory planning problems we only use 0 or ρ0 weights. If we choose that all minimizers always output weights equal to ρ0, the TWA reduces to standard ADMM; however, 0-weights allow equality-nodes to ignore inactive constraints, traversing the search space much faster. Finally, notice that all minimizers can operate simultaneously, and the same is true for the consensus calculation performed by each equality-node.
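The equality-node consensus step described above is a weighted average of the minimizer outputs. A minimal sketch, assuming one scalar certainty weight per incoming message (the all-zero-weights fallback to a plain average is our simplification, not from the paper):

```python
import numpy as np

def consensus_estimate(messages, weights):
    """Consensus estimate for one variable: average of the incoming
    minimizer messages, weighted by their certainty weights.

    If every weight is zero (all constraints inactive), fall back to a
    plain average so the estimate stays defined.
    """
    messages = np.asarray(messages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.sum() == 0.0:
        return messages.mean(axis=0)
    return (weights[:, None] * messages).sum(axis=0) / weights.sum()
```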
3 Global trajectory planning

We now turn to describing our decomposition of the global trajectory planning optimization problem in detail. Rather, our variables are the positions {x_i(s)}_{i∈[p]}, where the trajectories are indexed by i and the break-points are indexed by a discrete variable s taking values between 1 and η − 1.
3.1 Formulation as unconstrained optimization without static obstacles

In terms of these variables, the non-collision constraints² are

‖(αx_i(s + 1) + (1 − α)x_i(s)) − (αx_j(s + 1) + (1 − α)x_j(s))‖ ≥ r_i + r_j,   (2)

for all i, j ∈ [p], s ∈ {0, …, η − 1}, and α ∈ [0, 1]. The parameter α is used to trace out the constant-velocity trajectories of agents i and j between break-points s and s + 1. If t1 is the absolute time of the break-point with integer index s, t2 is the absolute time of the break-point with integer index s + 1, and t parametrizes the trajectories in absolute time, then α = (t − t1)/(t2 − t1). Note that in the above formulation absolute time does not appear, and any solution is simply a set of paths that, when travelled by each agent at constant velocity between break-points, leads to no collisions. When converting this solution into trajectories parameterized by absolute time, the break-points do not need to be chosen uniformly spaced in absolute time.
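Constraint (2) requires the relative position α(x_i(s+1) − x_j(s+1)) + (1 − α)(x_i(s) − x_j(s)) to stay outside a disc of radius r_i + r_j for every α ∈ [0, 1]. Since that relative position traces a line segment, the minimum distance to the origin has a closed form (point-to-segment distance). A minimal checker, with a function name of our choosing:

```python
import numpy as np

def collision_free(xi_s, xi_s1, xj_s, xj_s1, ri, rj):
    """Check constraint (2): the relative position
    d(alpha) = alpha*(xi_s1 - xj_s1) + (1 - alpha)*(xi_s - xj_s)
    must satisfy ||d(alpha)|| >= ri + rj for all alpha in [0, 1]."""
    d0 = np.asarray(xi_s, float) - np.asarray(xj_s, float)    # alpha = 0
    d1 = np.asarray(xi_s1, float) - np.asarray(xj_s1, float)  # alpha = 1
    seg = d1 - d0
    denom = seg @ seg
    if denom == 0.0:                       # agents keep a constant offset
        alpha = 0.0
    else:                                   # unconstrained minimizer, clipped to [0, 1]
        alpha = float(np.clip(-(d0 @ seg) / denom, 0.0, 1.0))
    closest = d0 + alpha * seg
    return float(np.linalg.norm(closest)) >= ri + rj
```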
…the integrated kinetic energy or the maximum velocity over all the agents), and the function f^coll_{r,r′} is defined as

f^coll_{r,r′}(x, x̄, x′, x̄′) = J(‖α(x̄ − x̄′) + (1 − α)(x − x′)‖ ≥ r + r′ for all α ∈ [0, 1]).   (4)

In this section, x and x̄ represent the positions of an arbitrary agent of radius r at two consecutive break-points, and x′ and x̄′ the positions of a second arbitrary agent of radius r′ at the same break-points. The function J(·) takes the value 0 whenever its argument, a clause, is true, and takes the value +∞ otherwise.
For example, we might prefer trajectories for which the total kinetic energy spent by the set of agents is small. In this case, defining f^cost_C(x, x̄) = C‖x − x̄‖², we have

f^cost({x_i(s)}_{i,s}) = (1/(pη)) Σ_{i=1}^{p} Σ_{s=0}^{η−1} f^cost_{C_{i,s}}(x_i(s), x_i(s + 1)),   (5)

where the coefficients {C_{i,s}} can account for agents with different masses, different absolute-time intervals between break-points, or different preferences regarding which agents we want to be less active and which agents are allowed to move faster.
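The decomposed cost (5) is straightforward to evaluate for the quadratic kinetic-energy case; a small sketch (array layout and function name are ours):

```python
import numpy as np

def kinetic_energy_cost(X, C):
    """Total cost (5) for piecewise-linear trajectories with
    f_cost_C(x, xbar) = C * ||x - xbar||^2.

    X: array of shape (p, eta + 1, d) with X[i, s] = x_i(s).
    C: array of shape (p, eta) of per-agent, per-segment coefficients.
    Returns (1 / (p * eta)) * sum_{i,s} C[i, s] * ||x_i(s+1) - x_i(s)||^2.
    """
    X = np.asarray(X, float)
    p, n_pts, _ = X.shape
    eta = n_pts - 1
    diffs = X[:, 1:, :] - X[:, :-1, :]       # per-segment displacements
    sq = (diffs ** 2).sum(axis=2)            # squared segment lengths
    return float((np.asarray(C, float) * sq).sum() / (p * eta))
```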
More simply, we might want to exclude trajectories in which agents move faster than a certain amount, but without distinguishing among all remaining trajectories. This corresponds to a cost of the form

f^cost_C(x, x̄) = J(‖x − x̄‖ ≤ C).   (6)

In this case, associating each break-point to a time instant, the coefficients {C_{i,s}} in expression (5) would represent different limits on the velocity of different agents between different sections of the trajectory. If we want to force all agents to have a minimum velocity, we can simply reverse the inequality in (6).
² Since the agents are round, this allows for a single point of contact between two agents and does not reduce practical relevance.
3.2 Formulation as unconstrained optimization with static obstacles

In many scenarios agents should also avoid collisions with static obstacles. Given two points in space, xL and xR, we can forbid all agents from crossing the line segment from xL to xR by adding the following term to the function (3): Σ_{i=1}^{p} Σ_{s=0}^{η−1} f^wall_{xL,xR,r_i}(x_i(s), x_i(s + 1)).
We recall that r_i is the radius of agent i, and

f^wall_{xL,xR,r}(x, x̄) = J(‖(αx̄ + (1 − α)x) − (βxR + (1 − β)xL)‖ ≥ r for all α, β ∈ [0, 1]).   (7)

Notice that f^coll can be expressed using f^wall, applied to the agents' relative positions with a degenerate wall at the origin:

f^coll_{r,r′}(x, x̄, x′, x̄′) = f^wall_{0,0,r+r′}(x − x′, x̄ − x̄′).   (8)

We use this fact later to express the minimizer associated with agent-agent collisions using the minimizer associated with agent-obstacle collisions.
Since x_i(s) ∈ R² for all i ∈ [p] and s ∈ {0, …, η}, being able to avoid collisions with a general static line segment automatically allows avoiding collisions with multiple static obstacles of arbitrary polygonal shape. Our numerical experiments only consider agents in the plane, and so, in this paper, we only describe the minimizer block for wall collisions in a 2D world.
The unconstrained formulation (3) then tells us that the global objective function is the sum of ηp(p + 1)/2 terms: ηp(p − 1)/2 functions f^coll and ηp functions f^cost_C. In particular, the equality-node associated with the break-point variable x_i(s), 0 < s < η, is connected to 2(p − 1) different g^coll minimizer-nodes and two different g^cost minimizer-nodes. If s = 0 or s = η, the equality-node only connects to half as many g^coll nodes and g^cost nodes. Every minimizer is basically a special case of (1).
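The term count above is easy to sanity-check: p(p − 1)/2 agent pairs times η segments of collision terms, plus p agents times η segments of cost terms. A two-line check (ours):

```python
def objective_term_count(p, eta):
    """Number of terms in the global objective: eta*p*(p-1)/2 collision
    terms plus eta*p cost terms, which sums to eta*p*(p+1)/2."""
    coll_terms = eta * p * (p - 1) // 2
    cost_terms = eta * p
    assert coll_terms + cost_terms == eta * p * (p + 1) // 2
    return coll_terms + cost_terms
```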
Agent-agent collision minimizer

We start with the minimizer associated with the functions f^coll, which we denote by g^coll. This minimizer receives as parameters the radii, r and r′, of the two agents whose collision it is avoiding. The minimizer takes as input a set of incoming n-messages, {n, n̄, n′, n̄′}, and associated ρ-weights, {ρ, ρ̄, ρ′, ρ̄′}, and outputs a set of updated x-variables according to expression (9). Messages n and n̄ come from the two equality-nodes associated with the positions of one of the agents at two consecutive break-points, and n′ and n̄′ from the corresponding equality-nodes of the other agent.

g^coll(n, n̄, n′, n̄′, ρ, ρ̄, ρ′, ρ̄′, r, r′) = arg min_{x, x̄, x′, x̄′} f^coll_{r,r′}(x, x̄, x′, x̄′) + (ρ/2)‖x − n‖² + (ρ̄/2)‖x̄ − n̄‖² + (ρ′/2)‖x′ − n′‖² + (ρ̄′/2)‖x̄′ − n̄′‖².   (9)

The update logic for the outgoing weights is simple: if the trajectory from n to n̄ for an agent of radius r does not collide with the trajectory from n′ to n̄′ for an agent of radius r′, then set all the outgoing weights to zero, because the constraint is inactive; otherwise set all the outgoing weights to ρ0. The solution to (9) is found using the agent-obstacle collision minimizer that we describe next.
Agent-obstacle collision minimizer

The minimizer for f^wall is denoted by g^wall. It is parameterized by the obstacle position, {xL, xR}, as well as the radius of the agent that needs to avoid the obstacle. It receives two n-messages, {n, n̄}, and corresponding weights, {ρ, ρ̄}, from the equality-nodes associated with two consecutive positions of an agent that needs to avoid the obstacle. Its output, the x-variables, is defined as

g^wall(n, n̄, r, xL, xR, ρ, ρ̄) = arg min_{x, x̄} f^wall_{xL,xR,r}(x, x̄) + (ρ/2)‖x − n‖² + (ρ̄/2)‖x̄ − n̄‖².   (10)

When agents move in the plane (2D), this minimizer can be solved by reformulating the optimization in (10) as a mechanical problem involving a system of springs, which we can solve exactly and efficiently. The update logic for the outgoing weights is similar to that of the g^coll minimizer: if an agent of radius r going from n to n̄ does not collide with the line segment from xL to xR, then set all outgoing weights to zero, because the constraint is inactive; otherwise set all the outgoing weights to ρ0. Notice that, from (8), it follows that the agent-agent minimizer g^coll can be expressed using g^wall. More concretely, as proved in the supplementary material, Section C,

g^coll(n, n̄, n′, n̄′, ρ, ρ̄, ρ′, ρ̄′, r, r′) = M2(g^wall(M1(…))),

for appropriate transformations M1 and M2.
Minimum energy and maximum (minimum) velocity minimizer

When f^cost can be decomposed as in (5), the minimizer associated with the functions f^cost is denoted by g^cost and receives as input two n-messages, {n, n̄}, and corresponding weights, {ρ, ρ̄}. The minimizer is also parameterized by a cost factor c. It outputs a set of updated x-variables defined as

g^cost(n, n̄, ρ, ρ̄, c) = arg min_{x, x̄} f^cost_c(x, x̄) + (ρ/2)‖x − n‖² + (ρ̄/2)‖x̄ − n̄‖².

The update logic for the outgoing weights of the minimum energy minimizer is very simple: always set all outgoing weights to ρ0, since the energy cost is always active. The update logic for the outgoing weights of the maximum velocity minimizer follows the same pattern as the collision minimizers: set the weights to zero when the velocity bound is inactive and to ρ0 otherwise. The update logic for the minimum velocity minimizer is similar. The solutions to the minimum energy, maximum velocity, and minimum velocity minimizers are given in the supplementary material, Sections E, F, and G, respectively.
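For the quadratic energy cost f^cost_c(x, x̄) = c‖x − x̄‖², setting the gradient of the g^cost objective to zero gives a 2×2 linear system shared by all coordinates. The closed form below is our own derivation, not the paper's supplementary solver, and assumes at least one weight is positive (otherwise the system is singular):

```python
import numpy as np

def g_cost_energy(n, n_bar, rho, rho_bar, c):
    """Closed-form minimizer of  c*||x - xb||^2 + (rho/2)*||x - n||^2
    + (rho_bar/2)*||xb - n_bar||^2  over (x, xb).

    Stationarity gives, per coordinate,
        (2c + rho) x - 2c xb           = rho * n
        -2c x + (2c + rho_bar) xb      = rho_bar * n_bar,
    solved here as one 2x2 system with one right-hand side per coordinate."""
    n, n_bar = np.asarray(n, float), np.asarray(n_bar, float)
    A = np.array([[2 * c + rho, -2 * c],
                  [-2 * c, 2 * c + rho_bar]])
    B = np.vstack([rho * n, rho_bar * n_bar])   # columns are coordinates
    X = np.linalg.solve(A, B)
    return X[0], X[1]                           # updated x and x-bar
```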
Note that the lack of open-source scalable algorithms for global trajectory planning in the literature makes it difficult to benchmark our performance against other methods. All the tests described here are for agents in a two-dimensional plane.

The first test considers scenario CONF1: p (even) agents of radius r, equally spaced around a circle of radius R, are each required to exchange positions with the corresponding antipodal agent, with r = (5/4)R sin(π/(2(p − 4))). This is a classical difficult test scenario, because the straight-line motion of all agents to their goals would result in them all colliding in the center of the circle. The objective is to minimize the total kinetic energy (C in the energy minimizer is set to 1). The agents' initial and final configurations are randomly chosen in the plane (CONF2).
5 Local trajectory planning based on velocity obstacles

In this section we show how the joint optimization presented in [12], which is based on the concept of velocity obstacles [11] (VO), can also be solved via the message-passing TWA. In VO, given the current positions {x_i(0)}_{i∈[p]} and radii {r_i} of all agents, a new velocity command is computed jointly for all agents, minimizing the distance to their preferred velocities {v_i^ref}_{i∈[p]}. This new velocity command must guarantee that the trajectories of all agents remain collision-free for at least a time horizon τ. New collision-free velocities are computed every ατ seconds, α < 1, until all agents reach their final configurations.

‖(1 − α)(x_i(0) − x_j(0)) + α(x_i − x_j)‖ ≥ r_i + r_j  for all j > i ∈ [p], α ∈ [0, 1],   (12)

where C′_i = C_i/τ², x_i^ref = x_i(0) + v_i^ref τ, and we have dropped the τ in x_i(τ). The above problem, extended to account for collisions with the static line segments {x_{R_k}, x_{L_k}}_k, can be formulated in an unconstrained form using the functions f^cost, f^coll, and f^wall. Namely,

min Σ_i f^cost_{C′_i}(x_i, x_i^ref) + Σ_i Σ_{j>i} f^coll_{r_i,r_j}(x_i(0), x_i, x_j(0), x_j) + Σ_i Σ_k f^wall_{x_{R_k},x_{L_k},r_i}(x_i(0), x_i).

All corresponding minimizers are special cases of the minimizers derived in the previous section for global trajectory planning (see Section H of the supplementary material for details).
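The receding-horizon scheme described above (compute velocities valid for a horizon τ, apply them for ατ seconds, repeat) can be sketched as a simple loop. Here `solve_velocities` is a hypothetical placeholder for the joint TWA optimization, not the paper's implementation:

```python
import math

def local_planning_loop(x0, goals, tau, alpha, solve_velocities, tol=1e-6):
    """Receding-horizon loop of the VO-based local planner (sketch).

    Every alpha*tau seconds a new joint velocity command is computed that
    keeps trajectories collision-free for a horizon tau; agents then move
    with that command for alpha*tau seconds, until all goals are reached.
    """
    x = [tuple(p) for p in x0]
    while any(math.dist(a, g) > tol for a, g in zip(x, goals)):
        v = solve_velocities(x, goals, tau)   # horizon-tau collision-free velocities
        dt = alpha * tau
        x = [tuple(pc + dt * vc for pc, vc in zip(pos, vel))
             for pos, vel in zip(x, v)]
    return x
```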
Both approaches use a mechanism for breaking the symmetry of CONF1 and avoiding deadlocks: theirs uses a preferential rotation direction for the agents, while we use agents with slightly different C coefficients in their energy minimizers (C_{ith agent} = 1 + 0.…).

6 Conclusion and future work

We have presented a novel algorithm for global and local planning of the trajectories of multiple distinct agents, a problem known to be hard. Our implementation of the algorithm in Java on a regular desktop computer, using a basic scheduler/synchronization over its few cores, already scales to hundreds of agents and achieves real-time performance for local planning. For agents in the plane, we derived explicit expressions that account for static obstacles, moving obstacles, and dynamic constraints on the velocity and energy. …(e.g. acceleration constraints) and provide fast solvers for our minimizers for agents in 3D. For example, minimizers like g^coll could be solved by message exchange between pairs of agents within a maximum communication radius.

[12] Collision avoidance for multiple agents with joint utility maximization.
Similar papers (simIndex, simValue, paperId, paperTitle):
2 0.27533942 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
3 0.18968703 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie
Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation(MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. 1
4 0.16131888 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
Author: Liam C. MacDermed, Charles Isbell
Abstract: We present four major results towards solving decentralized partially observable Markov decision problems (DecPOMDPs) culminating in an algorithm that outperforms all existing algorithms on all but one standard infinite-horizon benchmark problems. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a DecPOMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any DecPOMDP into a DecPOMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door for new classes of DecPOMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based valued iteration, and modify it to produce the first tractable value iteration method for DecPOMDPs that outperforms existing algorithms. 1
5 0.13972318 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
Author: Xiaoxiao Guo, Satinder Singh, Richard L. Lewis
Abstract: We consider how to transfer knowledge from previous tasks (MDPs) to a current task in long-lived and bounded agents that must solve a sequence of tasks over a finite lifetime. A novel aspect of our transfer approach is that we reuse reward functions. While this may seem counterintuitive, we build on the insight of recent work on the optimal rewards problem that guiding an agent’s behavior with reward functions other than the task-specifying reward function can help overcome computational bounds of the agent. Specifically, we use good guidance reward functions learned on previous tasks in the sequence to incrementally train a reward mapping function that maps task-specifying reward functions into good initial guidance reward functions for subsequent tasks. We demonstrate that our approach can substantially improve the agent’s performance relative to other approaches, including an approach that transfers policies. 1
6 0.13436376 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
7 0.10062405 164 nips-2013-Learning and using language via recursive pragmatic reasoning about other agents
8 0.09517128 293 nips-2013-Sign Cauchy Projections and Chi-Square Kernel
9 0.07805638 168 nips-2013-Learning to Pass Expectation Propagation Messages
10 0.074841179 255 nips-2013-Probabilistic Movement Primitives
11 0.072376348 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
12 0.072144739 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
13 0.072042406 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
14 0.071682654 318 nips-2013-Structured Learning via Logistic Regression
15 0.067144975 32 nips-2013-Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
16 0.064111292 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces
17 0.059600621 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
18 0.059090689 323 nips-2013-Synthesizing Robust Plans under Incomplete Domain Models
19 0.057817094 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
20 0.055617526 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
topicId topicWeight
[(0, 0.156), (1, -0.018), (2, -0.028), (3, 0.021), (4, 0.01), (5, 0.049), (6, 0.04), (7, -0.016), (8, -0.02), (9, 0.002), (10, 0.024), (11, -0.006), (12, 0.047), (13, 0.001), (14, -0.197), (15, 0.274), (16, 0.01), (17, 0.093), (18, 0.125), (19, 0.006), (20, -0.263), (21, -0.097), (22, -0.004), (23, 0.026), (24, -0.145), (25, -0.055), (26, 0.002), (27, 0.173), (28, 0.119), (29, -0.0), (30, 0.025), (31, 0.086), (32, 0.058), (33, -0.116), (34, -0.027), (35, 0.107), (36, 0.027), (37, -0.041), (38, 0.01), (39, -0.066), (40, -0.037), (41, 0.021), (42, -0.033), (43, -0.003), (44, 0.021), (45, -0.05), (46, 0.04), (47, 0.005), (48, -0.037), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.94240803 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space. 1
2 0.83420634 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
3 0.71347779 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
Author: Liam C. MacDermed, Charles Isbell
Abstract: We present four major results towards solving decentralized partially observable Markov decision problems (DecPOMDPs) culminating in an algorithm that outperforms all existing algorithms on all but one standard infinite-horizon benchmark problems. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a DecPOMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any DecPOMDP into a DecPOMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door for new classes of DecPOMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based valued iteration, and modify it to produce the first tractable value iteration method for DecPOMDPs that outperforms existing algorithms. 1
4 0.6474359 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie
Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we derive explicit expressions for the steady-state mean-square deviation (MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time.
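A minimal sketch of the consensus-plus-innovation flavor of update described above, on an invented three-agent line graph tracking a scalar geometric random walk. The drift, gain, noise levels, and update form are assumptions for illustration, not the paper's estimators.

```python
import random

random.seed(1)

a = 1.01                                           # assumed known drift
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # line graph with self-loops
gain = 0.5                                         # innovation step (assumed)

x = 1.0                                            # hidden true state
estimates = {i: 0.0 for i in neighbors}

for _ in range(200):
    x *= a * random.uniform(0.99, 1.01)            # geometric random walk
    signals = {i: x + random.gauss(0.0, 0.1) for i in neighbors}
    new = {}
    for i, nbrs in neighbors.items():
        consensus = sum(estimates[j] for j in nbrs) / len(nbrs)
        # Propagate the neighborhood average by the drift, then correct
        # toward the private signal (consensus + innovation).
        new[i] = a * consensus + gain * (signals[i] - consensus)
    estimates = new

rel_err = max(abs(e - x) / x for e in estimates.values())
print(round(x, 2), {i: round(e, 2) for i, e in estimates.items()},
      round(rel_err, 3))
```

With the gain below the contraction threshold, each agent tracks the growing state with a small, bounded relative error, mirroring the bounded-variance tracking regime the abstract describes.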
5 0.56649828 71 nips-2013-Convergence of Monte Carlo Tree Search in Simultaneous Move Games
Author: Viliam Lisy, Vojta Kovarik, Marc Lanctot, Branislav Bosansky
Abstract: (no abstract available)
6 0.54563814 164 nips-2013-Learning and using language via recursive pragmatic reasoning about other agents
7 0.49943855 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation
8 0.47674519 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
9 0.4331055 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
10 0.39570922 255 nips-2013-Probabilistic Movement Primitives
11 0.38602111 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces
12 0.35778904 165 nips-2013-Learning from Limited Demonstrations
13 0.33915311 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
14 0.33727744 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
15 0.31250107 332 nips-2013-Tracking Time-varying Graphical Structure
16 0.30485815 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
17 0.30445486 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
18 0.3041096 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning
19 0.3000046 245 nips-2013-Pass-efficient unsupervised feature selection
20 0.29904598 168 nips-2013-Learning to Pass Expectation Propagation Messages
topicId topicWeight
[(2, 0.011), (16, 0.03), (33, 0.08), (34, 0.131), (41, 0.026), (49, 0.036), (56, 0.067), (70, 0.432), (85, 0.036), (89, 0.014), (93, 0.05), (95, 0.013)]
simIndex simValue paperId paperTitle
1 0.92892921 267 nips-2013-Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems
Author: Hesham Mostafa, Lorenz K. Mueller, Giacomo Indiveri
Abstract: We present a recurrent neuronal network, modeled as a continuous-time dynamical system, that can solve constraint satisfaction problems. Discrete variables are represented by coupled Winner-Take-All (WTA) networks, and their values are encoded in localized patterns of oscillations that are learned by the recurrent weights in these networks. Constraints over the variables are encoded in the network connectivity. Although there are no sources of noise, the network can escape from local optima in its search for solutions that satisfy all constraints by modifying the effective network connectivity through oscillations. If there is no solution that satisfies all constraints, the network state changes in a seemingly random manner and its trajectory approximates a sampling procedure that selects a variable assignment with a probability that increases with the fraction of constraints satisfied by this assignment. External evidence, or input to the network, can force variables to specific values. When new inputs are applied, the network re-evaluates the entire set of variables in its search for states that satisfy the maximum number of constraints, while being consistent with the external input. Our results demonstrate that the proposed network architecture can perform a deterministic search for the optimal solution to problems with non-convex cost functions. The network is inspired by canonical microcircuit models of the cortex and suggests possible dynamical mechanisms to solve constraint satisfaction problems that can be present in biological networks, or implemented in neuromorphic electronic circuits.
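A much-simplified, noise-free cousin of the WTA building block described above: rate units with self-excitation and shared inhibition, integrated with Euler steps. All constants are assumptions for illustration; the paper's networks are coupled and oscillatory, which this sketch does not attempt to reproduce.

```python
def wta(inputs, alpha=1.2, beta=2.0, dt=0.01, steps=5000):
    """Rate units with self-excitation alpha and shared inhibition beta."""
    relu = lambda v: max(v, 0.0)
    x = [0.0] * len(inputs)
    for _ in range(steps):
        total = beta * sum(relu(xi) for xi in x)      # global inhibition
        x = [xi + dt * (-xi + relu(inputs[i] + alpha * relu(xi) - total))
             for i, xi in enumerate(x)]
    return x

# The unit with the largest input should win; the others are suppressed.
state = wta([0.3, 1.0, 0.6])
winner = max(range(len(state)), key=lambda i: state[i])
print(winner, [round(v, 3) for v in state])
```

With these constants no fixed point has two active units, so the dynamics settle on a single winner, the basic mechanism the abstract's coupled-WTA variable encoding relies on.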
same-paper 2 0.83893818 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space.
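The x/z/u update pattern of consensus ADMM, which the method above builds on, can be sketched on a one-dimensional toy where each "agent" holds a private quadratic cost and all must agree on a single value. This is generic consensus ADMM under invented costs, not the paper's improved variant or its collision cost functionals.

```python
# Each agent i privately minimizes (x - c_i)^2 / 2; consensus ADMM drives
# all local copies to the global minimizer, the mean of the c_i.
c = [1.0, 4.0, 7.0]
rho = 1.0                     # penalty parameter (assumed)
x = [0.0] * len(c)            # local variables
u = [0.0] * len(c)            # scaled dual variables
z = 0.0                       # consensus variable

for _ in range(100):
    # Local minimizations (closed form for quadratic costs).
    x = [(ci + rho * (z - ui)) / (1.0 + rho) for ci, ui in zip(c, u)]
    # Consensus (averaging) step.
    z = sum(xi + ui for xi, ui in zip(x, u)) / len(c)
    # Dual updates penalize disagreement with the consensus.
    u = [ui + xi - z for ui, xi in zip(u, x)]

print(round(z, 6))  # -> 4.0, the mean of c
```

The three updates are independent per agent except for the averaging step, which is the source of the natural parallelism the abstract highlights.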
3 0.83724886 84 nips-2013-Deep Neural Networks for Object Detection
Author: Christian Szegedy, Alexander Toshev, Dumitru Erhan
Abstract: Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is, not only classifying but also precisely localizing objects of various classes. We present a simple yet powerful formulation of object detection as a regression problem to object bounding box masks. We define a multi-scale inference procedure which is able to produce high-resolution object detections at a low cost, using only a few network applications. State-of-the-art performance of the approach is shown on Pascal VOC.
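A toy of the mask-to-box step implied above: once a binary object mask has been regressed (here simply hand-written), a bounding box can be read off its active cells. The grid and decoding are illustrative only, not the paper's multi-scale procedure.

```python
def mask_to_box(mask):
    """Return (min_row, min_col, max_row, max_col) of the active cells."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, v in enumerate(row) if v]
    if not cells:
        return None                      # no object predicted
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))

# A hand-written 4x4 "predicted" mask standing in for network output.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(mask_to_box(mask))  # -> (1, 1, 2, 3)
```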
4 0.83440763 157 nips-2013-Learning Multi-level Sparse Representations
Author: Ferran Diego Andilla, Fred A. Hamprecht
Abstract: Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neuroscience image sequence considered here, there are the semantic concepts of pixel → neuron → assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lower-level to higher-level concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that allows the decomposition to be optimized over all levels jointly, rather than in a greedy level-by-level fashion. The proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that allows a calcium imaging sequence to be interpreted simultaneously in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of real-world calcium imaging data.
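A structure-only illustration of the pixel → neuron → assembly hierarchy: the observation matrix as a product of three sparse factors. The memberships and time courses below are hand-chosen to show the shapes involved; the paper's contribution is to infer all factors jointly from data, which this sketch does not do.

```python
def matmul(A, B):
    """Plain-list matrix product, enough for this shape illustration."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# pixels x time = (pixels x neurons) (neurons x assemblies) (assemblies x time)
P = [[1, 0], [1, 0], [0, 1]]        # pixel -> neuron membership (sparse)
N = [[1, 0], [0, 1]]                # neuron -> assembly membership (sparse)
T = [[0, 2, 1], [3, 0, 0]]         # assembly time courses

X = matmul(P, matmul(N, T))         # synthetic "imaging data"
print(X)
```

Reading the product right to left recovers the abstract's semantics: assemblies fire over time, neurons inherit the activity of their assemblies, and pixels inherit the activity of their neurons.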
5 0.81029916 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting
Author: Shaodan Zhai, Tian Xia, Ming Tan, Shaojun Wang
Abstract: We propose a boosting method, DirectBoost, a greedy coordinate descent algorithm that builds an ensemble classifier of weak classifiers by directly minimizing empirical classification error over labeled training examples; once the training classification error is reduced to a local coordinatewise minimum, DirectBoost runs a greedy coordinate ascent algorithm that continuously adds weak classifiers to maximize any targeted, arbitrarily defined margins until reaching a local coordinatewise maximum of the margins in a certain sense. Experimental results on a collection of machine-learning benchmark datasets show that DirectBoost gives better results than AdaBoost, LogitBoost, LPBoost with column generation and BrownBoost, and is noise-tolerant when it maximizes an n′-th order bottom sample margin.
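A toy imitation of the first, error-minimizing phase described above: greedy selection over threshold stumps that stops at a coordinatewise minimum of the training 0-1 loss. The data, fixed step size, and stump dictionary are invented, and the margin-maximizing second phase is omitted entirely.

```python
# Six 1-D points with labels that no single stump can separate.
xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
ys = [-1, -1, 1, 1, -1, 1]

def stump(threshold, sign):
    return lambda v: sign if v > threshold else -sign

dictionary = [stump(t, s) for t in range(6) for s in (-1, 1)]

def n_errors(score):
    # 0-1 loss; zero scores count as errors (undecided predictions).
    return sum(1 for s, y in zip(score, ys) if s * y <= 0)

score = [0.0] * len(xs)
for _ in range(10):                      # greedy rounds
    best_err, best_k = min(
        (n_errors([s + 0.5 * dictionary[k](v) for s, v in zip(score, xs)]), k)
        for k in range(len(dictionary))
    )
    if best_err >= n_errors(score):      # coordinatewise local minimum
        break
    h = dictionary[best_k]
    score = [s + 0.5 * h(v) for s, v in zip(score, xs)]

print(n_errors(score), "training errors")
```

One stump drops the training error from 6 to 1, after which no single fixed-size step improves it, so the loop stops at the local coordinatewise minimum, exactly the point where the paper's method would switch to margin maximization.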
6 0.75354993 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
7 0.73879045 15 nips-2013-A memory frontier for complex synapses
8 0.61811197 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit
9 0.61102957 121 nips-2013-Firing rate predictions in optimal balanced networks
10 0.60991639 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
11 0.57148409 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
13 0.54384869 264 nips-2013-Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively
14 0.54233539 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
15 0.53089619 64 nips-2013-Compete to Compute
16 0.52489829 255 nips-2013-Probabilistic Movement Primitives
17 0.51742101 86 nips-2013-Demixing odors - fast inference in olfaction
18 0.5073368 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
19 0.49710339 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
20 0.48849624 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking