nips nips2013 nips2013-16 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space.
Reference: text
A message-passing algorithm for multi-agent trajectory planning
José Bento (jbento@disneyresearch.com)
This paper focuses on trajectory planning for multiple agents, an important problem in robotics [1, 2], computer animation, and crowd simulation [3]. Centralized planning for multiple agents is PSPACE-hard [4, 5]. To contend with this complexity, traditional multi-agent planning prioritizes agents and computes their trajectories sequentially [6], leading to suboptimal solutions. By contrast, our method plans for all agents simultaneously. Trajectory planning is also simplified if agents are non-distinct and can be dynamically assigned to a set of goal positions [1]. Due to the complexity of planning collision-free trajectories, real-time robot navigation is commonly decoupled into a global planner and a fast local planner that performs collision avoidance. Building on an extension of the velocity-obstacles approach, recent work on centralized collision avoidance [12] computes collision-free local motions for all agents while maximizing a joint utility, using either a computationally expensive MIQP or an efficient, though local, QP.
In this paper we formalize the global trajectory planning task as follows. Given p agents of different radii {r_i}_{i=1}^p with desired initial and final positions, {x_i(0)}_{i=1}^p and {x_i(T)}_{i=1}^p, along with a cost functional over trajectories, compute collision-free trajectories for all p agents that minimize the cost functional. That is, find a set of intermediate points {x_i(t)}_{i=1}^p, t ∈ (0, T), that satisfies the "hard" collision-free constraints ‖x_i(t) − x_j(t)‖ ≥ r_i + r_j for all i, j and all t, and that, insofar as possible, minimizes the cost functional. The method we propose searches for a solution within the space of piecewise-linear trajectories, wherein the trajectory of an agent is completely specified by its positions at a fixed set of time instants {t_s}, called break-points. All other intermediate points of the trajectories are computed by assuming that each agent moves with constant velocity between break-points: if t1 and t2 > t1 are consecutive break-points, then x_i(t) = (1/(t2 − t1))((t2 − t)x_i(t1) + (t − t1)x_i(t2)) for t ∈ [t1, t2].
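This constant-velocity interpolation between break-points is a one-liner in code; the sketch below is a minimal illustration (the function name is ours, not from the paper):

```python
import numpy as np

def interpolate(x1, x2, t1, t2, t):
    """Constant-velocity position between consecutive break-points.

    x1, x2: agent positions at break-point times t1 and t2 (t2 > t1).
    Returns x(t) = ((t2 - t) * x1 + (t - t1) * x2) / (t2 - t1).
    """
    alpha = (t - t1) / (t2 - t1)
    return (1.0 - alpha) * np.asarray(x1, float) + alpha * np.asarray(x2, float)
```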
The main contributions of this paper are as follows: (i) we formulate the global trajectory planning task as a decomposable optimization problem. A particularly crucial minimizer ensures there are no inter-agent collisions, but we also derive other minimizers that allow for finding trajectories with minimal total energy, avoiding static obstacles, or imposing dynamic constraints such as maximum/minimum agent velocity. (iii) We show that our method can be specialized to perform local planning by solving the problem of joint optimization in velocity space [12]. Our work is among the few examples where the success of applying ADMM to find approximate solutions to a large non-convex problem can be judged with the naked eye, by the gracefulness of the trajectories found.
2 Minimizers in the TWA

In this section we provide a short description of the TWA [13] and, in particular, of the role of the minimizer building blocks that it needs in order to solve a particular optimization problem. The TWA solves this optimization problem iteratively by passing messages on a bipartite graph, in the form of a Forney factor graph [14]: one minimizer-node per function f_b, one equality-node per variable x_j, and an edge (b, j) connecting b and j if f_b depends on x_j (see Figure 1-left). (Figure 1, right: the input and output variables for each minimizer block.)

The equality-nodes receive these local messages and weights and produce consensus estimates for all variables by computing an average of the incoming messages, weighted by the incoming certainty weights. From these consensus estimates, correcting messages are computed and communicated back to the minimizers to help them reach consensus. For example, the minimizer g1 receives correcting messages n_{1,1} and n_{1,3} with corresponding certainty weights ρ_{1,1} and ρ_{1,3} (see Figure 1-right). The weights {ρ_{b,j}} that this minimizer outputs must be 0 (uncertain), ∞ (certain), or ρ0, set to some fixed value. The logic for setting weights from minimizer-nodes depends on the problem; as we shall see, in trajectory planning problems we only use 0 or ρ0 weights. If we choose that all minimizers always output weights equal to ρ0, the TWA reduces to standard ADMM; however, 0-weights allow equality-nodes to ignore inactive constraints, traversing the search space much faster. Finally, notice that all minimizers can operate simultaneously, and the same is true for the consensus calculation performed by each equality-node.
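The equality-node consensus step described above is a weighted average of the minimizer outputs. A minimal sketch, assuming one scalar certainty weight per incoming message (the all-zero-weights fallback to a plain average is our simplification, not from the paper):

```python
import numpy as np

def consensus_estimate(messages, weights):
    """Consensus estimate for one variable: average of the incoming
    minimizer messages, weighted by their certainty weights.

    If every weight is zero (all constraints inactive), fall back to a
    plain average so the estimate stays defined.
    """
    messages = np.asarray(messages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.sum() == 0.0:
        return messages.mean(axis=0)
    return (weights[:, None] * messages).sum(axis=0) / weights.sum()
```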
3 Global trajectory planning

We now turn to describing our decomposition of the global trajectory planning optimization problem in detail. Rather, our variables are the positions {x_i(s)}_{i∈[p]}, where the trajectories are indexed by i and the break-points are indexed by a discrete variable s taking values between 1 and η − 1.
3.1 Formulation as unconstrained optimization without static obstacles

In terms of these variables, the non-collision constraints² are

‖(αx_i(s + 1) + (1 − α)x_i(s)) − (αx_j(s + 1) + (1 − α)x_j(s))‖ ≥ r_i + r_j,   (2)

for all i, j ∈ [p], s ∈ {0, …, η − 1}, and α ∈ [0, 1]. The parameter α is used to trace out the constant-velocity trajectories of agents i and j between break-points s and s + 1. If t1 is the absolute time of the break-point with integer index s, t2 is the absolute time of the break-point with integer index s + 1, and t parametrizes the trajectories in absolute time, then α = (t − t1)/(t2 − t1). Note that in the above formulation absolute time does not appear, and any solution is simply a set of paths that, when travelled by each agent at constant velocity between break-points, leads to no collisions. When converting this solution into trajectories parameterized by absolute time, the break-points do not need to be chosen uniformly spaced in absolute time.
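Constraint (2) requires the relative position α(x_i(s+1) − x_j(s+1)) + (1 − α)(x_i(s) − x_j(s)) to stay outside a disc of radius r_i + r_j for every α ∈ [0, 1]. Since that relative position traces a line segment, the minimum distance to the origin has a closed form (point-to-segment distance). A minimal checker, with a function name of our choosing:

```python
import numpy as np

def collision_free(xi_s, xi_s1, xj_s, xj_s1, ri, rj):
    """Check constraint (2): the relative position
    d(alpha) = alpha*(xi_s1 - xj_s1) + (1 - alpha)*(xi_s - xj_s)
    must satisfy ||d(alpha)|| >= ri + rj for all alpha in [0, 1]."""
    d0 = np.asarray(xi_s, float) - np.asarray(xj_s, float)    # alpha = 0
    d1 = np.asarray(xi_s1, float) - np.asarray(xj_s1, float)  # alpha = 1
    seg = d1 - d0
    denom = seg @ seg
    if denom == 0.0:                       # agents keep a constant offset
        alpha = 0.0
    else:                                   # unconstrained minimizer, clipped to [0, 1]
        alpha = float(np.clip(-(d0 @ seg) / denom, 0.0, 1.0))
    closest = d0 + alpha * seg
    return float(np.linalg.norm(closest)) >= ri + rj
```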
…the integrated kinetic energy or the maximum velocity over all the agents), and the function f^coll_{r,r′} is defined as

f^coll_{r,r′}(x, x̄, x′, x̄′) = J(‖α(x̄ − x̄′) + (1 − α)(x − x′)‖ ≥ r + r′ for all α ∈ [0, 1]).   (4)

In this section, x and x̄ represent the positions of an arbitrary agent of radius r at two consecutive break-points, and x′ and x̄′ the positions of a second arbitrary agent of radius r′ at the same break-points. The function J(·) takes the value 0 whenever its argument, a clause, is true, and takes the value +∞ otherwise.
For example, we might prefer trajectories for which the total kinetic energy spent by the set of agents is small. In this case, defining f^cost_C(x, x̄) = C‖x − x̄‖², we have

f^cost({x_i(s)}_{i,s}) = (1/(pη)) Σ_{i=1}^{p} Σ_{s=0}^{η−1} f^cost_{C_{i,s}}(x_i(s), x_i(s + 1)),   (5)

where the coefficients {C_{i,s}} can account for agents with different masses, different absolute-time intervals between break-points, or different preferences regarding which agents we want to be less active and which agents are allowed to move faster.
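The decomposed cost (5) is straightforward to evaluate for the quadratic kinetic-energy case; a small sketch (array layout and function name are ours):

```python
import numpy as np

def kinetic_energy_cost(X, C):
    """Total cost (5) for piecewise-linear trajectories with
    f_cost_C(x, xbar) = C * ||x - xbar||^2.

    X: array of shape (p, eta + 1, d) with X[i, s] = x_i(s).
    C: array of shape (p, eta) of per-agent, per-segment coefficients.
    Returns (1 / (p * eta)) * sum_{i,s} C[i, s] * ||x_i(s+1) - x_i(s)||^2.
    """
    X = np.asarray(X, float)
    p, n_pts, _ = X.shape
    eta = n_pts - 1
    diffs = X[:, 1:, :] - X[:, :-1, :]       # per-segment displacements
    sq = (diffs ** 2).sum(axis=2)            # squared segment lengths
    return float((np.asarray(C, float) * sq).sum() / (p * eta))
```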
More simply, we might want to exclude trajectories in which agents move faster than a certain amount, but without distinguishing among all remaining trajectories. This corresponds to a cost of the form

f^cost_C(x, x̄) = J(‖x − x̄‖ ≤ C).   (6)

In this case, associating each break-point to a time instant, the coefficients {C_{i,s}} in expression (5) would represent different limits on the velocity of different agents between different sections of the trajectory. If we want to force all agents to have a minimum velocity, we can simply reverse the inequality in (6).
² Since the agents are round, this allows for a single point of contact between two agents and does not reduce practical relevance.
3.2 Formulation as unconstrained optimization with static obstacles

In many scenarios agents should also avoid collisions with static obstacles. Given two points in space, xL and xR, we can forbid all agents from crossing the line segment from xL to xR by adding the following term to the function (3): Σ_{i=1}^{p} Σ_{s=0}^{η−1} f^wall_{xL,xR,r_i}(x_i(s), x_i(s + 1)).
We recall that r_i is the radius of agent i, and

f^wall_{xL,xR,r}(x, x̄) = J(‖(αx̄ + (1 − α)x) − (βxR + (1 − β)xL)‖ ≥ r for all α, β ∈ [0, 1]).   (7)

Notice that f^coll can be expressed using f^wall, applied to the agents' relative positions with a degenerate wall at the origin:

f^coll_{r,r′}(x, x̄, x′, x̄′) = f^wall_{0,0,r+r′}(x − x′, x̄ − x̄′).   (8)

We use this fact later to express the minimizer associated with agent-agent collisions using the minimizer associated with agent-obstacle collisions.
Since x_i(s) ∈ R² for all i ∈ [p] and s ∈ {0, …, η}, being able to avoid collisions with a general static line segment automatically allows avoiding collisions with multiple static obstacles of arbitrary polygonal shape. Our numerical experiments only consider agents in the plane, and so, in this paper, we only describe the minimizer block for wall collisions in a 2D world.
The unconstrained formulation (3) then tells us that the global objective function is the sum of ηp(p + 1)/2 terms: ηp(p − 1)/2 functions f^coll and ηp functions f^cost_C. In particular, the equality-node associated with the break-point variable x_i(s), 0 < s < η, is connected to 2(p − 1) different g^coll minimizer-nodes and two different g^cost minimizer-nodes. If s = 0 or s = η, the equality-node only connects to half as many g^coll nodes and g^cost nodes. Every minimizer is basically a special case of (1).
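The term count above is easy to sanity-check: p(p − 1)/2 agent pairs times η segments of collision terms, plus p agents times η segments of cost terms. A two-line check (ours):

```python
def objective_term_count(p, eta):
    """Number of terms in the global objective: eta*p*(p-1)/2 collision
    terms plus eta*p cost terms, which sums to eta*p*(p+1)/2."""
    coll_terms = eta * p * (p - 1) // 2
    cost_terms = eta * p
    assert coll_terms + cost_terms == eta * p * (p + 1) // 2
    return coll_terms + cost_terms
```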
Agent-agent collision minimizer

We start with the minimizer associated with the functions f^coll, which we denote by g^coll. This minimizer receives as parameters the radii, r and r′, of the two agents whose collision it is avoiding. The minimizer takes as input a set of incoming n-messages, {n, n̄, n′, n̄′}, and associated ρ-weights, {ρ, ρ̄, ρ′, ρ̄′}, and outputs a set of updated x-variables according to expression (9). Messages n and n̄ come from the two equality-nodes associated with the positions of one of the agents at two consecutive break-points, and n′ and n̄′ from the corresponding equality-nodes of the other agent.

g^coll(n, n̄, n′, n̄′, ρ, ρ̄, ρ′, ρ̄′, r, r′) = arg min_{x, x̄, x′, x̄′} f^coll_{r,r′}(x, x̄, x′, x̄′) + (ρ/2)‖x − n‖² + (ρ̄/2)‖x̄ − n̄‖² + (ρ′/2)‖x′ − n′‖² + (ρ̄′/2)‖x̄′ − n̄′‖².   (9)

The update logic for the outgoing weights is simple: if the trajectory from n to n̄ for an agent of radius r does not collide with the trajectory from n′ to n̄′ for an agent of radius r′, then set all the outgoing weights to zero, because the constraint is inactive; otherwise set all the outgoing weights to ρ0. The solution to (9) is found using the agent-obstacle collision minimizer that we describe next.
Agent-obstacle collision minimizer

The minimizer for f^wall is denoted by g^wall. It is parameterized by the obstacle position, {xL, xR}, as well as the radius of the agent that needs to avoid the obstacle. It receives two n-messages, {n, n̄}, and corresponding weights, {ρ, ρ̄}, from the equality-nodes associated with two consecutive positions of an agent that needs to avoid the obstacle. Its output, the x-variables, is defined as

g^wall(n, n̄, r, xL, xR, ρ, ρ̄) = arg min_{x, x̄} f^wall_{xL,xR,r}(x, x̄) + (ρ/2)‖x − n‖² + (ρ̄/2)‖x̄ − n̄‖².   (10)

When agents move in the plane (2D), this minimizer can be solved by reformulating the optimization in (10) as a mechanical problem involving a system of springs, which we can solve exactly and efficiently. The update logic for the outgoing weights is similar to that of the g^coll minimizer: if an agent of radius r going from n to n̄ does not collide with the line segment from xL to xR, then set all outgoing weights to zero, because the constraint is inactive; otherwise set all the outgoing weights to ρ0. Notice that, from (8), it follows that the agent-agent minimizer g^coll can be expressed using g^wall. More concretely, as proved in the supplementary material, Section C,

g^coll(n, n̄, n′, n̄′, ρ, ρ̄, ρ′, ρ̄′, r, r′) = M2(g^wall(M1(…))),

for appropriate transformations M1 and M2.
Minimum energy and maximum (minimum) velocity minimizer

When f^cost can be decomposed as in (5), the minimizer associated with the functions f^cost is denoted by g^cost and receives as input two n-messages, {n, n̄}, and corresponding weights, {ρ, ρ̄}. The minimizer is also parameterized by a cost factor c. It outputs a set of updated x-variables defined as

g^cost(n, n̄, ρ, ρ̄, c) = arg min_{x, x̄} f^cost_c(x, x̄) + (ρ/2)‖x − n‖² + (ρ̄/2)‖x̄ − n̄‖².

The update logic for the outgoing weights of the minimum energy minimizer is very simple: always set all outgoing weights to ρ0, since the energy cost is always active. The update logic for the outgoing weights of the maximum velocity minimizer follows the same pattern as the collision minimizers: set the weights to zero when the velocity bound is inactive and to ρ0 otherwise. The update logic for the minimum velocity minimizer is similar. The solutions to the minimum energy, maximum velocity, and minimum velocity minimizers are given in the supplementary material, Sections E, F, and G, respectively.
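For the quadratic energy cost f^cost_c(x, x̄) = c‖x − x̄‖², setting the gradient of the g^cost objective to zero gives a 2×2 linear system shared by all coordinates. The closed form below is our own derivation, not the paper's supplementary solver, and assumes at least one weight is positive (otherwise the system is singular):

```python
import numpy as np

def g_cost_energy(n, n_bar, rho, rho_bar, c):
    """Closed-form minimizer of  c*||x - xb||^2 + (rho/2)*||x - n||^2
    + (rho_bar/2)*||xb - n_bar||^2  over (x, xb).

    Stationarity gives, per coordinate,
        (2c + rho) x - 2c xb           = rho * n
        -2c x + (2c + rho_bar) xb      = rho_bar * n_bar,
    solved here as one 2x2 system with one right-hand side per coordinate."""
    n, n_bar = np.asarray(n, float), np.asarray(n_bar, float)
    A = np.array([[2 * c + rho, -2 * c],
                  [-2 * c, 2 * c + rho_bar]])
    B = np.vstack([rho * n, rho_bar * n_bar])   # columns are coordinates
    X = np.linalg.solve(A, B)
    return X[0], X[1]                           # updated x and x-bar
```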
Note that the lack of open-source scalable algorithms for global trajectory planning in the literature makes it difficult to benchmark our performance against other methods. All the tests described here are for agents in a two-dimensional plane.

The first test considers scenario CONF1: p (even) agents of radius r, equally spaced around a circle of radius R, are each required to exchange positions with the corresponding antipodal agent, with r = (5/4)R sin(π/(2(p − 4))). This is a classical difficult test scenario, because the straight-line motion of all agents to their goals would result in them all colliding in the center of the circle. The objective is to minimize the total kinetic energy (C in the energy minimizer is set to 1). The agents' initial and final configurations are randomly chosen in the plane (CONF2).
5 Local trajectory planning based on velocity obstacles

In this section we show how the joint optimization presented in [12], which is based on the concept of velocity obstacles [11] (VO), can also be solved via the message-passing TWA. In VO, given the current positions {x_i(0)}_{i∈[p]} and radii {r_i} of all agents, a new velocity command is computed jointly for all agents, minimizing the distance to their preferred velocities {v_i^ref}_{i∈[p]}. This new velocity command must guarantee that the trajectories of all agents remain collision-free for at least a time horizon τ. New collision-free velocities are computed every ατ seconds, α < 1, until all agents reach their final configurations.

‖(1 − α)(x_i(0) − x_j(0)) + α(x_i − x_j)‖ ≥ r_i + r_j  for all j > i ∈ [p], α ∈ [0, 1],   (12)

where C′_i = C_i/τ², x_i^ref = x_i(0) + v_i^ref τ, and we have dropped the τ in x_i(τ). The above problem, extended to account for collisions with the static line segments {x_{R_k}, x_{L_k}}_k, can be formulated in an unconstrained form using the functions f^cost, f^coll, and f^wall. Namely,

min Σ_i f^cost_{C′_i}(x_i, x_i^ref) + Σ_i Σ_{j>i} f^coll_{r_i,r_j}(x_i(0), x_i, x_j(0), x_j) + Σ_i Σ_k f^wall_{x_{R_k},x_{L_k},r_i}(x_i(0), x_i).

All corresponding minimizers are special cases of the minimizers derived in the previous section for global trajectory planning (see Section H of the supplementary material for details).
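The receding-horizon scheme described above (compute velocities valid for a horizon τ, apply them for ατ seconds, repeat) can be sketched as a simple loop. Here `solve_velocities` is a hypothetical placeholder for the joint TWA optimization, not the paper's implementation:

```python
import math

def local_planning_loop(x0, goals, tau, alpha, solve_velocities, tol=1e-6):
    """Receding-horizon loop of the VO-based local planner (sketch).

    Every alpha*tau seconds a new joint velocity command is computed that
    keeps trajectories collision-free for a horizon tau; agents then move
    with that command for alpha*tau seconds, until all goals are reached.
    """
    x = [tuple(p) for p in x0]
    while any(math.dist(a, g) > tol for a, g in zip(x, goals)):
        v = solve_velocities(x, goals, tau)   # horizon-tau collision-free velocities
        dt = alpha * tau
        x = [tuple(pc + dt * vc for pc, vc in zip(pos, vel))
             for pos, vel in zip(x, v)]
    return x
```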
Both approaches use a mechanism for breaking the symmetry of CONF1 and avoiding deadlocks: theirs uses a preferential rotation direction for the agents, while we use agents with slightly different C coefficients in their energy minimizers (C_{ith agent} = 1 + 0.…).

6 Conclusion and future work

We have presented a novel algorithm for global and local planning of the trajectories of multiple distinct agents, a problem known to be hard. Our implementation of the algorithm in Java on a regular desktop computer, using a basic scheduler/synchronization over its few cores, already scales to hundreds of agents and achieves real-time performance for local planning. For agents in the plane, we derived explicit expressions that account for static obstacles, moving obstacles, and dynamic constraints on the velocity and energy. …(e.g. acceleration constraints) and provide fast solvers for our minimizers for agents in 3D. For example, minimizers like g^coll could be solved by message exchange between pairs of agents within a maximum communication radius.

[12] Collision avoidance for multiple agents with joint utility maximization.
Similar papers (simIndex, simValue, paperId, paperTitle):
2 0.27533942 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
3 0.18968703 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie
Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation(MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. 1
4 0.16131888 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
Author: Liam C. MacDermed, Charles Isbell
Abstract: We present four major results towards solving decentralized partially observable Markov decision problems (DecPOMDPs) culminating in an algorithm that outperforms all existing algorithms on all but one standard infinite-horizon benchmark problems. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a DecPOMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any DecPOMDP into a DecPOMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door for new classes of DecPOMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based valued iteration, and modify it to produce the first tractable value iteration method for DecPOMDPs that outperforms existing algorithms. 1
5 0.13972318 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
Author: Xiaoxiao Guo, Satinder Singh, Richard L. Lewis
Abstract: We consider how to transfer knowledge from previous tasks (MDPs) to a current task in long-lived and bounded agents that must solve a sequence of tasks over a finite lifetime. A novel aspect of our transfer approach is that we reuse reward functions. While this may seem counterintuitive, we build on the insight of recent work on the optimal rewards problem that guiding an agent’s behavior with reward functions other than the task-specifying reward function can help overcome computational bounds of the agent. Specifically, we use good guidance reward functions learned on previous tasks in the sequence to incrementally train a reward mapping function that maps task-specifying reward functions into good initial guidance reward functions for subsequent tasks. We demonstrate that our approach can substantially improve the agent’s performance relative to other approaches, including an approach that transfers policies. 1
6 0.13436376 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
7 0.10062405 164 nips-2013-Learning and using language via recursive pragmatic reasoning about other agents
8 0.09517128 293 nips-2013-Sign Cauchy Projections and Chi-Square Kernel
9 0.07805638 168 nips-2013-Learning to Pass Expectation Propagation Messages
10 0.074841179 255 nips-2013-Probabilistic Movement Primitives
11 0.072376348 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
12 0.072144739 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
13 0.072042406 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
14 0.071682654 318 nips-2013-Structured Learning via Logistic Regression
15 0.067144975 32 nips-2013-Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
16 0.064111292 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces
17 0.059600621 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
18 0.059090689 323 nips-2013-Synthesizing Robust Plans under Incomplete Domain Models
19 0.057817094 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
20 0.055617526 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
topicId topicWeight
[(0, 0.156), (1, -0.018), (2, -0.028), (3, 0.021), (4, 0.01), (5, 0.049), (6, 0.04), (7, -0.016), (8, -0.02), (9, 0.002), (10, 0.024), (11, -0.006), (12, 0.047), (13, 0.001), (14, -0.197), (15, 0.274), (16, 0.01), (17, 0.093), (18, 0.125), (19, 0.006), (20, -0.263), (21, -0.097), (22, -0.004), (23, 0.026), (24, -0.145), (25, -0.055), (26, 0.002), (27, 0.173), (28, 0.119), (29, -0.0), (30, 0.025), (31, 0.086), (32, 0.058), (33, -0.116), (34, -0.027), (35, 0.107), (36, 0.027), (37, -0.041), (38, 0.01), (39, -0.066), (40, -0.037), (41, 0.021), (42, -0.033), (43, -0.003), (44, 0.021), (45, -0.05), (46, 0.04), (47, 0.005), (48, -0.037), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.94240803 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space. 1
2 0.83420634 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
3 0.71347779 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
Author: Liam C. MacDermed, Charles Isbell
Abstract: We present four major results towards solving decentralized partially observable Markov decision problems (DecPOMDPs) culminating in an algorithm that outperforms all existing algorithms on all but one standard infinite-horizon benchmark problems. (1) We give an integer program that solves collaborative Bayesian games (CBGs). The program is notable because its linear relaxation is very often integral. (2) We show that a DecPOMDP with bounded belief can be converted to a POMDP (albeit with actions exponential in the number of beliefs). These actions correspond to strategies of a CBG. (3) We present a method to transform any DecPOMDP into a DecPOMDP with bounded beliefs (the number of beliefs is a free parameter) using optimal (not lossless) belief compression. (4) We show that the combination of these results opens the door for new classes of DecPOMDP algorithms based on previous POMDP algorithms. We choose one such algorithm, point-based valued iteration, and modify it to produce the first tractable value iteration method for DecPOMDPs that outperforms existing algorithms. 1
4 0.6474359 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie
Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we derive explicit expressions for the steady-state mean-square deviation (MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time.
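A minimal sketch of the consensus-plus-innovation flavor of update described above, on an invented three-agent line graph tracking a scalar geometric random walk. The drift, gain, noise levels, and update form are assumptions for illustration, not the paper's estimators.

```python
import random

random.seed(1)

a = 1.01                                           # assumed known drift
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # line graph with self-loops
gain = 0.5                                         # innovation step (assumed)

x = 1.0                                            # hidden true state
estimates = {i: 0.0 for i in neighbors}

for _ in range(200):
    x *= a * random.uniform(0.99, 1.01)            # geometric random walk
    signals = {i: x + random.gauss(0.0, 0.1) for i in neighbors}
    new = {}
    for i, nbrs in neighbors.items():
        consensus = sum(estimates[j] for j in nbrs) / len(nbrs)
        # Propagate the neighborhood average by the drift, then correct
        # toward the private signal (consensus + innovation).
        new[i] = a * consensus + gain * (signals[i] - consensus)
    estimates = new

rel_err = max(abs(e - x) / x for e in estimates.values())
print(round(x, 2), {i: round(e, 2) for i, e in estimates.items()},
      round(rel_err, 3))
```

With the gain below the contraction threshold, each agent tracks the growing state with a small, bounded relative error, mirroring the bounded-variance tracking regime the abstract describes.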
5 0.56649828 71 nips-2013-Convergence of Monte Carlo Tree Search in Simultaneous Move Games
Author: Viliam Lisy, Vojta Kovarik, Marc Lanctot, Branislav Bosansky
Abstract: (no abstract available)
6 0.54563814 164 nips-2013-Learning and using language via recursive pragmatic reasoning about other agents
7 0.49943855 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation
8 0.47674519 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
9 0.4331055 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
10 0.39570922 255 nips-2013-Probabilistic Movement Primitives
11 0.38602111 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces
12 0.35778904 165 nips-2013-Learning from Limited Demonstrations
13 0.33915311 94 nips-2013-Distributed $k$-means and $k$-median Clustering on General Topologies
14 0.33727744 189 nips-2013-Message Passing Inference with Chemical Reaction Networks
15 0.31250107 332 nips-2013-Tracking Time-varying Graphical Structure
16 0.30485815 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
17 0.30445486 79 nips-2013-DESPOT: Online POMDP Planning with Regularization
18 0.3041096 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning
19 0.3000046 245 nips-2013-Pass-efficient unsupervised feature selection
20 0.29904598 168 nips-2013-Learning to Pass Expectation Propagation Messages
topicId topicWeight
[(2, 0.011), (16, 0.03), (33, 0.08), (34, 0.131), (41, 0.026), (49, 0.036), (56, 0.067), (70, 0.432), (85, 0.036), (89, 0.014), (93, 0.05), (95, 0.013)]
simIndex simValue paperId paperTitle
1 0.92892921 267 nips-2013-Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems
Author: Hesham Mostafa, Lorenz K. Mueller, Giacomo Indiveri
Abstract: We present a recurrent neuronal network, modeled as a continuous-time dynamical system, that can solve constraint satisfaction problems. Discrete variables are represented by coupled Winner-Take-All (WTA) networks, and their values are encoded in localized patterns of oscillations that are learned by the recurrent weights in these networks. Constraints over the variables are encoded in the network connectivity. Although there are no sources of noise, the network can escape from local optima in its search for solutions that satisfy all constraints by modifying the effective network connectivity through oscillations. If there is no solution that satisfies all constraints, the network state changes in a seemingly random manner and its trajectory approximates a sampling procedure that selects a variable assignment with a probability that increases with the fraction of constraints satisfied by this assignment. External evidence, or input to the network, can force variables to specific values. When new inputs are applied, the network re-evaluates the entire set of variables in its search for states that satisfy the maximum number of constraints, while being consistent with the external input. Our results demonstrate that the proposed network architecture can perform a deterministic search for the optimal solution to problems with non-convex cost functions. The network is inspired by canonical microcircuit models of the cortex and suggests possible dynamical mechanisms to solve constraint satisfaction problems that can be present in biological networks, or implemented in neuromorphic electronic circuits.
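A much-simplified, noise-free cousin of the WTA building block described above: rate units with self-excitation and shared inhibition, integrated with Euler steps. All constants are assumptions for illustration; the paper's networks are coupled and oscillatory, which this sketch does not attempt to reproduce.

```python
def wta(inputs, alpha=1.2, beta=2.0, dt=0.01, steps=5000):
    """Rate units with self-excitation alpha and shared inhibition beta."""
    relu = lambda v: max(v, 0.0)
    x = [0.0] * len(inputs)
    for _ in range(steps):
        total = beta * sum(relu(xi) for xi in x)      # global inhibition
        x = [xi + dt * (-xi + relu(inputs[i] + alpha * relu(xi) - total))
             for i, xi in enumerate(x)]
    return x

# The unit with the largest input should win; the others are suppressed.
state = wta([0.3, 1.0, 0.6])
winner = max(range(len(state)), key=lambda i: state[i])
print(winner, [round(v, 3) for v in state])
```

With these constants no fixed point has two active units, so the dynamics settle on a single winner, the basic mechanism the abstract's coupled-WTA variable encoding relies on.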
same-paper 2 0.83893818 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space.
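The x/z/u update pattern of consensus ADMM, which the method above builds on, can be sketched on a one-dimensional toy where each "agent" holds a private quadratic cost and all must agree on a single value. This is generic consensus ADMM under invented costs, not the paper's improved variant or its collision cost functionals.

```python
# Each agent i privately minimizes (x - c_i)^2 / 2; consensus ADMM drives
# all local copies to the global minimizer, the mean of the c_i.
c = [1.0, 4.0, 7.0]
rho = 1.0                     # penalty parameter (assumed)
x = [0.0] * len(c)            # local variables
u = [0.0] * len(c)            # scaled dual variables
z = 0.0                       # consensus variable

for _ in range(100):
    # Local minimizations (closed form for quadratic costs).
    x = [(ci + rho * (z - ui)) / (1.0 + rho) for ci, ui in zip(c, u)]
    # Consensus (averaging) step.
    z = sum(xi + ui for xi, ui in zip(x, u)) / len(c)
    # Dual updates penalize disagreement with the consensus.
    u = [ui + xi - z for ui, xi in zip(u, x)]

print(round(z, 6))  # -> 4.0, the mean of c
```

The three updates are independent per agent except for the averaging step, which is the source of the natural parallelism the abstract highlights.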
3 0.83724886 84 nips-2013-Deep Neural Networks for Object Detection
Author: Christian Szegedy, Alexander Toshev, Dumitru Erhan
Abstract: Deep Neural Networks (DNNs) have recently shown outstanding performance on image classification tasks [14]. In this paper we go one step further and address the problem of object detection using DNNs, that is, not only classifying but also precisely localizing objects of various classes. We present a simple yet powerful formulation of object detection as a regression problem to object bounding box masks. We define a multi-scale inference procedure which is able to produce high-resolution object detections at a low cost, using only a few network applications. State-of-the-art performance of the approach is shown on Pascal VOC.
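A toy of the mask-to-box step implied above: once a binary object mask has been regressed (here simply hand-written), a bounding box can be read off its active cells. The grid and decoding are illustrative only, not the paper's multi-scale procedure.

```python
def mask_to_box(mask):
    """Return (min_row, min_col, max_row, max_col) of the active cells."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, v in enumerate(row) if v]
    if not cells:
        return None                      # no object predicted
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))

# A hand-written 4x4 "predicted" mask standing in for network output.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(mask_to_box(mask))  # -> (1, 1, 2, 3)
```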
4 0.83440763 157 nips-2013-Learning Multi-level Sparse Representations
Author: Ferran Diego Andilla, Fred A. Hamprecht
Abstract: Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neuroscience image sequence considered here, there are the semantic concepts of pixel → neuron → assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lower-level to higher-level concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that allows the decomposition to be optimized over all levels jointly, rather than in a greedy level-by-level fashion. The proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that allows a calcium imaging sequence to be interpreted simultaneously in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of real-world calcium imaging data.
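A structure-only illustration of the pixel → neuron → assembly hierarchy: the observation matrix as a product of three sparse factors. The memberships and time courses below are hand-chosen to show the shapes involved; the paper's contribution is to infer all factors jointly from data, which this sketch does not do.

```python
def matmul(A, B):
    """Plain-list matrix product, enough for this shape illustration."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# pixels x time = (pixels x neurons) (neurons x assemblies) (assemblies x time)
P = [[1, 0], [1, 0], [0, 1]]        # pixel -> neuron membership (sparse)
N = [[1, 0], [0, 1]]                # neuron -> assembly membership (sparse)
T = [[0, 2, 1], [3, 0, 0]]         # assembly time courses

X = matmul(P, matmul(N, T))         # synthetic "imaging data"
print(X)
```

Reading the product right to left recovers the abstract's semantics: assemblies fire over time, neurons inherit the activity of their assemblies, and pixels inherit the activity of their neurons.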
5 0.81029916 90 nips-2013-Direct 0-1 Loss Minimization and Margin Maximization with Boosting
Author: Shaodan Zhai, Tian Xia, Ming Tan, Shaojun Wang
Abstract: We propose a boosting method, DirectBoost, a greedy coordinate descent algorithm that builds an ensemble classifier of weak classifiers by directly minimizing empirical classification error over labeled training examples; once the training classification error is reduced to a local coordinatewise minimum, DirectBoost runs a greedy coordinate ascent algorithm that continuously adds weak classifiers to maximize any targeted, arbitrarily defined margins until reaching a local coordinatewise maximum of the margins in a certain sense. Experimental results on a collection of machine-learning benchmark datasets show that DirectBoost gives better results than AdaBoost, LogitBoost, LPBoost with column generation and BrownBoost, and is noise-tolerant when it maximizes an n′-th order bottom sample margin.
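A toy imitation of the first, error-minimizing phase described above: greedy selection over threshold stumps that stops at a coordinatewise minimum of the training 0-1 loss. The data, fixed step size, and stump dictionary are invented, and the margin-maximizing second phase is omitted entirely.

```python
# Six 1-D points with labels that no single stump can separate.
xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
ys = [-1, -1, 1, 1, -1, 1]

def stump(threshold, sign):
    return lambda v: sign if v > threshold else -sign

dictionary = [stump(t, s) for t in range(6) for s in (-1, 1)]

def n_errors(score):
    # 0-1 loss; zero scores count as errors (undecided predictions).
    return sum(1 for s, y in zip(score, ys) if s * y <= 0)

score = [0.0] * len(xs)
for _ in range(10):                      # greedy rounds
    best_err, best_k = min(
        (n_errors([s + 0.5 * dictionary[k](v) for s, v in zip(score, xs)]), k)
        for k in range(len(dictionary))
    )
    if best_err >= n_errors(score):      # coordinatewise local minimum
        break
    h = dictionary[best_k]
    score = [s + 0.5 * h(v) for s, v in zip(score, xs)]

print(n_errors(score), "training errors")
```

One stump drops the training error from 6 to 1, after which no single fixed-size step improves it, so the loop stops at the local coordinatewise minimum, exactly the point where the paper's method would switch to margin maximization.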
6 0.75354993 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
7 0.73879045 15 nips-2013-A memory frontier for complex synapses
8 0.61811197 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit
9 0.61102957 121 nips-2013-Firing rate predictions in optimal balanced networks
10 0.60991639 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
11 0.57148409 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
13 0.54384869 264 nips-2013-Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively
14 0.54233539 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
15 0.53089619 64 nips-2013-Compete to Compute
16 0.52489829 255 nips-2013-Probabilistic Movement Primitives
17 0.51742101 86 nips-2013-Demixing odors - fast inference in olfaction
18 0.5073368 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
19 0.49710339 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
20 0.48849624 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking