nips nips2003 nips2003-36 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Curt Bererton, Geoffrey J. Gordon, Sebastian Thrun
Abstract: The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. [sent-4, score-0.198]
2 For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. [sent-6, score-0.226]
3 We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. [sent-7, score-0.407]
4 We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. [sent-8, score-0.61]
5 1 Introduction In recent years, the design of cooperative multi-robot systems has become a highly active research area within robotics [1, 2, 3, 4, 5, 6]. [sent-9, score-0.189]
6 Many planning problems in robotics are best phrased as MDPs, defined over world states or, in the case of partial observability, belief states [7]. [sent-10, score-0.403]
7 This enormous complexity has confined MDP planning techniques largely to single-robot systems. [sent-12, score-0.303]
8 In many cases, robots in a multi-robot system interact only in limited ways. [sent-13, score-0.388]
9 Robots might seek not to collide with each other [1], coordinate their locations to carry out a joint task [4, 6], or consume a joint resource with limited availability [8, 9, 10]. [sent-14, score-0.515]
10 Market-based algorithms are particularly attractive for multi-robot planning because many common types of interactions can be phrased as constraints on resources such as space (two robots can’t occupy the same location at once) and time (a robot can only work on a limited number of tasks at once). [sent-18, score-1.201]
11 The resource usage and state depend on the robots’ plans between now and time t, which in turn depend on the price. [sent-20, score-0.504]
12 Worse yet, future resource usage depends on random events which can’t be predicted exactly. [sent-21, score-0.435]
13 In particular, we propose a general technique for decomposing multi-robot MDP problems into “loosely coupled” MDPs which interact only through resource production and consumption constraints. [sent-23, score-0.577]
14 The decomposition works by turning all interactions into streams of payments between robots, thereby allowing each robot to learn its own local value function. [sent-24, score-0.443]
15 The actual prices for these resources are set by a “master” agent; the master agent takes into account the possibility of re-allocating resources at each step, but it approximates the effect of interactions between robots. [sent-26, score-0.683]
16 Our algorithm can be distributed so that each robot reasons only about its own local interactions, and it always produces the same answer as a particular centralized planning algorithm. [sent-28, score-0.73]
17 ca ∈ ℝ^N is the immediate reward for taking action a and Ta ∈ ℝ^(N×N) is the matrix representation of the transition probabilities for action a. [sent-41, score-0.218]
18 min_V α · V s.t. ∀a: V ≥ ca + γ Ta V (1); max_fa Σa ca · fa s.t. Σa (fa − γ Ta^T fa) = α, ∀a: fa ≥ 0 (2). The dual of the Bellman LP gives us an interesting alternative from which to view the problem of finding an optimal policy. [sent-44, score-0.289]
19 The vector fa represents the expected number of times we perform action a from each state. [sent-46, score-0.182]
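As a concrete illustration of the dual LP (2), the sketch below solves it for a tiny randomly generated MDP with scipy.optimize.linprog; the sizes, rewards, and transitions are placeholders for illustration only, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

N, A, gamma = 3, 2, 0.9                                     # states, actions, discount
rng = np.random.default_rng(0)
c = [rng.uniform(0, 1, N) for _ in range(A)]                # immediate rewards ca
T = []
for _ in range(A):                                          # row-stochastic transitions Ta
    M = rng.uniform(0, 1, (N, N))
    T.append(M / M.sum(axis=1, keepdims=True))
alpha = np.ones(N) / N                                      # initial state distribution

# Dual LP (2): max sum_a ca.fa  s.t.  sum_a (fa - gamma Ta^T fa) = alpha, fa >= 0.
A_eq = np.hstack([np.eye(N) - gamma * T[a].T for a in range(A)])
res = linprog(-np.concatenate(c), A_eq=A_eq, b_eq=alpha,
              bounds=[(0, None)] * (N * A))
f = res.x.reshape(A, N)                                     # visitation frequencies fa
print("greedy policy read off the frequencies:", f.argmax(axis=0))
```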
20 Subscripts (e.g., fi or Ai) will distinguish the planning problems for different robots. [sent-50, score-0.47]
21 1 Algorithm Loosely coupled MDPs Our algorithm is designed for multi-robot problems that can be decomposed into separate single-robot MDPs which interact through the production or consumption of fictitious resources. [sent-52, score-0.243]
22 These resources may be physical goods such as fuel; or they may be logical resources such as the right to pass over a bridge at a particular time, the right to explore an area of the environment, or the right to collect reward for achieving a particular subgoal. [sent-53, score-0.287]
23 Time may be part of the individual robot states, in which case a resource could be the right to consume a unit of fuel at a particular time (a futures contract). [sent-54, score-0.949]
24 In more detail, each robot has a vector of state-action visitation frequencies f i which must satisfy its own local dynamics Ai fi = bi . [sent-55, score-0.564]
25 Its production or consumption of resources is defined by a matrix Ci : element (j, k) of Ci is the amount of resource j which is produced or consumed by robot i in state-action pair k. [sent-56, score-0.961]
26 (So, Ci fi is the vector of expected resource usages for robot i.) [sent-57, score-0.956]
27 The robots interact through resource constraints: the instantaneous production and consumption of each resource must balance exactly. [sent-59, score-1.264]
28 This representation is in many ways related to an undirected dynamic Bayes network: each node of the network corresponds to the state and action of a single MDP, and a resource constraint involving a subset of the MDPs plays the role of a clique potential on the corresponding nodes. [sent-60, score-0.512]
29 In the same (trivial) sense as Bayes nets, our representation is completely general: by collapsing all robots into a single giant agent we can represent an arbitrary MDP. [sent-62, score-0.39]
30 More importantly, in the more-typical case that some pieces of our model can be written as resource constraints, we can achieve an exponential savings in representation size compared to the monolithic planning problem. [sent-63, score-0.693]
31 2 Approximation The resource constraints are what make loosely-coupled MDPs difficult to solve. [sent-65, score-0.436]
32 However, by making a simple approximation we can remove the nonlinearity and so factor our planning problem: we relax the resource constraints so that they must only be satisfied in expectation over all time steps, rather than deterministically on each time step. [sent-67, score-0.739]
33 Under this assumption, knowing the expected resources available to a robot allows that robot to plan independently: since Ci fi is the vector of expected resource usages for robot i, adding the constraint Ci fi = k to equation (2) gives us the single-robot resourceconstrained planning problem. [sent-68, score-2.284]
34 The (approximate) global planning problem then becomes to determine an optimal resource allocation among robots and corresponding single-robot plans, or equivalently to determine the optimal resource prices and corresponding single-robot value functions. [sent-69, score-1.682]
35 More formally, the planning problem is to solve (3): max_fi Σi ci · fi s.t. ∀i: Ai fi = bi, (∗) Σi Ci fi = d, ∀i: fi ≥ 0. Without the constraints marked (∗), this LP would represent a set of completely uncoupled robot planning problems. [sent-70, score-1.847]
36 The constraints (∗) are the approximated resource constraints: they say that expected production must equal expected consumption for each resource. [sent-71, score-0.62]
37 The resource prices are the dual variables for (∗), and the local value functions are the dual variables for the remaining equality constraints. [sent-72, score-0.707]
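A minimal sketch of how LP (3) could be assembled and solved by a single centralized solver (the baseline that the decomposition described later avoids). The per-robot matrices are assumed to come from each robot's own MDP; the helper name is ours, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.linalg import block_diag

def solve_coupled_lp(c_list, A_list, b_list, C_list, d):
    """max sum_i ci.fi  s.t.  Ai fi = bi for each robot i, (*) sum_i Ci fi = d, fi >= 0."""
    sizes = [len(ci) for ci in c_list]
    A_eq = np.vstack([block_diag(*A_list),        # per-robot dynamics on the block diagonal
                      np.hstack(C_list)])         # coupling resource-balance constraints (*)
    b_eq = np.concatenate(list(b_list) + [d])
    res = linprog(-np.concatenate(c_list), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * sum(sizes))
    return np.split(res.x, np.cumsum(sizes)[:-1]) # one visitation vector fi per robot
```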
38 The quality of our prices and value functions will depend on whether it is valid to assume a single price for each resource: if the prices stay constant then our approximate plan will translate perfectly to the physical world. [sent-73, score-0.477]
39 For example, two robots might each plan to break down at the same time and be towed by the other. [sent-76, score-0.423]
40 The only way to fix this problem is to make a more accurate model; in the worst case we will have to combine several robots into one large MDP so that we can track their joint allocation of resources at all times. [sent-77, score-0.449]
41 3 Action selection Because the value functions incorporate information about future actions and random events, the robots only need to look ahead a short time to choose good actions. [sent-79, score-0.297]
42 So, the robots can run a simple auction to determine their best joint action: each individual robot estimates its future cost for each action by a single-step backup from its value function. [sent-80, score-0.911]
43 The difference between these future costs then tells the robot how much it is willing to bid for the right to execute each action. [sent-81, score-0.375]
44 The optimal joint action is then the feasible action with the highest sum of bids. [sent-82, score-0.234]
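A small self-contained sketch of this one-step auction; the value functions, transition models, and the feasibility test are illustrative placeholders rather than the paper's towing model. Each robot backs its value function up one step, bids the difference from its best single-robot action, and the feasible joint action with the largest total bid wins.

```python
import itertools
import numpy as np

def bids_for_robot(V, T, c, s, gamma=0.95):
    """One-step backups Q(s,a) = c_a[s] + gamma * T_a[s].V, expressed as bids relative to
    the robot's best action (how much it would pay for the right to act differently)."""
    q = np.array([c[a][s] + gamma * T[a][s] @ V for a in range(len(T))])
    return q - q.max()

def auction_joint_action(robots, feasible, gamma=0.95):
    """robots: list of (V, T, c, s) tuples; feasible(joint_action) -> bool."""
    bids = [bids_for_robot(V, T, c, s, gamma) for (V, T, c, s) in robots]
    best, best_total = None, -np.inf
    for joint in itertools.product(*[range(len(b)) for b in bids]):
        if not feasible(joint):
            continue
        total = sum(bids[i][a] for i, a in enumerate(joint))
        if total > best_total:
            best, best_total = joint, total
    return best
```

Brute-force enumeration of joint actions is only a stand-in here; it works for a handful of robots but would not scale.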
45 4 Example Figure 1: A simple example (left panel): the objective is to have all robots (R1,R2,R3) reach the goal (G) where they receive a reward. [sent-84, score-0.297]
46 Any action may result in a robot becoming disabled, in which case it must be towed to the repair area (Re) to continue with the task. [sent-85, score-0.661]
47 Each robot receives a large reward upon reaching the goal but incurs a small cost for each step it takes. [sent-88, score-0.377]
48 Robots can break whenever they take a step, but a functioning robot may tow a failed robot to the repair area and the repaired robot may then proceed to the goal. [sent-89, score-1.268]
49 Each robot has the action set A = {8-connected move, pickup for towing, request tow}. [sent-90, score-0.459]
50 The state of each robot is its x position, its y position and its status {towing, going to goal, being towed, doing nothing}. [sent-91, score-0.371]
51 The joint state space of all three robots is |Sjoint| = |S|^3 and the joint action space is |A| = 10^3. [sent-94, score-0.477]
52 However, this problem lends itself to resource-based decomposition because the robots only interact through towing. [sent-96, score-0.411]
53 Specifically, we design our Ci matrices to represent the constraint that the expected number of times a robot executes a pickup action at a position should be equal to the expected number of times some other robot executes a request-tow action. [sent-97, score-0.932]
54 Thus, we have a weakly coupled MDP with robot interactions that can be modeled by linear constraints. [sent-98, score-0.467]
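To make this construction concrete, here is a hedged sketch (our own simplification, not the paper's exact matrices) of a coupling matrix with one resource row per grid cell: pickups produce a "tow at this cell" resource (+1) and tow requests consume it (-1), so requiring sum_i Ci fi = 0 balances the two in expectation.

```python
import numpy as np

def towing_coupling_matrix(n_cells, cell_of_state, actions):
    """cell_of_state[s] is the grid cell of robot state s; actions is the ordered action list,
    assumed to contain 'pickup' and 'request_tow' among the other moves."""
    n_states, n_actions = len(cell_of_state), len(actions)
    C = np.zeros((n_cells, n_states * n_actions))
    for s in range(n_states):
        for a, name in enumerate(actions):
            k = s * n_actions + a                  # column index of state-action pair (s, a)
            if name == 'pickup':
                C[cell_of_state[s], k] = 1.0       # produces the tow resource at this cell
            elif name == 'request_tow':
                C[cell_of_state[s], k] = -1.0      # consumes it
    return C
```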
55 5 Dantzig-Wolfe decomposition We have reduced the multi-robot planning problem to the problem of solving the LP (3). [sent-104, score-0.384]
56 So, one possible planning algorithm is just to pass this LP to a pre-packaged linear-program solver. [sent-105, score-0.303]
57 This planning algorithm can be fairly efficient, but it is completely centralized: each agent must communicate its entire dynamics to a central location and wait to receive its value function in return. [sent-106, score-0.374]
58 This decomposition splits our original LP (3) into a master LP (4) and one slave LP (5) for each robot i. [sent-109, score-0.724]
59 It then solves each slave program repeatedly, generating a new value for fi each time, and combines these solutions by inserting them into the master LP (Figure 2). [sent-110, score-0.538]
60 Each slave LP is the same as the corresponding robot’s MDP except that it has different state-action costs; so, the robots can run standard MDP planners (which are often much faster than general LP solvers) to produce their plans. [sent-112, score-0.443]
61 And, instead of sending whole MDPs and value functions back and forth, the Dantzig-Wolfe decomposition only needs to send resource prices and expected usages. [sent-113, score-0.692]
62 The master program can be located on a separate agent, or on an arbitrary robot. [sent-114, score-0.247]
63 In more detail, the master and slave LPs are: master LP (4): max_qi Σi ci^T Fi qi s.t. (∗) Σi Ci (Fi qi) = d, ∀i: qi ≥ 0, ∀i: Σj qij = 1; slave LP (5): max_fi (ci^T − p^T Ci) fi s.t. Ai fi = bi, fi ≥ 0. The master LP is the same as the original problem (3) except that fi has been replaced by Fi qi. [sent-115, score-1.236]
64 Each column of Fi is one of the solutions fi which we have computed for the ith slave LP. [sent-116, score-0.313]
65 So, solving the master LP means finding a convex combination qi of the known solutions for each slave LP. [sent-118, score-0.399]
66 The slave LP is the same as a single-robot planning problem (2) except that its costs have been altered by subtracting p^T Ci. [sent-119, score-0.485]
67 The vector p is the dual variable for the constraints (∗) from the last time we solved the master LP. [sent-120, score-0.297]
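A minimal sketch of one slave step: given the current prices p, robot i replans against the adjusted rewards ci - Ci^T p. Here LP (5) is handed to a generic LP solver, but as noted above any standard MDP planner could be substituted for this step.

```python
import numpy as np
from scipy.optimize import linprog

def solve_slave(c_i, A_i, b_i, C_i, p):
    """max (ci - Ci^T p).fi  s.t.  Ai fi = bi, fi >= 0  -- the slave LP (5)."""
    adjusted = c_i - C_i.T @ p           # resource prices become per-visit charges
    res = linprog(-adjusted, A_eq=A_i, b_eq=b_i, bounds=[(0, None)] * len(c_i))
    return res.x                         # a new column for Fi in the master LP
```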
68 6 An economic interpretation We have described how to use the Dantzig-Wolfe decomposition to derive an efficient distributed planning algorithm for loosely-coupled MDPs. [sent-122, score-0.442]
69 Associated with each row of the constraint matrices Ci in the master program (4) is a dual variable; that is, there is one dual variable pj for each resource j. [sent-125, score-0.753]
70 We can interpret this dual variable as a price for resource j. [sent-126, score-0.527]
71 To see why, notice that the slave program charges robot i a cost of pj [Ci ]j,k each time it visits state-action pair k, and that visiting state-action pair k consumes an amount [Ci ]j,k of resource j. [sent-127, score-0.948]
72 The Dantzig-Wolfe algorithm can be interpreted as a search for optimal resource prices. [sent-128, score-0.415]
73 The master agent repeatedly asks the robots what they would do if the prices were p, then tries to combine their answers to produce a good plan for all the robots together. [sent-129, score-1.077]
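A hedged sketch of one master step over the columns gathered from the slaves: solve LP (4) for the convex weights qi and read the new prices off the duals of (∗). Two assumptions are baked in: the master LP is feasible with the columns collected so far (a full implementation would add a Phase-I step otherwise), and the dual sign convention matches scipy's HiGHS output for a minimization, which may differ for other solvers.

```python
import numpy as np
from scipy.optimize import linprog

def solve_master(c_list, F_list, C_list, d):
    """F_list[i] holds one column per slave solution fi collected so far for robot i."""
    n_robots = len(F_list)
    n_cols = [F.shape[1] for F in F_list]
    obj = -np.concatenate([F_list[i].T @ c_list[i] for i in range(n_robots)])
    A_res = np.hstack([C_list[i] @ F_list[i] for i in range(n_robots)])  # constraints (*)
    A_cvx = np.zeros((n_robots, sum(n_cols)))                            # sum_j qij = 1 per robot
    start = 0
    for i, n in enumerate(n_cols):
        A_cvx[i, start:start + n] = 1.0
        start += n
    res = linprog(obj, A_eq=np.vstack([A_res, A_cvx]),
                  b_eq=np.concatenate([d, np.ones(n_robots)]),
                  bounds=[(0, None)] * sum(n_cols), method="highs")
    prices = -res.eqlin.marginals[:len(d)]   # duals of (*), interpreted as resource prices
    weights = np.split(res.x, np.cumsum(n_cols)[:-1])
    return prices, weights
```

Alternating solve_slave and solve_master until the prices stop changing reproduces the loop sketched in Figure 2.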
74 Figure 3 (x-axis: iterations): Auctions for multi-robot path planning with limited fuel usage. [sent-134, score-0.561]
75 Left to right: in an auction based on the assumption of cheap fuel, all robots go to the globally most tempting goal. [sent-135, score-0.427]
76 If we assume very expensive fuel, each robot crashes through obstacles and goes to its closest goal. [sent-136, score-0.378]
77 With the optimal fuel price, the auction trades goal quality against distance to achieve the best possible total cost. [sent-137, score-0.342]
78 We then place 15 robots in random starting locations and ask them to plan paths to 10 random goals. [sent-143, score-0.348]
79 Each robot can choose whichever goal it wants, but must pay a random goal-specific price. [sent-144, score-0.339]
80 The robots are coupled through a constraint on fuel usage: there is a quadratic penalty on total path length. [sent-145, score-0.577]
81 In this problem, our algorithm starts from an arbitrary initial guess at the value of a unit of fuel (which causes the individual robots to make poor policy decisions) and rapidly improves the estimated value by examining the individual robot plans. [sent-146, score-0.894]
82 To demonstrate scaling, we used our learning algorithm to coordinate the robot towing problem in the simulation shown in figure 4, with a grid size of 300 × 300 and 9 robots. [sent-148, score-0.47]
83 Many more robots could be handled, but because we only coordinated towing and not path planning in this example, there was a bottleneck at the repair area due to the unmodeled coordination. [sent-149, score-0.945]
84 Figure 4: Left: an example of the output of the algorithm on a towing problem on a map generated using the robots on the right. [sent-150, score-0.835]
85 Note that the nearest live robot (R1) tows the damaged robot to the repair area before heading to the goal. [sent-153, score-0.46]
86 Because our algorithm uses an arbitrary MDP planner as a subroutine, very large problems can be solved by combining our approach with fast planning algorithms. [sent-155, score-0.408]
87 The rules of the game are that the last team standing wins and that it takes 4 hits to cause a robot to fail. [sent-157, score-0.376]
88 There is a repair area to which a tagged teammate may be towed in order to repair it so that it may continue to play. [sent-158, score-0.475]
89 The policies used are: do nothing, attack target i, coordinated attack on target i (with a teammate), tow teammate i, and be repaired. [sent-161, score-0.365]
90 The objective of our multi-robot planner is to determine at a given time which fixed policy each robot on the team should be executing so that the team will perform better. [sent-163, score-0.571]
91 Coordination constraints are that any coordinated attacks or towing/repairing must be consistent: if teammate 1 requests a tow from teammate 2, then teammate 2 must perform a tow of teammate 1. [sent-164, score-0.81]
92 Enemy positions are sampled from the particle filters at the beginning of each rollout and each policy is evaluated over several possible enemy position combinations to determine the performance of a policy. [sent-167, score-0.181]
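A tiny sketch of this evaluation step; pf.sample() and simulate() are hypothetical placeholders standing in for the particle filters and the game simulator, not actual APIs from the paper.

```python
def evaluate_policy(joint_policy, particle_filters, simulate, n_rollouts=10):
    """Average rollout return of a joint policy over enemy positions sampled from the filters."""
    total = 0.0
    for _ in range(n_rollouts):
        enemy_positions = [pf.sample() for pf in particle_filters]   # one position per enemy
        total += simulate(joint_policy, enemy_positions)             # rollout under this sample
    return total / n_rollouts
```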
93 The robots replan at fixed intervals; the simulation is halted while planning occurs. [sent-168, score-0.6]
94 We compared our coordination planner to a similar planner without coordination. [sent-169, score-0.219]
95 The coordinated planner won 48 of 50 games against the default behavior. [sent-172, score-0.226]
96 Thus, the addition of coordination (via our factored planning algorithm) significantly improved the performance. [sent-173, score-0.356]
97 5 Conclusions We have developed a decentralized method for solving large loosely-coupled multi-robot planning problems. [sent-174, score-0.408]
98 Our algorithm works by finding an optimal solution to an approximate planning problem in which resource constraints hold only in expectation. [sent-175, score-0.764]
99 We have applied our algorithm to multi-robot towing, optimal use of fuel in a multi-robot path planning problem, and planning for multi-robot paintball. [sent-178, score-0.911]
100 Optimizing schedules for prioritized path planning of multi-robot systems. [sent-184, score-0.34]
wordName wordTfidf (topN-words)
[('resource', 0.39), ('robot', 0.339), ('planning', 0.303), ('robots', 0.297), ('fuel', 0.187), ('master', 0.182), ('prices', 0.179), ('mdp', 0.168), ('fi', 0.167), ('ci', 0.151), ('slave', 0.146), ('lp', 0.137), ('teammate', 0.131), ('towing', 0.131), ('auction', 0.13), ('repair', 0.112), ('mdps', 0.106), ('resources', 0.102), ('tow', 0.094), ('action', 0.09), ('market', 0.089), ('planner', 0.083), ('decentralized', 0.081), ('enemy', 0.075), ('towed', 0.075), ('agent', 0.071), ('dual', 0.069), ('price', 0.068), ('robotics', 0.067), ('fa', 0.065), ('consumption', 0.065), ('production', 0.065), ('decomposition', 0.057), ('interact', 0.057), ('coupled', 0.056), ('multirobot', 0.056), ('centralized', 0.055), ('gi', 0.055), ('coordination', 0.053), ('coordinated', 0.052), ('plan', 0.051), ('economic', 0.049), ('cooperative', 0.049), ('paint', 0.049), ('policy', 0.049), ('qi', 0.047), ('interactions', 0.047), ('constraints', 0.046), ('usage', 0.045), ('area', 0.045), ('attack', 0.044), ('ta', 0.044), ('program', 0.043), ('obstacles', 0.039), ('send', 0.039), ('reward', 0.038), ('dias', 0.037), ('icra', 0.037), ('maxfi', 0.037), ('visitation', 0.037), ('plans', 0.037), ('team', 0.037), ('path', 0.037), ('costs', 0.036), ('bellman', 0.034), ('ct', 0.034), ('limited', 0.034), ('distributed', 0.033), ('default', 0.033), ('usages', 0.033), ('consume', 0.033), ('burgard', 0.033), ('phrased', 0.033), ('state', 0.032), ('gordon', 0.032), ('particle', 0.031), ('games', 0.03), ('consumes', 0.03), ('pickup', 0.03), ('loosely', 0.03), ('automation', 0.03), ('joint', 0.029), ('design', 0.028), ('won', 0.028), ('expected', 0.027), ('executes', 0.026), ('observability', 0.026), ('guestrin', 0.026), ('determine', 0.026), ('optimal', 0.025), ('simulator', 0.025), ('weakly', 0.025), ('solving', 0.024), ('robotic', 0.024), ('geoffrey', 0.024), ('sebastian', 0.023), ('nothing', 0.022), ('arbitrary', 0.022), ('frequencies', 0.021), ('allocation', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination
Author: Curt Bererton, Geoffrey J. Gordon, Sebastian Thrun
Abstract: The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. 1
2 0.17970926 62 nips-2003-Envelope-based Planning in Relational MDPs
Author: Natalia H. Gardiol, Leslie P. Kaelbling
Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelopebased Planning (REBP) tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. 1
3 0.1764545 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
Author: Georgios Theocharous, Leslie P. Kaelbling
Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.
4 0.15523018 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
Author: Alan Fern, Sungwook Yoon, Robert Givan
Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1
5 0.14047423 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun
Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems. 1
6 0.13299403 21 nips-2003-An Autonomous Robotic System for Mapping Abandoned Mines
7 0.12272006 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control
8 0.10703656 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
9 0.10691212 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
10 0.10181134 158 nips-2003-Policy Search by Dynamic Programming
11 0.10042307 26 nips-2003-An MDP-Based Approach to Online Mechanism Design
12 0.099581569 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality
13 0.093246028 68 nips-2003-Eye Movements for Reward Maximization
14 0.083363347 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
15 0.0815209 105 nips-2003-Learning Near-Pareto-Optimal Conventions in Polynomial Time
16 0.080833763 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
17 0.080467209 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
18 0.078815259 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
19 0.072994716 42 nips-2003-Bounded Finite State Controllers
20 0.071469508 78 nips-2003-Gaussian Processes in Reinforcement Learning
topicId topicWeight
[(0, -0.18), (1, 0.25), (2, -0.107), (3, 0.034), (4, -0.046), (5, -0.057), (6, -0.144), (7, -0.094), (8, 0.058), (9, 0.017), (10, 0.053), (11, -0.123), (12, -0.02), (13, 0.13), (14, -0.019), (15, 0.085), (16, 0.095), (17, -0.051), (18, -0.024), (19, 0.085), (20, 0.008), (21, -0.041), (22, -0.098), (23, -0.001), (24, 0.126), (25, 0.045), (26, -0.118), (27, 0.015), (28, -0.064), (29, 0.057), (30, 0.075), (31, -0.165), (32, -0.056), (33, -0.004), (34, 0.233), (35, -0.133), (36, -0.152), (37, 0.012), (38, -0.017), (39, 0.126), (40, -0.111), (41, -0.053), (42, 0.119), (43, -0.019), (44, 0.046), (45, 0.059), (46, -0.09), (47, -0.007), (48, 0.054), (49, 0.07)]
simIndex simValue paperId paperTitle
same-paper 1 0.97225714 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination
Author: Curt Bererton, Geoffrey J. Gordon, Sebastian Thrun
Abstract: The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. 1
2 0.82134169 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality
Author: Maxim Likhachev, Geoffrey J. Gordon, Sebastian Thrun
Abstract: In real world planning problems, time for deliberation is often limited. Anytime planners are well suited for these problems: they find a feasible solution quickly and then continually work on improving it until time runs out. In this paper we propose an anytime heuristic search, ARA*, which tunes its performance bound based on available search time. It starts by finding a suboptimal solution quickly using a loose bound, then tightens the bound progressively as time allows. Given enough time it finds a provably optimal solution. While improving its bound, ARA* reuses previous search efforts and, as a result, is significantly more efficient than other anytime search methods. In addition to our theoretical analysis, we demonstrate the practical utility of ARA* with experiments on a simulated robot kinematic arm and a dynamic path planning problem for an outdoor rover. 1
3 0.73981571 21 nips-2003-An Autonomous Robotic System for Mapping Abandoned Mines
Author: David Ferguson, Aaron Morris, Dirk Hähnel, Christopher Baker, Zachary Omohundro, Carlos Reverte, Scott Thayer, Charles Whittaker, William Whittaker, Wolfram Burgard, Sebastian Thrun
Abstract: We present the software architecture of a robotic system for mapping abandoned mines. The software is capable of acquiring consistent 2D maps of large mines with many cycles, represented as Markov random fields. 3D C-space maps are acquired from local 3D range scans, which are used to identify navigable paths using A* search. Our system has been deployed in three abandoned mines, two of which are inaccessible to people, where it has acquired maps of unprecedented detail and accuracy. 1
4 0.59806609 62 nips-2003-Envelope-based Planning in Relational MDPs
Author: Natalia H. Gardiol, Leslie P. Kaelbling
Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelopebased Planning (REBP) tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. 1
5 0.47567087 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
Author: Georgios Theocharous, Leslie P. Kaelbling
Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.
6 0.4708176 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
7 0.42089847 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
8 0.39214298 158 nips-2003-Policy Search by Dynamic Programming
9 0.36091989 68 nips-2003-Eye Movements for Reward Maximization
10 0.34755218 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
11 0.33375874 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
12 0.32004723 105 nips-2003-Learning Near-Pareto-Optimal Conventions in Polynomial Time
13 0.30853978 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
14 0.30381969 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
15 0.30373904 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control
16 0.29615527 44 nips-2003-Can We Learn to Beat the Best Stock
17 0.28934878 43 nips-2003-Bounded Invariance and the Formation of Place Fields
18 0.28750587 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters
19 0.2720978 26 nips-2003-An MDP-Based Approach to Online Mechanism Design
20 0.26658154 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments
topicId topicWeight
[(0, 0.07), (11, 0.026), (21, 0.319), (29, 0.02), (30, 0.03), (35, 0.074), (53, 0.066), (69, 0.018), (71, 0.034), (76, 0.025), (82, 0.016), (85, 0.092), (91, 0.106), (99, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.81315839 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination
Author: Curt Bererton, Geoffrey J. Gordon, Sebastian Thrun
Abstract: The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. 1
2 0.72522593 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
Author: Milos Hauskrecht, Branislav Kveton
Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1
3 0.50024569 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
Author: Yu-han Chang, Tracey Ho, Leslie P. Kaelbling
Abstract: In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy. 1
4 0.49481162 78 nips-2003-Gaussian Processes in Reinforcement Learning
Author: Malte Kuss, Carl E. Rasmussen
Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.
5 0.49362972 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models
Author: Radford M. Neal, Matthew J. Beal, Sam T. Roweis
Abstract: We describe a Markov chain method for sampling from the distribution of the hidden state sequence in a non-linear dynamical system, given a sequence of observations. This method updates all states in the sequence simultaneously using an embedded Hidden Markov Model (HMM). An update begins with the creation of “pools” of candidate states at each time. We then define an embedded HMM whose states are indexes within these pools. Using a forward-backward dynamic programming algorithm, we can efficiently choose a state sequence with the appropriate probabilities from the exponentially large number of state sequences that pass through states in these pools. We illustrate the method in a simple one-dimensional example, and in an example showing how an embedded HMM can be used to in effect discretize the state space without any discretization error. We also compare the embedded HMM to a particle smoother on a more substantial problem of inferring human motion from 2D traces of markers. 1
6 0.48677948 158 nips-2003-Policy Search by Dynamic Programming
7 0.48177296 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
8 0.48170939 4 nips-2003-A Biologically Plausible Algorithm for Reinforcement-shaped Representational Learning
9 0.48142469 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
10 0.48051703 113 nips-2003-Learning with Local and Global Consistency
11 0.48030791 68 nips-2003-Eye Movements for Reward Maximization
12 0.47991845 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
13 0.47979027 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
14 0.47904032 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors
15 0.4713583 146 nips-2003-Online Learning of Non-stationary Sequences
16 0.46984458 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
17 0.46868852 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
18 0.4686684 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
19 0.46826756 55 nips-2003-Distributed Optimization in Adaptive Networks
20 0.46826422 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering