nips nips2003 nips2003-62 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Natalia H. Gardiol, Leslie P. Kaelbling
Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelopebased Planning (REBP) tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. [sent-6, score-0.236]
2 Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. [sent-7, score-0.371]
3 We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. [sent-8, score-0.313]
4 First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. [sent-10, score-0.766]
5 Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. [sent-11, score-0.814]
6 1 Introduction Quickly generating usable plans when the world abounds with uncertainty is an important and difficult enterprise. [sent-12, score-0.201]
7 Consider the classic blocks world domain: the number of ways to make a stack of a certain height grows exponentially with the number of blocks on the table; and if the outcomes of actions are uncertain, the task becomes even more daunting. [sent-13, score-0.687]
8 We want planning techniques that can deal with large state spaces and large, stochastic action sets since most compelling, realistic domains have these characteristics. [sent-14, score-0.648]
9 In this paper we propose a method for planning in very large domains by using expressive rules to restrict attention to high-utility subsets of the state space. [sent-15, score-0.518]
10 Much of the work in traditional planning techniques centers on propositional, deterministic domains. [sent-16, score-0.302]
11 Efforts to extend classical planning approaches into stochastic domains include mainly techniques that work with fully-ground state spaces [13, 2]. [sent-18, score-0.519]
12 Conversely, efforts to move beyond propositional STRIPS-based planning involve work in mainly deterministic domains [6, 10]. [sent-19, score-0.606]
13 But the world is not deterministic: for an agent to act robustly, it must handle uncertain dynamics as well as large state and action spaces. [sent-20, score-0.403]
14 In order to describe large stochastic domains compactly, we need relational structures that can represent uncertainty in the dynamics. [sent-23, score-0.468]
15 Relational representations allow the structure of the domain to be expressed in terms of object properties rather than object identities and thus yield a much more compact representation of a domain than the equivalent propositional version can. [sent-24, score-0.333]
16 Other approaches in relational MDPs represent the value function as a decision-tree [5] or as a sum of local subfunctions [8]. [sent-28, score-0.282]
17 These approaches all compute full policies over complete state and action spaces, however, and so are of a different spirit than the work presented here. [sent-30, score-0.305]
18 Since fully-ground representations grow too big to be useful and purely logical representations are as yet unwieldy, we propose a middle path: we agree to ground things out, but in a principled, restricted way. [sent-32, score-0.282]
19 We represent world dynamics by a compact set of relational rules, and we extend the envelope method of Dean et al. [sent-33, score-0.745]
20 We quickly come up with an initial trajectory (an envelope of states) to the goal and then refine the policy by gradually incorporating nearby states into the envelope. [sent-35, score-0.742]
21 Our approach strikes a balance along two axes: between fully ground and purely logical representations, and between straight-line plans and full MDP policies. [sent-37, score-0.328]
22 2 Planning with an Envelope in Relational Domains The envelope method was initially designed for planning in atomic-state MDPs. [sent-38, score-0.592]
23 Goals of achievement are encoded as reward functions, and planning now becomes finding a policy that maximizes a long-term measure of reward. [sent-39, score-0.424]
24 Extending the approach to a relational setting lets us cast the problem of planning in stochastic, relational domains in terms of finding a policy for a restricted Markovian state space. [sent-40, score-1.197]
25 1 Encoding Markovian dynamics with rules The first step to extending the envelope method to relational domains is to encode the world dynamics relationally. [sent-42, score-1.002]
26 A rule applies in a state if its precondition can be matched against some subset of the state ground predicates. [sent-47, score-0.393]
27 Given this structured representation of action dynamics, we define a relational MDP as a tuple ⟨P, Z, O, T, R⟩: States: The set of states is defined by a finite set P of relational predicates, representing the properties and relations that can hold among the finite set of domain objects, O. [sent-49, score-0.963]
28 Each RMDP state is a ground interpretation of the domain predicates over the domain objects. [sent-50, score-0.526]
29 Actions: The set of ground actions depends on the set of rules Z and the objects in the world. [sent-51, score-0.294]
30 For example, move(A, B) can be bound to the table arrangement in Figure 2(a) by binding A to block 1 and B to block 4 to yield the ground action move(1, 4). [sent-52, score-0.68]
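This binding step can be pictured with a small sketch. The Python below is a hypothetical illustration, not the authors' implementation: a ground state is encoded as a set of predicate tuples, and a simple backtracking matcher enumerates the bindings of a rule's precondition to yield ground actions such as move(1, 4). The state encoding, the variable convention, and the precondition shown are all assumptions made for illustration.

```python
# Minimal sketch of grounding a relational operator against a blocks-world state.
def is_var(x):              # convention assumed here: variables are single uppercase letters
    return isinstance(x, str) and x.isupper() and len(x) == 1

def match(pattern, fact, binding):
    """Try to extend `binding` so that `pattern` equals the ground `fact`."""
    if len(pattern) != len(fact):
        return None
    b = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:
            return None
    return b

def bindings(precondition, state, binding=None):
    """Enumerate all variable bindings that satisfy the precondition in `state`."""
    binding = binding or {}
    if not precondition:
        yield binding
        return
    first, rest = precondition[0], precondition[1:]
    for fact in state:
        b = match(first, fact, binding)
        if b is not None:
            yield from bindings(rest, state, b)

# Assumed precondition of move(A, B): A and B are clear and the hand is empty.
move_pre = [("clear", "A", "t"), ("clear", "B", "t"), ("hold", "nil")]

state = {("clear", "b1", "t"), ("clear", "b4", "t"), ("on", "b1", "b2"),
         ("on", "b4", "b5"), ("hold", "nil")}

ground_actions = {("move", b["A"], b["B"])
                  for b in bindings(move_pre, state) if b["A"] != b["B"]}
print(ground_actions)       # e.g. {('move', 'b1', 'b4'), ('move', 'b4', 'b1')}
```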
31 Transition Dynamics: For each action, the distribution over next states is given compactly by the distribution over outcomes encoded in the schema. [sent-53, score-0.197]
32 Figure 1: The set of relational rules, Z, for blocks-world dynamics. [sent-62, score-0.282]
33 There is a 0.3 chance of landing in a state where block 1 falls on the table, and a 0.7 chance of landing in a state where block 1 is correctly put on block 4. [sent-65, score-0.324] [sent-66, score-0.505]
35 The rule outcomes themselves usually only specify a subset of the domain predicates, effectively describing a set of possible ground states. [sent-67, score-0.302]
36 We assume a static frame: state predicates not directly changed by the rule are assumed to remain the same. [sent-68, score-0.262]
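A minimal sketch of how a sampled outcome might be applied under this static-frame assumption, assuming an add/delete-list style encoding of outcomes and reusing the illustrative 0.7/0.3 probabilities from the example above (this is not the authors' code):

```python
import random

def apply_outcome(state, delete, add):
    """Next state = (state - delete) + add; all other predicates carry over (static frame)."""
    return frozenset(fact for fact in state if fact not in delete) | frozenset(add)

def sample_next_state(state, outcomes):
    """outcomes: list of (probability, delete_set, add_set) tuples."""
    r, acc = random.random(), 0.0
    for p, delete, add in outcomes:
        acc += p
        if r <= acc:
            return apply_outcome(state, delete, add)
    return apply_outcome(state, *outcomes[-1][1:])   # guard against rounding slack

# Hypothetical outcomes of move(b1, b4): success (0.7) or b1 falls on the table (0.3).
state = frozenset({("on", "b1", "b2"), ("clear", "b1", "t"),
                   ("on", "b4", "b5"), ("clear", "b4", "t"), ("hold", "nil")})
outcomes = [
    (0.7, {("on", "b1", "b2"), ("clear", "b4", "t")},
          {("on", "b1", "b4"), ("clear", "b2", "t")}),
    (0.3, {("on", "b1", "b2")},
          {("on", "b1", "table"), ("clear", "b2", "t")}),
]
print(sorted(sample_next_state(state, outcomes)))
```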
37 2 Initial trajectory planning The next step is finding an initial path. [sent-71, score-0.353]
38 In a relational setting, when the underlying MDP space implied by the full instantiation of the representation is potentially huge, a good initial envelope is crucial. [sent-72, score-0.726]
39 It determines the quality of the early envelope policies and sets the stage for more elaborate policies later on. [sent-73, score-0.46]
40 For planning in traditional STRIPS domains, the Graphplan algorithm is known to be effective [1]. [sent-74, score-0.248]
41 Graphplan finds the shortest straight-line plan by iteratively growing a forward-chaining structure called a plangraph and testing for the presence of goal conditions at each step. [sent-75, score-0.282]
42 Nevertheless, this should give us pause: we have just said that our relational MDP describes a large underlying MDP. [sent-79, score-0.282]
43 Large numbers of actions cause severe problems for Graphplan-based planners [11] since the branching factor quickly chokes the forward-chaining plangraph construction. [sent-81, score-0.323]
44 [Figure 2, panels (a) and (b): a blocks-world configuration and the ground predicates of the resulting outcomes; graphics omitted.] [sent-83, score-0.684]
45 Figure 2: (a) Given this world configuration, the move action produces three types of effects. [sent-85, score-0.344]
46 (c) A plangraph fragment with a particular instance of move chained forward. [sent-87, score-0.299]
47 3 Equivalence-class sampling: reducing the planning action space STRIPS rules require every variable in the rule schema to appear in the argument list, so move(A, B) becomes move(A, B, H, H', C). [sent-89, score-0.553]
48 The meaning of the operator shifts from “move A onto B” to “move A at height H onto B at height H' from C”. [sent-90, score-0.575]
49 Not only is this awkward, but specifying all the variables in the argument list yields an exponential number of ground actions as the number of domain objects grows. [sent-91, score-0.41]
50 That is, when the operator move(A, B) takes two arguments, A and B , it means that the other variables (such as C , the block under A) are derivable from the relations in the rule schema. [sent-93, score-0.301]
51 There is one group of actions that moves a block from one block onto another; a group that moves a block from the table onto a block of height zero; and another group that moves a block off the table onto a block of height one. [sent-98, score-2.054]
52 Except for the identities of the argument blocks A and B , the actions in each class produce equivalent groundings for the properties of the related domain objects. [sent-99, score-0.404]
53 Rather than using all the actions, then, the plangraph can be constructed by chaining forward only a sampled action from each class. [sent-100, score-0.321]
54 We call this equivalence-class sampling; the sampled action is representative of the effects of any action from that class. [sent-101, score-0.258]
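The idea can be sketched as grouping ground actions by an abstract signature of the objects they touch and chaining forward only one sample per group. The signature used below (the heights of the two argument blocks) is an assumption chosen for illustration, not necessarily the grouping criterion used in the paper:

```python
import random
from collections import defaultdict

def signature(action, heights):
    """Abstract a ground move(A, B) to the pair of heights it operates on."""
    _, a, b = action
    return ("move", heights[a], heights[b])

heights = {"b1": 1, "b2": 0, "b3": 0, "b4": 1, "b5": 0}
ground_actions = [("move", a, b) for a in heights for b in heights if a != b]

classes = defaultdict(list)
for act in ground_actions:
    classes[signature(act, heights)].append(act)

# One sampled representative per equivalence class instead of all ground actions.
representatives = [random.choice(acts) for acts in classes.values()]
print(len(ground_actions), "ground actions ->", len(representatives), "classes")
```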
55 We define a planning problem as containing: Rules: These are the relational operators that describe the action effects. [sent-104, score-0.708]
56 Initial World State: The set of ground predicates that describes the starting state. [sent-106, score-0.252]
57 REBP does not make the closed world assumption, so all predicates and objects required in the planning task must appear in the initial state. [sent-107, score-0.559]
58 Figure 3: An initial envelope corresponding to the plangraph segment of Figure 2(c), followed by fringe sampling and envelope expansion. [sent-127, score-1.07]
59 Given a planning problem, there are now three main components to REBP: finding an initial plan, converting the plan into an MDP, and envelope manipulation. [sent-132, score-0.781]
60 1 Finding an initial plan The process for making the initial trajectory essentially follows the TGP algorithm described by Blum and Langford [2]. [sent-136, score-0.294]
61 The TGP algorithm starts with the initial world state as the first layer in the graph, a minimum probability cutoff for the plan, and a maximum plan depth. [sent-137, score-0.352]
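A rough sketch of that search loop, under simplifying assumptions: it ignores the mutex reasoning of a real plangraph, and the helpers ground_actions(state) and outcomes(state, action) are assumed to exist (for instance, built from the rule-matching sketch above). It is meant only to show the roles of the probability cutoff and the maximum depth.

```python
def initial_plan(start, goal, ground_actions, outcomes, p_min=0.1, max_depth=10):
    """Forward-chain layers from `start`; stop when some branch satisfies `goal`."""
    layer = [(start, 1.0, [])]                 # (state, prob of prefix, actions so far)
    for _ in range(max_depth):
        for state, prob, plan in layer:
            if goal <= state:                  # all goal predicates hold in this state
                return plan, prob
        nxt = []
        for state, prob, plan in layer:
            for act in ground_actions(state):
                for p, succ in outcomes(state, act):
                    if prob * p >= p_min:      # prune branches below the cutoff
                        nxt.append((succ, prob * p, plan + [act]))
        if not nxt:
            break
        layer = nxt
    return None, 0.0

# Toy usage on a two-step chain start -> next -> goal (illustrative helpers only):
def acts(s):
    return ["step"]

def outs(s, a):
    if ("at", "next") not in s:
        return [(0.8, frozenset(s | {("at", "next")}))]
    return [(0.8, frozenset(s | {("at", "goal")}))]

plan, p = initial_plan(frozenset({("at", "start")}), {("at", "goal")}, acts, outs)
print(plan, p)                                  # ['step', 'step'] 0.64...
```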
62 The next step is to turn the sequence of action-effects into a well-defined envelope MDP; that is, we must compute the set of states and the transitions. [sent-142, score-0.448]
63 Usually, the sequence of action-effects alone leaves many state predicates unspecified. [sent-143, score-0.225]
64 To compute the set of actions, REBP loops through the list of operators and accumulates all the ground actions whose preconditions bind to any state in the envelope. [sent-146, score-0.373]
65 Transitions that initiate in an envelope state but do not land in an envelope state are redirected to OUT. [sent-147, score-0.86]
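A hedged sketch of this conversion step, again assuming the helper functions from the earlier sketches rather than the authors' implementation:

```python
OUT = "OUT"   # single absorbing state for transitions that leave the envelope

def build_envelope_mdp(envelope, ground_actions, outcomes):
    """Return {state: {action: [(prob, next_state), ...]}} restricted to the envelope."""
    transitions = {}
    for s in envelope:
        transitions[s] = {}
        for a in ground_actions(s):
            dist = []
            for p, s_next in outcomes(s, a):
                dist.append((p, s_next if s_next in envelope else OUT))
            transitions[s][a] = dist
    transitions[OUT] = {}          # OUT is absorbing: no actions lead back in
    return transitions
```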
66 The leftmost MDP in Figure 3 shows the initial envelope corresponding to the one-step plan of Figure 2(c). [sent-148, score-0.533]
67 As a first step, we considered the simple precursor deliberation model, in which deliberation occurs for some number of rounds, r, and is completed before execution takes place. [sent-154, score-0.241]
68 A round of deliberation involves sampling from the current policy to estimate which fringe states — states one step outside of the envelope — are likely. [sent-155, score-0.988]
69 In each round, REBP draws d · M samples (drawing an exploratory action with probability ε) and keeps counts of which fringe states are reached. [sent-156, score-0.34]
70 The f · M most likely fringes are added to the envelope, where M is the number of states in the current envelope and d and f are scalars. [sent-157, score-0.448]
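One round of this fringe-sampling step might look roughly like the following; the trajectory horizon, the exploration probability, and the default values of d and f are illustrative assumptions rather than the paper's settings:

```python
import random
from collections import Counter

def expand_envelope(envelope, policy, actions, sample_next, start,
                    d=2.0, f=0.3, eps=0.1, horizon=20):
    """Sample d*M trajectories, count fringe hits, absorb the f*M most frequent fringes."""
    M = len(envelope)
    counts = Counter()
    for _ in range(int(d * M)):
        s = start
        for _ in range(horizon):
            a = random.choice(actions(s)) if random.random() < eps else policy[s]
            s_next = sample_next(s, a)
            if s_next not in envelope:        # a fringe state: one step outside
                counts[s_next] += 1
                break
            s = s_next
    most_likely = [s for s, _ in counts.most_common(max(1, int(f * M)))]
    return envelope | set(most_likely)
```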
71 Figure 3 shows a sequence of fringe sampling and envelope expansion. [sent-159, score-0.497]
72 We see the incorporation of the fringe state in which the hand breaks as a result of move. [sent-160, score-0.235]
73 While simple, blocks world is a reasonably interesting first domain because, with enough blocks, it exposes the weaknesses of purely propositional approaches. [sent-164, score-0.369]
74 Its regular dynamics, on the other hand, lend themselves to relational descriptions. [sent-165, score-0.282]
75 In this domain, blocks are stacked on one another, with the top block in a stack being clear. [sent-168, score-0.324]
76 Each block has a color and is at some height in the stack. [sent-169, score-0.47]
77 The pickup(A) action is deterministic and puts a clear block into the empty hand; a block in the hand is no longer clear, and its height and on-ness are no longer defined. [sent-171, score-0.968]
78 The fix() action takes a broken hand and fixes it with some probability. [sent-172, score-0.196]
79 The stackon() action comes in two flavors: the first, stackon(B), takes a block from the hand and puts it on block B, which may be dropped onto the table with a small probability; the second, stackon(table), always puts the block from the hand onto the table. [sent-173, score-0.999]
80 5 Empirical Results We compared the quality of the policies generated by the following algorithms: REBP; envelope expansion starting from an empty initial plan (i.e., the initial envelope containing only the initial world state); and policy iteration on the fully ground MDP. [sent-180, score-0.644] [sent-182, score-0.817]
82 4 In all cases, the policy was computed by simple policy iteration with a discount of 0. [sent-183, score-0.294]
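For concreteness, a minimal policy-iteration sketch over the envelope-MDP representation used in the earlier sketches; the discount value gamma = 0.9 and the reward function taken as an argument here are illustrative assumptions, not values confirmed by the text:

```python
def policy_iteration(transitions, reward, gamma=0.9, sweeps=100):
    """transitions: {state: {action: [(prob, next_state), ...]}}; reward: state -> float."""
    states = list(transitions)
    policy = {s: next(iter(transitions[s]), None) for s in states}
    V = {s: 0.0 for s in states}
    while True:
        for _ in range(sweeps):                          # approximate policy evaluation
            for s in states:
                a = policy[s]
                V[s] = reward(s) if a is None else reward(s) + gamma * sum(
                    p * V[s2] for p, s2 in transitions[s][a])
        stable = True                                    # greedy policy improvement
        for s in states:
            if not transitions[s]:
                continue
            best = max(transitions[s],
                       key=lambda a: sum(p * V[s2] for p, s2 in transitions[s][a]))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V
```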
83 4 Starting with the initial state, the set of states is generated by exhaustively applying our operators until no more new states are found; this yields the true set of reachable states. [sent-195, score-0.325]
84 The bottom plots show policy value against number of states for REBP and deliberation only (empty initial plan). [sent-198, score-0.427]
85 A domain of 7 blocks results in an MDP of over 37,000 states with 1,191 actions, a combined state and action space that is too overwhelming for the full MDP solution. [sent-199, score-0.522]
86 The REBP agent, on the other hand, is able to find plans for making stacks in domains of more than 12 blocks, which corresponds to an MDP of about 88,000 states and 3,000 ground actions. [sent-200, score-0.435]
87 The top row shows the value of the policy against execution time (as measured by a monitoring package), showing that the REBP algorithm produces good-quality plans quickly. [sent-202, score-0.241]
88 For REBP, we start measuring the value of the policy at the point when initial trajectory finding ends and deliberation begins; for the full MDP solution, we measure the value of the policy at the end of each round of policy iteration. [sent-203, score-0.714]
89 Without the equivalence-class sampling, plangraph construction takes on the order of a couple of hours; with it, it takes a couple of minutes. [sent-205, score-0.263]
90 The bottom row shows the value of the policy against the number of states in the envelope so far and shows that a good initial envelope is key for behaving well with fewer states. [sent-206, score-1.007]
91 6 Discussion and Conclusions Using the relational envelope method, we can take real advantage of relational generalization to produce good initial plans efficiently, and use envelope-growing techniques to improve the robustness of our plans incrementally as time permits. [sent-207, score-1.193]
92 REBP is a planning system that tries to dynamically reformulate an apparently intractable problem into a small, easily handled problem at run time. [sent-208, score-0.248]
93 Currently, the action sampling is a purely local decision made at each step of the plangraph. [sent-211, score-0.225]
94 If, on the other hand, the goal was to make a stack of height n − 1 with a green block on top, it could be problematic to construct the plangraph without considering block color in the sampled actions. [sent-213, score-0.878]
95 Furthermore, the current envelope-extension method is relatively undirected; it might be possible to diagnose more effectively which fringe states would be most profitable to add. [sent-215, score-0.211]
96 Techniques such as those of [4] could be employed to decide when to stop envelope growth, and to manage the eventual interleaving of envelope growth and execution. [sent-217, score-0.344]
97 Currently the states in the envelope are essentially atomic; it ought to be possible to exploit the factored nature of relational representations to allow abstraction in the MDP model, with aggregate “states” in the MDP actually representing sets of states in the underlying world. [sent-218, score-0.874]
98 It produces an initial plan quickly by taking advantage of generalization among action effects, and as a result behaves smarter in a large space much sooner than it could by waiting for a full solution. [sent-220, score-0.392]
99 Speeding up relational reinforcement learning through the use of an incremental first order decision tree learner. [sent-239, score-0.282]
100 High-level planning and control with incomplete information using POMDPs. [sent-250, score-0.248]
wordName wordTfidf (topN-words)
[('envelope', 0.344), ('rebp', 0.285), ('relational', 0.282), ('planning', 0.248), ('height', 0.238), ('mdp', 0.198), ('broke', 0.196), ('block', 0.181), ('nil', 0.161), ('plangraph', 0.161), ('policy', 0.147), ('tgp', 0.143), ('predicates', 0.139), ('move', 0.138), ('action', 0.129), ('domains', 0.124), ('plan', 0.121), ('ground', 0.113), ('clear', 0.112), ('deliberation', 0.108), ('fringe', 0.107), ('stackon', 0.107), ('states', 0.104), ('plans', 0.094), ('domain', 0.094), ('actions', 0.094), ('graphplan', 0.089), ('state', 0.086), ('blocks', 0.077), ('world', 0.077), ('table', 0.076), ('hold', 0.072), ('precondition', 0.071), ('propositional', 0.071), ('initial', 0.068), ('stack', 0.066), ('rules', 0.06), ('pre', 0.059), ('outcomes', 0.058), ('policies', 0.058), ('pickup', 0.057), ('groundings', 0.054), ('orf', 0.054), ('dean', 0.053), ('color', 0.051), ('argument', 0.051), ('mdps', 0.05), ('purely', 0.05), ('operators', 0.049), ('strips', 0.046), ('sampling', 0.046), ('blum', 0.043), ('dynamics', 0.042), ('quickly', 0.042), ('hand', 0.042), ('european', 0.041), ('representations', 0.04), ('logical', 0.039), ('trajectory', 0.037), ('rule', 0.037), ('onto', 0.036), ('avrim', 0.036), ('incr', 0.036), ('koehler', 0.036), ('rmdp', 0.036), ('weld', 0.036), ('uncertain', 0.035), ('compactly', 0.035), ('puts', 0.035), ('agent', 0.034), ('identities', 0.034), ('craig', 0.034), ('full', 0.032), ('stochastic', 0.032), ('landing', 0.031), ('atomic', 0.031), ('bindings', 0.031), ('chaining', 0.031), ('derivable', 0.031), ('list', 0.031), ('extending', 0.031), ('uncertainty', 0.03), ('reward', 0.029), ('techniques', 0.029), ('expansion', 0.028), ('round', 0.028), ('lets', 0.028), ('nebel', 0.028), ('schema', 0.028), ('operator', 0.027), ('objects', 0.027), ('markovian', 0.026), ('couple', 0.026), ('pack', 0.026), ('branching', 0.026), ('falling', 0.026), ('chance', 0.026), ('empty', 0.025), ('takes', 0.025), ('deterministic', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999934 62 nips-2003-Envelope-based Planning in Relational MDPs
Author: Natalia H. Gardiol, Leslie P. Kaelbling
Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelopebased Planning (REBP) tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. 1
2 0.31136706 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
Author: Alan Fern, Sungwook Yoon, Robert Givan
Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1
3 0.17970926 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination
Author: Curt Bererton, Geoffrey J. Gordon, Sebastian Thrun
Abstract: The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. 1
4 0.14376576 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
Author: Georgios Theocharous, Leslie P. Kaelbling
Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.
5 0.13624336 158 nips-2003-Policy Search by Dynamic Programming
Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng
Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1
6 0.1334915 78 nips-2003-Gaussian Processes in Reinforcement Learning
7 0.13144247 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
8 0.11649421 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
9 0.11051514 118 nips-2003-Link Prediction in Relational Data
10 0.10834534 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
11 0.10672526 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
12 0.094225936 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
13 0.090209052 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
14 0.088598125 42 nips-2003-Bounded Finite State Controllers
15 0.085387744 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
16 0.084537558 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
17 0.077661462 55 nips-2003-Distributed Optimization in Adaptive Networks
18 0.076599486 68 nips-2003-Eye Movements for Reward Maximization
19 0.072631292 26 nips-2003-An MDP-Based Approach to Online Mechanism Design
20 0.060241599 105 nips-2003-Learning Near-Pareto-Optimal Conventions in Polynomial Time
topicId topicWeight
[(0, -0.177), (1, 0.305), (2, -0.128), (3, 0.031), (4, -0.105), (5, -0.007), (6, -0.144), (7, -0.092), (8, 0.09), (9, 0.055), (10, 0.006), (11, -0.154), (12, -0.023), (13, 0.11), (14, 0.001), (15, 0.047), (16, 0.054), (17, -0.086), (18, -0.032), (19, 0.064), (20, -0.026), (21, 0.046), (22, 0.03), (23, 0.033), (24, -0.096), (25, -0.057), (26, -0.005), (27, -0.071), (28, 0.047), (29, -0.101), (30, 0.035), (31, -0.193), (32, 0.004), (33, 0.005), (34, 0.076), (35, -0.063), (36, -0.092), (37, 0.043), (38, -0.11), (39, -0.001), (40, 0.05), (41, -0.045), (42, -0.012), (43, -0.084), (44, -0.02), (45, 0.136), (46, 0.095), (47, -0.163), (48, -0.052), (49, -0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.97377717 62 nips-2003-Envelope-based Planning in Relational MDPs
Author: Natalia H. Gardiol, Leslie P. Kaelbling
Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelopebased Planning (REBP) tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. 1
2 0.87216383 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
Author: Alan Fern, Sungwook Yoon, Robert Givan
Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1
3 0.65305859 158 nips-2003-Policy Search by Dynamic Programming
Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng
Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1
4 0.62397528 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination
Author: Curt Bererton, Geoffrey J. Gordon, Sebastian Thrun
Abstract: The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. 1
5 0.56816739 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
Author: Milos Hauskrecht, Branislav Kveton
Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1
6 0.51936448 118 nips-2003-Link Prediction in Relational Data
7 0.46411744 38 nips-2003-Autonomous Helicopter Flight via Reinforcement Learning
8 0.4521606 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality
9 0.42351145 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
10 0.41471636 78 nips-2003-Gaussian Processes in Reinforcement Learning
11 0.38161013 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
12 0.38069859 68 nips-2003-Eye Movements for Reward Maximization
13 0.37895754 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions
14 0.37686762 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
15 0.34546632 42 nips-2003-Bounded Finite State Controllers
16 0.31785235 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
17 0.3005774 105 nips-2003-Learning Near-Pareto-Optimal Conventions in Polynomial Time
18 0.29863256 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis
19 0.28183115 39 nips-2003-Bayesian Color Constancy with Non-Gaussian Models
20 0.27504972 55 nips-2003-Distributed Optimization in Adaptive Networks
topicId topicWeight
[(0, 0.037), (29, 0.013), (30, 0.496), (35, 0.055), (53, 0.05), (71, 0.034), (76, 0.026), (85, 0.056), (88, 0.012), (91, 0.094), (99, 0.032)]
simIndex simValue paperId paperTitle
1 0.94661909 157 nips-2003-Plasticity Kernels and Temporal Statistics
Author: Peter Dayan, Michael Häusser, Michael London
Abstract: Computational mysteries surround the kernels relating the magnitude and sign of changes in efficacy as a function of the time difference between pre- and post-synaptic activity at a synapse. One important idea34 is that kernels result from filtering, ie an attempt by synapses to eliminate noise corrupting learning. This idea has hitherto been applied to trace learning rules; we apply it to experimentally-defined kernels, using it to reverse-engineer assumed signal statistics. We also extend it to consider the additional goal for filtering of weighting learning according to statistical surprise, as in the Z-score transform. This provides a fresh view of observed kernels and can lead to different, and more natural, signal statistics.
same-paper 2 0.92326385 62 nips-2003-Envelope-based Planning in Relational MDPs
Author: Natalia H. Gardiol, Leslie P. Kaelbling
Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelopebased Planning (REBP) tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. 1
3 0.87323296 71 nips-2003-Fast Embedding of Sparse Similarity Graphs
Author: John C. Platt
Abstract: This paper applies fast sparse multidimensional scaling (MDS) to a large graph of music similarity, with 267K vertices that represent artists, albums, and tracks; and 3.22M edges that represent similarity between those entities. Once vertices are assigned locations in a Euclidean space, the locations can be used to browse music and to generate playlists. MDS on very large sparse graphs can be effectively performed by a family of algorithms called Rectangular Dijsktra (RD) MDS algorithms. These RD algorithms operate on a dense rectangular slice of the distance matrix, created by calling Dijsktra a constant number of times. Two RD algorithms are compared: Landmark MDS, which uses the Nyström approximation to perform MDS; and a new algorithm called Fast Sparse Embedding, which uses FastMap. These algorithms compare favorably to Laplacian Eigenmaps, both in terms of speed and embedding quality. 1
4 0.85509676 144 nips-2003-One Microphone Blind Dereverberation Based on Quasi-periodicity of Speech Signals
Author: Tomohiro Nakatani, Masato Miyoshi, Keisuke Kinoshita
Abstract: Speech dereverberation is desirable with a view to achieving, for example, robust speech recognition in the real world. However, it is still a challenging problem, especially when using a single microphone. Although blind equalization techniques have been exploited, they cannot deal with speech signals appropriately because their assumptions are not satisfied by speech signals. We propose a new dereverberation principle based on an inherent property of speech signals, namely quasi-periodicity. The present methods learn the dereverberation filter from a lot of speech data with no prior knowledge of the data, and can achieve high quality speech dereverberation especially when the reverberation time is long. 1
5 0.54417878 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
Author: Alan Fern, Sungwook Yoon, Robert Givan
Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1
6 0.50098413 162 nips-2003-Probabilistic Inference of Speech Signals from Phaseless Spectrograms
7 0.41106895 5 nips-2003-A Classification-based Cocktail-party Processor
8 0.400709 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
9 0.39072561 15 nips-2003-A Probabilistic Model of Auditory Space Representation in the Barn Owl
10 0.38564816 159 nips-2003-Predicting Speech Intelligibility from a Population of Neurons
11 0.3671014 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors
12 0.34906605 184 nips-2003-The Diffusion-Limited Biochemical Signal-Relay Channel
13 0.34710038 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
14 0.34683019 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics
15 0.34351858 128 nips-2003-Minimax Embeddings
16 0.34235698 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination
17 0.33858582 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction
18 0.33835757 158 nips-2003-Policy Search by Dynamic Programming
19 0.33774313 115 nips-2003-Linear Dependent Dimensionality Reduction
20 0.3328076 123 nips-2003-Markov Models for Automated ECG Interval Analysis