nips nips2003 nips2003-116 knowledge-graph by maker-knowledge-mining

116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes


Source: pdf

Author: Milos Hauskrecht, Branislav Kveton

Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. [sent-3, score-0.628]

2 In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). [sent-4, score-0.51]

3 We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. [sent-5, score-0.096]

4 We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. [sent-6, score-0.095]

5 The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. [sent-7, score-0.16]

6 1 Introduction Markov decision processes (MDPs) offer an elegant mathematical framework for representing and solving decision problems in the presence of uncertainty. [sent-8, score-0.136]

7 While standard solution techniques, such as value and policy iteration, scale up well in terms of the number of states, the state space of more realistic MDP problems is factorized and thus becomes exponential in the number of state components. [sent-9, score-0.474]

8 Much of the recent work in the AI community has focused on factored structured representations of finite-state MDPs and their efficient solutions. [sent-10, score-0.312]

9 Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with discrete state components. [sent-11, score-0.628]

10 The approach uses a linear combination of local feature functions to model the value function. [sent-12, score-0.119]

11 The coefficients of the model are fit using linear programming methods. [sent-13, score-0.093]

12 In this work we show how the same set of linear programming (LP) methods can also be extended to solutions of factored continuous-state MDPs. [sent-16, score-0.486]

13 To address this problem, CMDPs and their solutions are usually approximated and solved either through state space discretization or by fitting a surrogate (and often much simpler) parametric value function model. [sent-18, score-0.308]

14 The disadvantage of discretizations is their accuracy. (We assume that action spaces stay finite.) [sent-20, score-0.105]

15 Higher accuracy solutions are paid for by the exponential increase in the complexity of discretizations. [sent-23, score-0.089]

16 On the other hand, parametric value-function approximations may become unstable when combined with dynamic programming methods and least-squares error [1]. [sent-24, score-0.174]

17 The ALP solution that is developed in this work eliminates the disadvantages of discretization and function approximation approaches while preserving their good properties. [sent-25, score-0.096]

18 It extends the approach of Trick and Zin [17] to factored multidimensional continuous state spaces. [sent-26, score-0.472]

19 Its main benefits are good running time performance, stability of the solution, and good quality policies. [sent-27, score-0.081]

20 In this work we study factored CMDPs with state spaces restricted to $[0,1]^n$. [sent-30, score-0.481]

21 We show that the solution for such a model can be approximated by an ALP with an infinite number of constraints that decompose locally. [sent-31, score-0.164]

22 In addition, we show that by choosing transition models based on beta densities (or their mixtures) and basis functions defined by products of polynomials one obtains an ALP in which both the objective function and constraints are in closed form. [sent-32, score-0.44]

23 In order to alleviate the problem of an infinite number of constraints, we develop and study an approximation based on constraint sampling [5, 6]. [sent-33, score-0.125]

24 We show that even under relatively simple random constraint sampling we are able to quickly calculate high-quality solutions that are comparable to other existing CMDP solution methods. [sent-34, score-0.253]

25 First we review finite-state MDPs and approximate linear programming (ALP) methods developed for their factored refinements. [sent-36, score-0.446]

26 Next we show how to extend the LP approximations to factored continuous-state MDPs and discuss assumptions underlying the model. [sent-37, score-0.352]

27 Given an MDP, our objective is to find the policy maximizing the infinite-horizon discounted reward criterion $E[\sum_{t=0}^{\infty} \gamma^t r_t]$, where $\gamma$ is a discount factor and $r_t$ is the reward obtained in step $t$. [sent-41, score-0.251]

28 Given the value function, the optimal policy is defined by the action optimizing Eqn 1. [sent-44, score-0.189]

29 $V^*(\mathbf{x}) = \max_{a} \big[ R(\mathbf{x},a) + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x}, a)\, V^*(\mathbf{x}') \big]$ (1). Methods for solving an MDP include value iteration, policy iteration, and linear programming [12, 2]. [sent-45, score-0.277]

30 In the linear program (LP) formulation we solve the following problem: minimize $\sum_{\mathbf{x}} V(\mathbf{x})$ subject to $V(\mathbf{x}) \ge R(\mathbf{x},a) + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x}, a)\, V(\mathbf{x}')$ for all states and actions (2), where the values $V(\mathbf{x})$ for every state are treated as variables. [sent-46, score-0.256]
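
As a concrete illustration of this LP (not the authors' code), here is a minimal sketch that solves a tiny finite-state MDP with an off-the-shelf LP solver; the two-state rewards and transitions are made-up numbers.

```python
import numpy as np
from scipy.optimize import linprog

# Toy MDP (hypothetical numbers): 2 states, 2 actions.
gamma = 0.95
R = np.array([[1.0, 0.0],                       # R[s, a]
              [0.0, 2.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],         # P[s, a, s']
              [[0.7, 0.3], [0.1, 0.9]]])
S, A = R.shape

# minimize sum_s V(s)  s.t.  V(s) >= R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
c = np.ones(S)
A_ub, b_ub = [], []
for s in range(S):
    for a in range(A):
        row = -np.eye(S)[s] + gamma * P[s, a]   # -(V(s) - gamma * P V) <= -R(s,a)
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(None, None)] * S)
V = res.x
policy = [int(np.argmax([R[s, a] + gamma * P[s, a] @ V for a in range(A)])) for s in range(S)]
print("V* =", V, "greedy policy =", policy)
```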

31 Factorizations and LP approximations. In factored MDPs, the state space is defined in terms of state variables $x_1, \dots, x_n$. [sent-47, score-0.659]

32 As a result, the state space becomes exponential in the number of variables. [sent-48, score-0.153]

33 Compact parameterizations of MDPs based on dynamic belief networks [7] and decomposable reward functions are routinely used to represent such MDPs more efficiently. [sent-49, score-0.12]

34 To address this problem, Koller and Parr [9] and Guestrin et al. [8] propose to use a linear model [13]: $V(\mathbf{x}) = \sum_i w_i f_i(\mathbf{x}_i)$. [sent-51, score-0.073]

35 Here the $w_i$ are the linear coefficients to be found (fit) and the $f_i$ denote feature functions defined over subsets of state variables. [sent-53, score-0.248]
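
For concreteness, the resulting approximate LP over the weights can be sketched as follows; this uses generic ALP notation, and the state-relevance weights $\psi$ are an assumption of this sketch rather than something stated above.

```latex
\begin{align*}
\min_{w}\quad & \sum_{\mathbf{x}} \psi(\mathbf{x}) \sum_i w_i f_i(\mathbf{x}_i)
  \;=\; \sum_i w_i \sum_{\mathbf{x}_i} \psi_i(\mathbf{x}_i)\, f_i(\mathbf{x}_i) \\
\text{s.t.}\quad & \sum_i w_i f_i(\mathbf{x}_i) \;\ge\; R(\mathbf{x},a)
  + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x}, a) \sum_i w_i f_i(\mathbf{x}'_i)
  \qquad \forall\, \mathbf{x},\ a .
\end{align*}
```

The objective collapses to sums over small feature subspaces, which is why it can be computed efficiently, while the constraint set still ranges over every state and action.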

36 Note that while the objective function can be computed efficiently, the number of constraints one has to satisfy remains exponential in the number of random variables. [sent-55, score-0.139]

37 However, only a subset of these constraints becomes active and affects the solution. [sent-56, score-0.114]

38 Guestrin et al [8] showed how to find active constraints by solving a cost network problem. [sent-57, score-0.23]

39 An alternative approach for finding active constraints was devised by Schuurmans and Patrascu [15]. [sent-59, score-0.172]

40 The idea is to greedily search for maximally violated constraints, which can be done efficiently by solving a linear optimization problem. [sent-61, score-0.195]

41 These constraints are included in the linear program and the process is repeated until no violated constraints are found. [sent-62, score-0.292]
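
A minimal sketch of this constraint-generation loop follows; the violated-constraint oracle is left abstract, since in [8] it is a cost-network solver and in [15] a search for maximally violated constraints.

```python
import numpy as np
from scipy.optimize import linprog

def solve_alp_by_constraint_generation(c, find_violated, n_weights, max_iters=100):
    """Generic cutting-plane loop for an ALP over weights w (a sketch).

    c             : coefficients of the linear ALP objective (minimized).
    find_violated : callable(w) -> (row, rhs) for a violated constraint row @ w >= rhs,
                    or None when no constraint is violated (the oracle is problem-specific).
    """
    A_ub, b_ub = [], []                      # accumulated constraints in <= form
    w = np.zeros(n_weights)
    box = [(-1e6, 1e6)] * n_weights          # box bounds keep intermediate LPs bounded
    for _ in range(max_iters):
        violated = find_violated(w)
        if violated is None:
            break                            # current weights satisfy all constraints
        row, rhs = violated
        A_ub.append(-np.asarray(row))        # row @ w >= rhs  <=>  -row @ w <= -rhs
        b_ub.append(-rhs)
        w = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=box).x
    return w
```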

42 3 Factored continuous-state MDPs. Many stochastic controlled processes are more naturally defined using continuous state variables. [sent-64, score-0.206]

43 In this work we focus on continuous-state MDPs (CMDPs) where state spaces are restricted to $[0,1]^n$. [sent-65, score-0.169]

44 We assume factored representations where transition probabilities are defined in terms of densities over state variable subspaces, with the current state of each variable conditioned on the previous state of its parents. [sent-66, score-0.583]

45 Rewards are represented compactly over subsets of state variables, similarly to factored finite-state MDPs. [sent-67, score-0.466]

46 The solutions attempt to replace the value function or the optimal policy with a finite approximation. [sent-71, score-0.218]

47 A typical solution is to discretize the state space into a set of grid points and approximate value functions over such points. [sent-73, score-0.342]

48 Unfortunately, classic grid algorithms scale up exponentially with the number of state variables [4]. [sent-74, score-0.248]

49 Let $G$ be a finite set of grid points over the state space. [sent-75, score-0.225]

50 Then the Bellman operator can be approximated with an operator that is restricted to the grid points in $G$. [sent-76, score-0.206]

51 Equation 4 applied to grid points defines a finite-state MDP with $|G|$ states. [sent-78, score-0.225]
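
For comparison, here is a minimal sketch of a grid-based scheme in the spirit of Equation 4 for a one-dimensional state on [0,1]; the reward and beta transition parameters are made up, and the normalized-density construction only loosely follows the grid methods cited here.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)
gamma, n_grid, n_actions = 0.95, 200, 2
grid = rng.uniform(0.0, 1.0, size=n_grid)           # random grid over [0, 1]

def reward(x, a):                                    # hypothetical reward
    return x - 0.1 * a

def trans_density(x_next, x, a):                     # hypothetical beta transition
    alpha = 2.0 + 4.0 * x + a
    return beta_dist.pdf(x_next, alpha, 2.0)

# Precompute normalized transition weights between grid points.
W = np.zeros((n_actions, n_grid, n_grid))
for a in range(n_actions):
    for i, x in enumerate(grid):
        dens = trans_density(grid, x, a)
        W[a, i] = dens / dens.sum()                  # normalization defines a finite MDP

V = np.zeros(n_grid)
for _ in range(500):                                 # value iteration on the induced finite MDP
    Q = np.stack([reward(grid, a) + gamma * W[a] @ V for a in range(n_actions)])
    V = Q.max(axis=0)
print("approximate values at a few grid points:", V[:5])
```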

52 Convergence properties of the approximation scheme in Equation 4 for random or pseudo-random samples were analyzed by Rust [14]. [sent-80, score-0.092]

53 An alternative way to solve a continuous-state MDP is to approximate the optimal value function with an appropriate parametric function model [3]. [sent-82, score-0.136]

54 The parameters of the model are fitted iteratively by applying one-step Bellman backups to a finite set of state points arranged on a fixed grid or obtained through Monte Carlo sampling. [sent-83, score-0.225]

55 2 LP approximations of CMDPs Our objective is to develop an alternative to the above solutions that is based on ALP techniques and that takes advantage of model factorizations. [sent-88, score-0.165]

56 It is easy to see that for a general continuous-state model the exact solution cannot be formulated as a linear program as was done in Equation 2 since the number of states is infinite. [sent-89, score-0.159]

57 However, using linear representations of the value function we need to optimize only over a finite number of weights combining feature functions. [sent-90, score-0.119]
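
Written out, the resulting continuous-state ALP has the following shape; this is a sketch in generic ALP notation, where the state-relevance density $\psi$ is an assumption of the sketch rather than something stated above.

```latex
\begin{align*}
\min_{w}\quad & \sum_i w_i \int_{[0,1]^n} \psi(\mathbf{x})\, f_i(\mathbf{x}_i)\, d\mathbf{x} \\
\text{s.t.}\quad & \sum_i w_i f_i(\mathbf{x}_i) \;\ge\; R(\mathbf{x},a)
  + \gamma \int_{[0,1]^n} p(\mathbf{x}' \mid \mathbf{x}, a) \sum_i w_i f_i(\mathbf{x}'_i)\, d\mathbf{x}'
  \qquad \forall\, \mathbf{x} \in [0,1]^n,\ a .
\end{align*}
```

Two issues distinguish this from the finite-state LP: the integrals and the constraint set.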

58 First, the integrals may be improper and not computable. [sent-93, score-0.088]

59 Second, the weights must satisfy an infinite number of constraints (for all state and action values); in the following we give solutions to both issues. [sent-95, score-0.153]

60 Closed-form solutions. Integrals in the objective function and constraints depend on the choice of transition models and basis functions. [sent-96, score-0.259]

61 We want all these integrals to be proper Riemann integrals. [sent-97, score-0.088]

62 To this end, we have identified conjugate classes of transition models and basis functions leading to closed-form expressions. [sent-99, score-0.125]

63 Here the parent set of a variable under action $a$ determines the arguments of the density, and the associated $\alpha$ and $\beta$ define the parameters of the beta model. [sent-100, score-0.171]

64 Beta transitions. To parameterize the transition model over $[0,1]$, we propose to use beta densities. [sent-101, score-0.133]

65 To parameterize the transition model over densities or their mixtures. [sent-102, score-0.14]

66 The beta transition is defined as a beta density over the next value of a state variable, with parameters determined by its parents. Feature functions. [sent-103, score-0.208]
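
The property that makes this choice convenient can be sketched in generic notation ($\alpha$, $\beta$, and the feature degree $k$ are placeholders): moments of polynomial features under a beta density are available in closed form.

```latex
\mathrm{Beta}(x \mid \alpha, \beta) \;=\; \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},
\qquad
\int_0^1 x^{k}\, \mathrm{Beta}(x \mid \alpha, \beta)\, dx
  \;=\; \frac{B(\alpha+k,\beta)}{B(\alpha,\beta)}
  \;=\; \prod_{j=0}^{k-1} \frac{\alpha+j}{\alpha+\beta+j}.
```

Expectations of products of polynomial features under a factored beta transition therefore reduce to products of such terms, which is what keeps the ALP objective and constraints in closed form.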

67 For example, assuming features defined by products of powers of the state variables, the ALP formulation becomes a linear program over the weights in which both the objective and the constraint terms have closed form. ALP solution. [sent-105, score-0.163]
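
A small numerical check of these closed-form expectations; the beta parameters and the feature degree here are made up, and scipy is used only for the quadrature comparison.

```python
from math import prod
from scipy import integrate
from scipy.stats import beta as beta_dist

def beta_poly_moment(k, a, b):
    """E[x^k] under Beta(a, b) in closed form."""
    return prod((a + j) / (a + b + j) for j in range(k))

a, b, k = 3.2, 1.7, 4                      # hypothetical beta parameters, degree-4 feature
closed = beta_poly_moment(k, a, b)
numeric, _ = integrate.quad(lambda x: x**k * beta_dist.pdf(x, a, b), 0.0, 1.0)
print(f"closed form {closed:.6f}  vs  quadrature {numeric:.6f}")
```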

68 Existing ALP methods for factored finite-state MDPs search for this subset of active constraints more efficiently by taking advantage of local constraint decompositions and various heuristics. [sent-107, score-0.381]

69 However, at the end these methods always rely on the fact that the decompositions are defined on a finite state subspace. [sent-108, score-0.164]

70 Unfortunately, constraints in our model decompose over smaller but still continuous subspaces, so the existing solutions for the finite-state MDPs cannot be applied directly. [sent-109, score-0.245]

71 To avoid the problem of continuous state spaces, we approximate the ALP solution using a finite set of constraints defined by a finite set of state space points and actions. [sent-111, score-0.502]

72 These state space points can be defined by regular grids on state subspaces or via random sampling of states. [sent-112, score-0.38]
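
A minimal sketch of such a sampled ALP for a one-dimensional problem with features 1, x, x^2 follows; the reward, the beta transition, and the uniform state-relevance density are illustrative assumptions, not the paper's construction.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
gamma, n_samples, n_actions = 0.95, 500, 2

def feats(x):                                  # features: 1, x, x^2
    return np.array([1.0, x, x * x])

def reward(x, a):                              # hypothetical reward
    return x - 0.1 * a

def beta_params(x, a):                         # hypothetical beta transition parameters
    return 2.0 + 4.0 * x + a, 2.0

def expected_feats(x, a):                      # closed-form E[f(x') | x, a] under the beta density
    al, be = beta_params(x, a)
    m1 = al / (al + be)
    m2 = al * (al + 1.0) / ((al + be) * (al + be + 1.0))
    return np.array([1.0, m1, m2])

# Objective: E_psi[f(x)] with psi uniform on [0,1]: E[1]=1, E[x]=1/2, E[x^2]=1/3.
c = np.array([1.0, 0.5, 1.0 / 3.0])

A_ub, b_ub = [], []
for x in rng.uniform(0.0, 1.0, size=n_samples):    # sampled states define sampled constraints
    for a in range(n_actions):
        A_ub.append(-(feats(x) - gamma * expected_feats(x, a)))
        b_ub.append(-reward(x, a))

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-100.0, 100.0)] * 3)        # box bounds guard against unbounded sampled LPs
print("fitted weights:", res.x)
```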

73 For finite state spaces such a technique has been devised and analyzed by de Farias and Van Roy [5]. [sent-123, score-0.259]

74 4 However, despite many possible heuristic improvements, we believe that the crucial benefit comes from the ALP formulation that “fits” the linear model and subsequent constraint and subspace decompositions. [sent-125, score-0.129]

75 4 Experiments To test the ALP method we use a continuous-state modification of the computer network example proposed by Guestrin et al [8]. [sent-126, score-0.071]

76 The state of a machine is represented by a number between 0 and 1 reflecting its processing capacity (the ability to process tasks). [sent-129, score-0.153]

77 The network performance can be controlled through activities of a human operator: the operator can attend a machine (one at a time) or do nothing. [sent-130, score-0.128]

78 The transition model represents the dynamics of the computer network. [sent-135, score-0.075]

79 The model is factorized and defined in terms of beta densities, where the current state of each computer depends on the previous-step state of the computers affecting it. [sent-136, score-0.446]

80 We use one setting of the beta parameters for transitions when the human does not attend the computer, and another when the operator is present at the computer. [sent-137, score-0.119]

81 To define the ALP approximation, we used a linear combination of linear (for every node) and quadratic (for every link) feature functions. [sent-142, score-0.104]
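
A sketch of how such feature functions could be assembled; the ring topology, the constant feature, and the product form of the link feature are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

n_machines = 10
links = [(i, (i + 1) % n_machines) for i in range(n_machines)]    # ring topology (assumed)

def features(x):
    """Linear feature per node, a quadratic (product) feature per link, plus a constant."""
    node_feats = [x[i] for i in range(n_machines)]
    link_feats = [x[i] * x[j] for (i, j) in links]
    return np.array([1.0] + node_feats + link_feats)

x = np.random.default_rng(2).uniform(0.0, 1.0, size=n_machines)   # a random network state
print("number of ALP weights:", features(x).size)
```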

82 To demonstrate the practical benefit of the approach we have compared it to the grid-based approximation (Equation 4) and the least-squares value iteration approach (with the same linear value function model as in the ALP). [sent-143, score-0.122]

83 Figure 2a illustrates the average quality (value) of a policy obtained by different approximation methods while varying the number of samples. [sent-151, score-0.197]

84 The average is computed over 30 solutions obtained for 30 different sample sets and 100 different (random) start states. [sent-152, score-0.067]

85 Figure 2b illustrates the scale-up potential of the methods in terms of running times. [sent-154, score-0.073]

86 The ALP outperformed the grid-based approach (GMDP) in both policy quality and running time. [sent-161, score-0.181]

87 The gap in the policy quality was more pronounced for smaller sample sizes. [sent-162, score-0.135]

88 This can be explained by the ability of the model to “cover” the complete state space as opposed to individual grid points. [sent-163, score-0.225]

89 Better running times for the ALP can be explained by the fact that the number of free variables to be optimized is fixed (it equals the number of weights), while in grid methods the free variables correspond to grid samples and their number grows linearly. [sent-164, score-0.303]

90 5 Conclusions We have extended the application of linear program approximation methods and their benefits to factored MDPs with continuous states. [sent-165, score-0.469]

91 We have proposed a factored transition model based on beta densities and identified feature functions that match such a model well. [sent-166, score-0.642]

92 compared to grid methods, and provides a better way of “smoothing” the value function to unseen examples; (4) its running time scales up better than that of grid methods. [sent-168, score-0.259]

93 First, the random sampling of constraints can be improved using various heuristics. [sent-171, score-0.14]

94 We report results of some heuristic solutions in a separate work [10]. [sent-172, score-0.067]

95 Second, we did not give any complexity bounds for the random constraint sampling approach. [sent-173, score-0.09]

96 On constraint sampling for the linear programming approach to approximate dynamic programming. [sent-206, score-0.257]

97 Computing factored value functions for policies in structured MDPs. [sent-228, score-0.364]

98 Heuristic refinements of approximate linear programming for factored continuous-state Markov decision processes. [sent-233, score-0.48]

99 Learning and value function approximation in complex decision problems. [sent-249, score-0.094]

100 A linear programming approach to solving stochastic dynamic programs, TR, 1993. [sent-270, score-0.208]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('alp', 0.715), ('factored', 0.312), ('mdps', 0.213), ('cmdp', 0.189), ('beta', 0.133), ('state', 0.131), ('cmdps', 0.126), ('mdp', 0.122), ('farias', 0.11), ('policy', 0.1), ('grid', 0.094), ('integrals', 0.088), ('constraints', 0.086), ('gmdp', 0.084), ('transition', 0.075), ('guestrin', 0.073), ('programming', 0.07), ('solutions', 0.067), ('densities', 0.065), ('bellman', 0.064), ('rust', 0.063), ('ls', 0.062), ('lp', 0.061), ('van', 0.061), ('reward', 0.06), ('roy', 0.058), ('operator', 0.056), ('program', 0.056), ('nite', 0.055), ('patrascu', 0.055), ('sampling', 0.054), ('running', 0.046), ('solving', 0.045), ('kveton', 0.042), ('ir', 0.042), ('schuurmans', 0.042), ('approximations', 0.04), ('koller', 0.04), ('decompose', 0.04), ('action', 0.038), ('spaces', 0.038), ('solution', 0.038), ('linear', 0.037), ('nes', 0.037), ('attend', 0.037), ('fb', 0.037), ('al', 0.036), ('constraint', 0.036), ('subspaces', 0.036), ('quality', 0.035), ('approximation', 0.035), ('network', 0.035), ('analyzed', 0.034), ('decision', 0.034), ('emerged', 0.033), ('nements', 0.033), ('factorizations', 0.033), ('decompositions', 0.033), ('dynamic', 0.033), ('formulation', 0.032), ('bene', 0.031), ('parametric', 0.031), ('athena', 0.031), ('surrogate', 0.031), ('devised', 0.031), ('objective', 0.031), ('feature', 0.03), ('disadvantage', 0.029), ('xl', 0.029), ('continuous', 0.029), ('active', 0.028), ('states', 0.028), ('equation', 0.028), ('illustrates', 0.027), ('alternative', 0.027), ('approximate', 0.027), ('functions', 0.027), ('factorized', 0.027), ('violated', 0.027), ('transitions', 0.026), ('optimal', 0.026), ('de', 0.025), ('parents', 0.025), ('value', 0.025), ('subspace', 0.024), ('computers', 0.024), ('closed', 0.023), ('compact', 0.023), ('subsets', 0.023), ('stochastic', 0.023), ('offers', 0.023), ('discretization', 0.023), ('existing', 0.023), ('processes', 0.023), ('variables', 0.023), ('samples', 0.023), ('capacity', 0.022), ('actions', 0.022), ('exponential', 0.022), ('subject', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

Author: Milos Hauskrecht, Branislav Kveton

Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1

2 0.13142265 78 nips-2003-Gaussian Processes in Reinforcement Learning

Author: Malte Kuss, Carl E. Rasmussen

Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

3 0.12044165 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

Author: Arnab Nilim, Laurent El Ghaoui

Abstract: Optimal solutions to Markov Decision Problems (MDPs) are very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of those probabilities is far from accurate. Hence, estimation errors are limiting factors in applying MDPs to realworld problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities. Our algorithm involves a statistically accurate yet numerically efficient representation of uncertainty, via Kullback-Leibler divergence bounds. The worst-case complexity of the robust algorithm is the same as the original Bellman recursion. Hence, robustness can be added at practically no extra computing cost.

4 0.11890547 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

Author: Georgios Theocharous, Leslie P. Kaelbling

Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.

5 0.11388678 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

Author: Alan Fern, Sungwook Yoon, Robert Givan

Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1

6 0.1097085 158 nips-2003-Policy Search by Dynamic Programming

7 0.10691212 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination

8 0.10672526 62 nips-2003-Envelope-based Planning in Relational MDPs

9 0.088874787 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games

10 0.081841357 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems

11 0.077485941 42 nips-2003-Bounded Finite State Controllers

12 0.074048802 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

13 0.073299617 55 nips-2003-Distributed Optimization in Adaptive Networks

14 0.066367865 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter

15 0.061958823 124 nips-2003-Max-Margin Markov Networks

16 0.058957599 68 nips-2003-Eye Movements for Reward Maximization

17 0.057787996 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

18 0.055982485 88 nips-2003-Image Reconstruction by Linear Programming

19 0.055965181 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales

20 0.053339168 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.177), (1, 0.178), (2, -0.108), (3, 0.025), (4, -0.029), (5, 0.018), (6, -0.061), (7, -0.086), (8, 0.082), (9, 0.022), (10, -0.007), (11, -0.096), (12, 0.012), (13, 0.097), (14, -0.077), (15, 0.041), (16, 0.009), (17, -0.021), (18, -0.016), (19, -0.005), (20, -0.045), (21, 0.03), (22, -0.006), (23, -0.05), (24, -0.046), (25, -0.074), (26, 0.03), (27, -0.096), (28, 0.007), (29, -0.03), (30, -0.087), (31, -0.013), (32, -0.019), (33, 0.024), (34, 0.001), (35, 0.003), (36, -0.037), (37, -0.088), (38, -0.016), (39, -0.099), (40, 0.05), (41, -0.009), (42, 0.086), (43, 0.022), (44, -0.05), (45, 0.01), (46, -0.046), (47, 0.026), (48, 0.008), (49, -0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94261324 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

Author: Milos Hauskrecht, Branislav Kveton

Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1

2 0.78995997 158 nips-2003-Policy Search by Dynamic Programming

Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng

Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1

3 0.76573819 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

Author: Alan Fern, Sungwook Yoon, Robert Givan

Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1

4 0.69433767 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

Author: Arnab Nilim, Laurent El Ghaoui

Abstract: Optimal solutions to Markov Decision Problems (MDPs) are very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of those probabilities is far from accurate. Hence, estimation errors are limiting factors in applying MDPs to realworld problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities. Our algorithm involves a statistically accurate yet numerically efficient representation of uncertainty, via Kullback-Leibler divergence bounds. The worst-case complexity of the robust algorithm is the same as the original Bellman recursion. Hence, robustness can be added at practically no extra computing cost.

5 0.69286084 62 nips-2003-Envelope-based Planning in Relational MDPs

Author: Natalia H. Gardiol, Leslie P. Kaelbling

Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelopebased Planning (REBP) tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. 1

6 0.66836369 38 nips-2003-Autonomous Helicopter Flight via Reinforcement Learning

7 0.59840745 78 nips-2003-Gaussian Processes in Reinforcement Learning

8 0.57649553 123 nips-2003-Markov Models for Automated ECG Interval Analysis

9 0.55885404 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

10 0.52705675 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination

11 0.52492249 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

12 0.52346617 42 nips-2003-Bounded Finite State Controllers

13 0.50903159 75 nips-2003-From Algorithmic to Subjective Randomness

14 0.45860785 55 nips-2003-Distributed Optimization in Adaptive Networks

15 0.45734957 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality

16 0.43096787 68 nips-2003-Eye Movements for Reward Maximization

17 0.42394653 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems

18 0.41553453 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

19 0.37710336 171 nips-2003-Semi-Definite Programming by Perceptron Learning

20 0.34788907 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.068), (11, 0.03), (21, 0.274), (29, 0.019), (30, 0.028), (35, 0.069), (53, 0.069), (69, 0.016), (71, 0.069), (76, 0.047), (85, 0.063), (91, 0.091), (99, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.83298105 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination

Author: Curt Bererton, Geoffrey J. Gordon, Sebastian Thrun

Abstract: The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. 1

same-paper 2 0.77227032 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

Author: Milos Hauskrecht, Branislav Kveton

Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1

3 0.55143231 158 nips-2003-Policy Search by Dynamic Programming

Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng

Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1

4 0.54654539 78 nips-2003-Gaussian Processes in Reinforcement Learning

Author: Malte Kuss, Carl E. Rasmussen

Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

5 0.54398561 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

Author: Alan Fern, Sungwook Yoon, Robert Givan

Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1

6 0.53939867 189 nips-2003-Tree-structured Approximations by Expectation Propagation

7 0.53152013 113 nips-2003-Learning with Local and Global Consistency

8 0.53113496 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

9 0.53034776 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

10 0.52874374 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games

11 0.52841747 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

12 0.52436388 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

13 0.52321124 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus

14 0.52227855 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection

15 0.52196473 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

16 0.52119619 122 nips-2003-Margin Maximizing Loss Functions

17 0.52110541 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

18 0.52095389 81 nips-2003-Geometric Analysis of Constrained Curves

19 0.52079207 55 nips-2003-Distributed Optimization in Adaptive Networks

20 0.52070796 107 nips-2003-Learning Spectral Clustering