nips nips2003 nips2003-116 knowledge-graph by maker-knowledge-mining

116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes


Source: pdf

Author: Milos Hauskrecht, Branislav Kveton

Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. [sent-3, score-0.628]

2 In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). [sent-4, score-0.51]

3 We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. [sent-5, score-0.096]

4 We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. [sent-6, score-0.095]

5 The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. [sent-7, score-0.16]

6 1 Introduction Markov decision processes (MDPs) offer an elegant mathematical framework for representing and solving decision problems in the presence of uncertainty. [sent-8, score-0.136]

7 While standard solution techniques, such as value and policy iteration, scale up well in terms of the number of states, the state space of more realistic MDP problems is factorized and thus becomes exponential in the number of state components. [sent-9, score-0.474]

8 Much of the recent work in the AI community has focused on factored structured representations of finite-state MDPs and their efficient solutions. [sent-10, score-0.312]

9 Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with discrete state components. [sent-11, score-0.628]

10 The approach uses a linear combination of local feature functions to model the value function. [sent-12, score-0.119]

11 The coefficients of the model are fit using linear programming methods. [sent-13, score-0.093]

12 In this work we show how the same set of linear programming (LP) methods can also be extended to solutions of factored continuous-state MDPs. [sent-16, score-0.486]

13 To address this problem, CMDPs and their solutions are usually approximated and solved either through state space discretization or by fitting a surrogate (and often much simpler) parametric value function model. [sent-18, score-0.308]

14 The disadvantage of discretizations is their accuracy. (We assume that action spaces stay finite.) [sent-20, score-0.105]

15 Higher accuracy solutions are paid for by the exponential increase in the complexity of discretizations. [sent-23, score-0.089]

16 On the other hand, parametric value-function approximations may become unstable when combined with dynamic programming methods and least-squares error [1]. [sent-24, score-0.174]

17 The ALP solution that is developed in this work eliminates the disadvantages of discretization and function approximation approaches while preserving their good properties. [sent-25, score-0.096]

18 It extends the approach of Trick and Zin [17] to factored multidimensional continuous state spaces. [sent-26, score-0.472]

19 Its main benefits are good running time performance, stability of the solution, and good quality policies. [sent-27, score-0.081]

20 In this work we study factored CMDPs with state spaces restricted to $[0,1]^n$. [sent-30, score-0.481]

21 We show that the solution for such a model can be approximated by an ALP with an infinite number of constraints that decompose locally. [sent-31, score-0.164]

22 In addition, we show that by choosing transition models based on beta densities (or their mixtures) and basis functions defined by products of polynomials one obtains an ALP in which both the objective function and constraints are in closed form. [sent-32, score-0.44]

23 In order to alleviate the problem of an infinite number of constraints, we develop and study an approximation based on constraint sampling [5, 6]. [sent-33, score-0.125]

24 We show that even under relatively simple random constraint sampling we are able to quickly calculate high-quality solutions that are comparable to other existing CMDP solution methods. [sent-34, score-0.253]

25 First we review finite-state MDPs and approximate linear programming (ALP) methods developed for their factored refinements. [sent-36, score-0.446]

26 Next we show how to extend the LP approximations to factored continuous-state MDPs and discuss assumptions underlying the model. [sent-37, score-0.352]

27 Given an MDP, our objective is to find the policy maximizing the infinite-horizon discounted reward criterion $E[\sum_{t=0}^{\infty} \gamma^t r_t]$, where $\gamma$ is a discount factor and $r_t$ is the reward obtained in step $t$. [sent-41, score-0.251]

28 Given the value function, the optimal policy is defined by the action optimizing Eqn 1. [sent-44, score-0.189]

29 $V^*(\mathbf{x}) = \max_{a} \big[ R(\mathbf{x},a) + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x}, a)\, V^*(\mathbf{x}') \big]$ (1). Methods for solving an MDP include value iteration, policy iteration, and linear programming [12, 2]. [sent-45, score-0.277]

30 In the linear program (LP) formulation we solve the following problem: minimize $\sum_{\mathbf{x}} V(\mathbf{x})$ subject to $V(\mathbf{x}) \ge R(\mathbf{x},a) + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x}, a)\, V(\mathbf{x}')$ for all states and actions (2), where the values $V(\mathbf{x})$ for every state are treated as variables. [sent-46, score-0.256]
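
As a concrete illustration of this LP (not the authors' code), here is a minimal sketch that solves a tiny finite-state MDP with an off-the-shelf LP solver; the two-state rewards and transitions are made-up numbers.

```python
import numpy as np
from scipy.optimize import linprog

# Toy MDP (hypothetical numbers): 2 states, 2 actions.
gamma = 0.95
R = np.array([[1.0, 0.0],                       # R[s, a]
              [0.0, 2.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],         # P[s, a, s']
              [[0.7, 0.3], [0.1, 0.9]]])
S, A = R.shape

# minimize sum_s V(s)  s.t.  V(s) >= R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
c = np.ones(S)
A_ub, b_ub = [], []
for s in range(S):
    for a in range(A):
        row = -np.eye(S)[s] + gamma * P[s, a]   # -(V(s) - gamma * P V) <= -R(s,a)
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(None, None)] * S)
V = res.x
policy = [int(np.argmax([R[s, a] + gamma * P[s, a] @ V for a in range(A)])) for s in range(S)]
print("V* =", V, "greedy policy =", policy)
```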

31 Factorizations and LP approximations. In factored MDPs, the state space is defined in terms of state variables $x_1, \dots, x_n$. [sent-47, score-0.659]

32 As a result, the state space becomes exponential in the number of variables. [sent-48, score-0.153]

33 Compact parameterizations of MDPs based on dynamic belief networks [7] and decomposable reward functions are routinely used to represent such MDPs more efficiently. [sent-49, score-0.12]

34 To address this problem, Koller and Parr [9] and Guestrin et al. [8] propose to use a linear model [13]: $V(\mathbf{x}) = \sum_i w_i f_i(\mathbf{x}_i)$. [sent-51, score-0.073]

35 Here the $w_i$ are the linear coefficients to be found (fit) and the $f_i$ denote feature functions defined over subsets of state variables. [sent-53, score-0.248]
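
For concreteness, the resulting approximate LP over the weights can be sketched as follows; this uses generic ALP notation, and the state-relevance weights $\psi$ are an assumption of this sketch rather than something stated above.

```latex
\begin{align*}
\min_{w}\quad & \sum_{\mathbf{x}} \psi(\mathbf{x}) \sum_i w_i f_i(\mathbf{x}_i)
  \;=\; \sum_i w_i \sum_{\mathbf{x}_i} \psi_i(\mathbf{x}_i)\, f_i(\mathbf{x}_i) \\
\text{s.t.}\quad & \sum_i w_i f_i(\mathbf{x}_i) \;\ge\; R(\mathbf{x},a)
  + \gamma \sum_{\mathbf{x}'} P(\mathbf{x}' \mid \mathbf{x}, a) \sum_i w_i f_i(\mathbf{x}'_i)
  \qquad \forall\, \mathbf{x},\ a .
\end{align*}
```

The objective collapses to sums over small feature subspaces, which is why it can be computed efficiently, while the constraint set still ranges over every state and action.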

36 Note that while the objective function can be computed efficiently, the number of constraints one has to satisfy remains exponential in the number of random variables. [sent-55, score-0.139]

37 However, only a subset of these constraints becomes active and affects the solution. [sent-56, score-0.114]

38 Guestrin et al [8] showed how to find active constraints by solving a cost network problem. [sent-57, score-0.23]

39 An alternative approach for finding active constraints was devised by Schuurmans and Patrascu [15]. [sent-59, score-0.172]

40 The idea is to greedily search for maximally violated constraints, which can be done efficiently by solving a linear optimization problem. [sent-61, score-0.195]

41 These constraints are included in the linear program and the process is repeated until no violated constraints are found. [sent-62, score-0.292]
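
A minimal sketch of this constraint-generation loop follows; the violated-constraint oracle is left abstract, since in [8] it is a cost-network solver and in [15] a search for maximally violated constraints.

```python
import numpy as np
from scipy.optimize import linprog

def solve_alp_by_constraint_generation(c, find_violated, n_weights, max_iters=100):
    """Generic cutting-plane loop for an ALP over weights w (a sketch).

    c             : coefficients of the linear ALP objective (minimized).
    find_violated : callable(w) -> (row, rhs) for a violated constraint row @ w >= rhs,
                    or None when no constraint is violated (the oracle is problem-specific).
    """
    A_ub, b_ub = [], []                      # accumulated constraints in <= form
    w = np.zeros(n_weights)
    box = [(-1e6, 1e6)] * n_weights          # box bounds keep intermediate LPs bounded
    for _ in range(max_iters):
        violated = find_violated(w)
        if violated is None:
            break                            # current weights satisfy all constraints
        row, rhs = violated
        A_ub.append(-np.asarray(row))        # row @ w >= rhs  <=>  -row @ w <= -rhs
        b_ub.append(-rhs)
        w = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=box).x
    return w
```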

42 3 Factored continuous-state MDPs. Many stochastic controlled processes are more naturally defined using continuous state variables. [sent-64, score-0.206]

43 In this work we focus on continuous-state MDPs (CMDPs) where state spaces are restricted to $[0,1]^n$. [sent-65, score-0.169]

44 We assume factored representations where transition probabilities are defined in terms of densities over state variable subspaces, with the current state of each variable conditioned on the previous state of its parents. [sent-66, score-0.583]

45 Rewards are represented compactly over subsets of state variables, similarly to factored finite-state MDPs. [sent-67, score-0.466]

46 The solutions attempt to replace the value function or the optimal policy with a finite approximation. [sent-71, score-0.218]

47 A typical solution is to discretize the state space into a set of grid points and approximate value functions over such points. [sent-73, score-0.342]

48 Unfortunately, classic grid algorithms scale up exponentially with the number of state variables [4]. [sent-74, score-0.248]

49 Let $G$ be a finite set of grid points over the state space. [sent-75, score-0.225]

50 Then the Bellman operator can be approximated with an operator that is restricted to the grid points in $G$. [sent-76, score-0.206]

51 Equation 4 applied to grid points defines a finite-state MDP with $|G|$ states. [sent-78, score-0.225]
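
For comparison, here is a minimal sketch of a grid-based scheme in the spirit of Equation 4 for a one-dimensional state on [0,1]; the reward and beta transition parameters are made up, and the normalized-density construction only loosely follows the grid methods cited here.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)
gamma, n_grid, n_actions = 0.95, 200, 2
grid = rng.uniform(0.0, 1.0, size=n_grid)           # random grid over [0, 1]

def reward(x, a):                                    # hypothetical reward
    return x - 0.1 * a

def trans_density(x_next, x, a):                     # hypothetical beta transition
    alpha = 2.0 + 4.0 * x + a
    return beta_dist.pdf(x_next, alpha, 2.0)

# Precompute normalized transition weights between grid points.
W = np.zeros((n_actions, n_grid, n_grid))
for a in range(n_actions):
    for i, x in enumerate(grid):
        dens = trans_density(grid, x, a)
        W[a, i] = dens / dens.sum()                  # normalization defines a finite MDP

V = np.zeros(n_grid)
for _ in range(500):                                 # value iteration on the induced finite MDP
    Q = np.stack([reward(grid, a) + gamma * W[a] @ V for a in range(n_actions)])
    V = Q.max(axis=0)
print("approximate values at a few grid points:", V[:5])
```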

52 Convergence properties of the approximation scheme in Equation 4 for random or pseudo-random samples were analyzed by Rust [14]. [sent-80, score-0.092]

53 An alternative way to solve a continuous-state MDP is to approximate the optimal value function with an appropriate parametric function model [3]. [sent-82, score-0.136]

54 The parameters of the model are fitted iteratively by applying one-step Bellman backups to a finite set of state points arranged on a fixed grid or obtained through Monte Carlo sampling. [sent-83, score-0.225]

55 2 LP approximations of CMDPs Our objective is to develop an alternative to the above solutions that is based on ALP techniques and that takes advantage of model factorizations. [sent-88, score-0.165]

56 It is easy to see that for a general continuous-state model the exact solution cannot be formulated as a linear program as was done in Equation 2 since the number of states is infinite. [sent-89, score-0.159]

57 However, using linear representations of the value function we need to optimize only over a finite number of weights combining feature functions. [sent-90, score-0.119]
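
Written out, the resulting continuous-state ALP has the following shape; this is a sketch in generic ALP notation, where the state-relevance density $\psi$ is an assumption of the sketch rather than something stated above.

```latex
\begin{align*}
\min_{w}\quad & \sum_i w_i \int_{[0,1]^n} \psi(\mathbf{x})\, f_i(\mathbf{x}_i)\, d\mathbf{x} \\
\text{s.t.}\quad & \sum_i w_i f_i(\mathbf{x}_i) \;\ge\; R(\mathbf{x},a)
  + \gamma \int_{[0,1]^n} p(\mathbf{x}' \mid \mathbf{x}, a) \sum_i w_i f_i(\mathbf{x}'_i)\, d\mathbf{x}'
  \qquad \forall\, \mathbf{x} \in [0,1]^n,\ a .
\end{align*}
```

Two issues distinguish this from the finite-state LP: the integrals and the constraint set.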

58 First, the integrals may be improper and not computable. [sent-93, score-0.088]

59 Second, the weights must satisfy an infinite number of constraints (for all state and action values); in the following we give solutions to both issues. [sent-95, score-0.153]

60 Closed-form solutions. Integrals in the objective function and constraints depend on the choice of transition models and basis functions. [sent-96, score-0.259]

61 We want all these integrals to be proper Riemann integrals. [sent-97, score-0.088]

62 To this end, we have identified conjugate classes of transition models and basis functions leading to closed-form expressions. [sent-99, score-0.125]

63 Here the parent set of a variable under action $a$ determines the arguments of the density, and the associated $\alpha$ and $\beta$ define the parameters of the beta model. [sent-100, score-0.171]

64 Beta transitions. To parameterize the transition model over $[0,1]$, we propose to use beta densities. [sent-101, score-0.133]

65 To parameterize the transition model over densities or their mixtures. [sent-102, score-0.14]

66 The beta transition is defined as a beta density over the next value of a state variable, with parameters determined by its parents. Feature functions. [sent-103, score-0.208]
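
The property that makes this choice convenient can be sketched in generic notation ($\alpha$, $\beta$, and the feature degree $k$ are placeholders): moments of polynomial features under a beta density are available in closed form.

```latex
\mathrm{Beta}(x \mid \alpha, \beta) \;=\; \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},
\qquad
\int_0^1 x^{k}\, \mathrm{Beta}(x \mid \alpha, \beta)\, dx
  \;=\; \frac{B(\alpha+k,\beta)}{B(\alpha,\beta)}
  \;=\; \prod_{j=0}^{k-1} \frac{\alpha+j}{\alpha+\beta+j}.
```

Expectations of products of polynomial features under a factored beta transition therefore reduce to products of such terms, which is what keeps the ALP objective and constraints in closed form.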

67 For example, assuming features defined by products of powers of the state variables, the ALP formulation becomes a linear program over the weights in which both the objective and the constraint terms have closed form. ALP solution. [sent-105, score-0.163]
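
A small numerical check of these closed-form expectations; the beta parameters and the feature degree here are made up, and scipy is used only for the quadrature comparison.

```python
from math import prod
from scipy import integrate
from scipy.stats import beta as beta_dist

def beta_poly_moment(k, a, b):
    """E[x^k] under Beta(a, b) in closed form."""
    return prod((a + j) / (a + b + j) for j in range(k))

a, b, k = 3.2, 1.7, 4                      # hypothetical beta parameters, degree-4 feature
closed = beta_poly_moment(k, a, b)
numeric, _ = integrate.quad(lambda x: x**k * beta_dist.pdf(x, a, b), 0.0, 1.0)
print(f"closed form {closed:.6f}  vs  quadrature {numeric:.6f}")
```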

68 Existing ALP methods for factored finite-state MDPs search for this subset of active constraints more efficiently by taking advantage of local constraint decompositions and various heuristics. [sent-107, score-0.381]

69 However, at the end these methods always rely on the fact that the decompositions are defined on a finite state subspace. [sent-108, score-0.164]

70 Unfortunately, constraints in our model decompose over smaller but still continuous subspaces, so the existing solutions for the finite-state MDPs cannot be applied directly. [sent-109, score-0.245]

71 To avoid the problem of continuous state spaces, we approximate the ALP solution using a finite set of constraints defined by a finite set of state space points and actions. [sent-111, score-0.502]

72 These state space points can be defined by regular grids on state subspaces or via random sampling of states. [sent-112, score-0.38]
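
A minimal sketch of such a sampled ALP for a one-dimensional problem with features 1, x, x^2 follows; the reward, the beta transition, and the uniform state-relevance density are illustrative assumptions, not the paper's construction.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
gamma, n_samples, n_actions = 0.95, 500, 2

def feats(x):                                  # features: 1, x, x^2
    return np.array([1.0, x, x * x])

def reward(x, a):                              # hypothetical reward
    return x - 0.1 * a

def beta_params(x, a):                         # hypothetical beta transition parameters
    return 2.0 + 4.0 * x + a, 2.0

def expected_feats(x, a):                      # closed-form E[f(x') | x, a] under the beta density
    al, be = beta_params(x, a)
    m1 = al / (al + be)
    m2 = al * (al + 1.0) / ((al + be) * (al + be + 1.0))
    return np.array([1.0, m1, m2])

# Objective: E_psi[f(x)] with psi uniform on [0,1]: E[1]=1, E[x]=1/2, E[x^2]=1/3.
c = np.array([1.0, 0.5, 1.0 / 3.0])

A_ub, b_ub = [], []
for x in rng.uniform(0.0, 1.0, size=n_samples):    # sampled states define sampled constraints
    for a in range(n_actions):
        A_ub.append(-(feats(x) - gamma * expected_feats(x, a)))
        b_ub.append(-reward(x, a))

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-100.0, 100.0)] * 3)        # box bounds guard against unbounded sampled LPs
print("fitted weights:", res.x)
```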

73 For finite state spaces such a technique has been devised and analyzed by de Farias and Van Roy [5]. [sent-123, score-0.259]

74 4 However, despite many possible heuristic improvements, we believe that the crucial benefit comes from the ALP formulation that “fits” the linear model and subsequent constraint and subspace decompositions. [sent-125, score-0.129]

75 4 Experiments To test the ALP method we use a continuous-state modification of the computer network example proposed by Guestrin et al [8]. [sent-126, score-0.071]

76 The state of a machine is represented by a number between 0 and 1 reflecting its processing capacity (the ability to process tasks). [sent-129, score-0.153]

77 The network performance can be controlled through activities of a human operator: the operator can attend a machine (one at a time) or do nothing. [sent-130, score-0.128]

78 The transition model represents the dynamics of the computer network. [sent-135, score-0.075]

79 The model is factorized and defined in terms of beta densities, where the current state of each computer depends on the previous-step state of the computers affecting it. [sent-136, score-0.446]

80 We use one setting of the beta parameters for transitions when the human does not attend the computer, and another when the operator is present at the computer. [sent-137, score-0.119]

81 To define the ALP approximation, we used a linear combination of linear (for every node) and quadratic (for every link) feature functions. [sent-142, score-0.104]
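
A sketch of how such feature functions could be assembled; the ring topology, the constant feature, and the product form of the link feature are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

n_machines = 10
links = [(i, (i + 1) % n_machines) for i in range(n_machines)]    # ring topology (assumed)

def features(x):
    """Linear feature per node, a quadratic (product) feature per link, plus a constant."""
    node_feats = [x[i] for i in range(n_machines)]
    link_feats = [x[i] * x[j] for (i, j) in links]
    return np.array([1.0] + node_feats + link_feats)

x = np.random.default_rng(2).uniform(0.0, 1.0, size=n_machines)   # a random network state
print("number of ALP weights:", features(x).size)
```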

82 To demonstrate the practical benefit of the approach we have compared it to the grid-based approximation (Equation 4) and the least-squares value iteration approach (with the same linear value function model as in the ALP). [sent-143, score-0.122]

83 Figure 2a illustrates the average quality (value) of a policy obtained by different approximation methods while varying the number of samples. [sent-151, score-0.197]

84 The average is computed over 30 solutions obtained for 30 different sample sets and 100 different (random) start states. [sent-152, score-0.067]

85 Figure 2b illustrates the scale-up potential of the methods in terms of running times. [sent-154, score-0.073]

86 The ALP outperformed the grid-based approach (GMDP) in both policy quality and running time. [sent-161, score-0.181]

87 The gap in the policy quality was more pronounced for smaller sample sizes. [sent-162, score-0.135]

88 This can be explained by the ability of the model to “cover” the complete state space as opposed to individual grid points. [sent-163, score-0.225]

89 Better running times for the ALP can be explained by the fact that the number of free variables to be optimized is fixed (it equals the number of weights), while in grid methods the free variables correspond to grid samples and their number grows linearly. [sent-164, score-0.303]

90 5 Conclusions We have extended the application of linear program approximation methods and their benefits to factored MDPs with continuous states. [sent-165, score-0.469]

91 We have proposed a factored transition model based on beta densities and identified feature functions that match such a model well. [sent-166, score-0.642]

92 compared to grid methods, and provides a better way of “smoothing” the value function to unseen examples; (4) its running time scales up better than that of grid methods. [sent-168, score-0.259]

93 First, the random sampling of constraints can be improved using various heuristics. [sent-171, score-0.14]

94 We report results of some heuristic solutions in a separate work [10]. [sent-172, score-0.067]

95 Second, we did not give any complexity bounds for the random constraint sampling approach. [sent-173, score-0.09]

96 On constraint sampling for the linear programming approach to approximate dynamic programming. [sent-206, score-0.257]

97 Computing factored value functions for policies in structured MDPs. [sent-228, score-0.364]

98 Heuristic refinements of approximate linear programming for factored continuous-state Markov decision processes. [sent-233, score-0.48]

99 Learning and value function approximation in complex decision problems. [sent-249, score-0.094]

100 A linear programming approach to solving stochastic dynamic programs, TR, 1993. [sent-270, score-0.208]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('alp', 0.715), ('factored', 0.312), ('mdps', 0.213), ('cmdp', 0.189), ('beta', 0.133), ('state', 0.131), ('cmdps', 0.126), ('mdp', 0.122), ('farias', 0.11), ('policy', 0.1), ('grid', 0.094), ('integrals', 0.088), ('constraints', 0.086), ('gmdp', 0.084), ('transition', 0.075), ('guestrin', 0.073), ('programming', 0.07), ('solutions', 0.067), ('densities', 0.065), ('bellman', 0.064), ('rust', 0.063), ('ls', 0.062), ('lp', 0.061), ('van', 0.061), ('reward', 0.06), ('roy', 0.058), ('operator', 0.056), ('program', 0.056), ('nite', 0.055), ('patrascu', 0.055), ('sampling', 0.054), ('running', 0.046), ('solving', 0.045), ('kveton', 0.042), ('ir', 0.042), ('schuurmans', 0.042), ('approximations', 0.04), ('koller', 0.04), ('decompose', 0.04), ('action', 0.038), ('spaces', 0.038), ('solution', 0.038), ('linear', 0.037), ('nes', 0.037), ('attend', 0.037), ('fb', 0.037), ('al', 0.036), ('constraint', 0.036), ('subspaces', 0.036), ('quality', 0.035), ('approximation', 0.035), ('network', 0.035), ('analyzed', 0.034), ('decision', 0.034), ('emerged', 0.033), ('nements', 0.033), ('factorizations', 0.033), ('decompositions', 0.033), ('dynamic', 0.033), ('formulation', 0.032), ('bene', 0.031), ('parametric', 0.031), ('athena', 0.031), ('surrogate', 0.031), ('devised', 0.031), ('objective', 0.031), ('feature', 0.03), ('disadvantage', 0.029), ('xl', 0.029), ('continuous', 0.029), ('active', 0.028), ('states', 0.028), ('equation', 0.028), ('illustrates', 0.027), ('alternative', 0.027), ('approximate', 0.027), ('functions', 0.027), ('factorized', 0.027), ('violated', 0.027), ('transitions', 0.026), ('optimal', 0.026), ('de', 0.025), ('parents', 0.025), ('value', 0.025), ('subspace', 0.024), ('computers', 0.024), ('closed', 0.023), ('compact', 0.023), ('subsets', 0.023), ('stochastic', 0.023), ('offers', 0.023), ('discretization', 0.023), ('existing', 0.023), ('processes', 0.023), ('variables', 0.023), ('samples', 0.023), ('capacity', 0.022), ('actions', 0.022), ('exponential', 0.022), ('subject', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

Author: Milos Hauskrecht, Branislav Kveton

Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1

2 0.13142265 78 nips-2003-Gaussian Processes in Reinforcement Learning

Author: Malte Kuss, Carl E. Rasmussen

Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

3 0.12044165 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

Author: Arnab Nilim, Laurent El Ghaoui

Abstract: Optimal solutions to Markov Decision Problems (MDPs) are very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of those probabilities is far from accurate. Hence, estimation errors are limiting factors in applying MDPs to realworld problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities. Our algorithm involves a statistically accurate yet numerically efficient representation of uncertainty, via Kullback-Leibler divergence bounds. The worst-case complexity of the robust algorithm is the same as the original Bellman recursion. Hence, robustness can be added at practically no extra computing cost.

4 0.11890547 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

Author: Georgios Theocharous, Leslie P. Kaelbling

Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.

5 0.11388678 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

Author: Alan Fern, Sungwook Yoon, Robert Givan

Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1

6 0.1097085 158 nips-2003-Policy Search by Dynamic Programming

7 0.10691212 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination

8 0.10672526 62 nips-2003-Envelope-based Planning in Relational MDPs

9 0.088874787 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games

10 0.081841357 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems

11 0.077485941 42 nips-2003-Bounded Finite State Controllers

12 0.074048802 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

13 0.073299617 55 nips-2003-Distributed Optimization in Adaptive Networks

14 0.066367865 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter

15 0.061958823 124 nips-2003-Max-Margin Markov Networks

16 0.058957599 68 nips-2003-Eye Movements for Reward Maximization

17 0.057787996 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

18 0.055982485 88 nips-2003-Image Reconstruction by Linear Programming

19 0.055965181 52 nips-2003-Different Cortico-Basal Ganglia Loops Specialize in Reward Prediction at Different Time Scales

20 0.053339168 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.177), (1, 0.178), (2, -0.108), (3, 0.025), (4, -0.029), (5, 0.018), (6, -0.061), (7, -0.086), (8, 0.082), (9, 0.022), (10, -0.007), (11, -0.096), (12, 0.012), (13, 0.097), (14, -0.077), (15, 0.041), (16, 0.009), (17, -0.021), (18, -0.016), (19, -0.005), (20, -0.045), (21, 0.03), (22, -0.006), (23, -0.05), (24, -0.046), (25, -0.074), (26, 0.03), (27, -0.096), (28, 0.007), (29, -0.03), (30, -0.087), (31, -0.013), (32, -0.019), (33, 0.024), (34, 0.001), (35, 0.003), (36, -0.037), (37, -0.088), (38, -0.016), (39, -0.099), (40, 0.05), (41, -0.009), (42, 0.086), (43, 0.022), (44, -0.05), (45, 0.01), (46, -0.046), (47, 0.026), (48, 0.008), (49, -0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94261324 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

Author: Milos Hauskrecht, Branislav Kveton

Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1

2 0.78995997 158 nips-2003-Policy Search by Dynamic Programming

Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng

Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1

3 0.76573819 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

Author: Alan Fern, Sungwook Yoon, Robert Givan

Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1

4 0.69433767 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

Author: Arnab Nilim, Laurent El Ghaoui

Abstract: Optimal solutions to Markov Decision Problems (MDPs) are very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of those probabilities is far from accurate. Hence, estimation errors are limiting factors in applying MDPs to realworld problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities. Our algorithm involves a statistically accurate yet numerically efficient representation of uncertainty, via Kullback-Leibler divergence bounds. The worst-case complexity of the robust algorithm is the same as the original Bellman recursion. Hence, robustness can be added at practically no extra computing cost.

5 0.69286084 62 nips-2003-Envelope-based Planning in Relational MDPs

Author: Natalia H. Gardiol, Leslie P. Kaelbling

Abstract: A mobile robot acting in the world is faced with a large amount of sensory data and uncertainty in its action outcomes. Indeed, almost all interesting sequential decision-making domains involve large state spaces and large, stochastic action sets. We investigate a way to act intelligently as quickly as possible in domains where finding a complete policy would take a hopelessly long time. This approach, Relational Envelopebased Planning (REBP) tackles large, noisy problems along two axes. First, describing a domain as a relational MDP (instead of as an atomic or propositionally-factored MDP) allows problem structure and dynamics to be captured compactly with a small set of probabilistic, relational rules. Second, an envelope-based approach to planning lets an agent begin acting quickly within a restricted part of the full state space and to judiciously expand its envelope as resources permit. 1

6 0.66836369 38 nips-2003-Autonomous Helicopter Flight via Reinforcement Learning

7 0.59840745 78 nips-2003-Gaussian Processes in Reinforcement Learning

8 0.57649553 123 nips-2003-Markov Models for Automated ECG Interval Analysis

9 0.55885404 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

10 0.52705675 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination

11 0.52492249 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

12 0.52346617 42 nips-2003-Bounded Finite State Controllers

13 0.50903159 75 nips-2003-From Algorithmic to Subjective Randomness

14 0.45860785 55 nips-2003-Distributed Optimization in Adaptive Networks

15 0.45734957 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality

16 0.43096787 68 nips-2003-Eye Movements for Reward Maximization

17 0.42394653 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems

18 0.41553453 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

19 0.37710336 171 nips-2003-Semi-Definite Programming by Perceptron Learning

20 0.34788907 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.068), (11, 0.03), (21, 0.274), (29, 0.019), (30, 0.028), (35, 0.069), (53, 0.069), (69, 0.016), (71, 0.069), (76, 0.047), (85, 0.063), (91, 0.091), (99, 0.052)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.83298105 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination

Author: Curt Bererton, Geoffrey J. Gordon, Sebastian Thrun

Abstract: The design of cooperative multi-robot systems is a highly active research area in robotics. Two lines of research in particular have generated interest: the solution of large, weakly coupled MDPs, and the design and implementation of market architectures. We propose a new algorithm which joins together these two lines of research. For a class of coupled MDPs, our algorithm automatically designs a market architecture which causes a decentralized multi-robot system to converge to a consistent policy. We can show that this policy is the same as the one which would be produced by a particular centralized planning algorithm. We demonstrate the new algorithm on three simulation examples: multi-robot towing, multi-robot path planning with a limited fuel resource, and coordinating behaviors in a game of paint ball. 1

same-paper 2 0.77227032 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

Author: Milos Hauskrecht, Branislav Kveton

Abstract: Approximate linear programming (ALP) has emerged recently as one of the most promising methods for solving complex factored MDPs with finite state spaces. In this work we show that ALP solutions are not limited only to MDPs with finite state spaces, but that they can also be applied successfully to factored continuous-state MDPs (CMDPs). We show how one can build an ALP-based approximation for such a model and contrast it to existing solution methods. We argue that this approach offers a robust alternative for solving high dimensional continuous-state space problems. The point is supported by experiments on three CMDP problems with 24-25 continuous state factors. 1

3 0.55143231 158 nips-2003-Policy Search by Dynamic Programming

Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng

Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1

4 0.54654539 78 nips-2003-Gaussian Processes in Reinforcement Learning

Author: Malte Kuss, Carl E. Rasmussen

Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

5 0.54398561 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

Author: Alan Fern, Sungwook Yoon, Robert Givan

Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1

6 0.53939867 189 nips-2003-Tree-structured Approximations by Expectation Propagation

7 0.53152013 113 nips-2003-Learning with Local and Global Consistency

8 0.53113496 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

9 0.53034776 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

10 0.52874374 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games

11 0.52841747 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

12 0.52436388 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

13 0.52321124 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus

14 0.52227855 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection

15 0.52196473 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

16 0.52119619 122 nips-2003-Margin Maximizing Loss Functions

17 0.52110541 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

18 0.52095389 81 nips-2003-Geometric Analysis of Constrained Curves

19 0.52079207 55 nips-2003-Distributed Optimization in Adaptive Networks

20 0.52070796 107 nips-2003-Learning Spectral Clustering