nips nips2003 nips2003-29 knowledge-graph by maker-knowledge-mining

29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs


Source: pdf

Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun

Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 These approaches operate on sets of belief points by individually learning a value function for each point. [sent-5, score-0.749]

2 In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. [sent-6, score-0.883]

3 This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. [sent-7, score-1.035]

4 However the practical use of POMDPs has been severely limited by the computational requirement of planning in such a rich representation. [sent-11, score-0.221]

5 POMDP planning is difficult because it involves learning action selection strategies contingent on all possible types of state uncertainty. [sent-12, score-0.275]

6 This means that whenever the robot’s world state cannot be observed, the planner must maintain a belief (namely a probability distribution over possible states) to summarize the robot’s recent history of actions taken and observations received. [sent-13, score-0.635]

7 The POMDP planner then learns an optimal future action selection for each possible belief. [sent-14, score-0.092]

8 As the planning horizon grows (linearly), so does the number of possible beliefs (exponentially), which causes the computational intractability of exact POMDP planning. [sent-15, score-0.357]

9 In recent years, a number of approximate algorithms have been proposed which overcome this issue by simply refusing to consider all possible beliefs, and instead selecting (and planning for) a small set of representative belief points. [sent-16, score-0.776]

10 During execution, should the robot encounter a belief for which it has no plan, it finds the nearest known belief point and follows its plan. [sent-17, score-1.216]

11 Such approaches, often known as grid-based [1, 4, 13], or point-based [8, 9] algorithms, have had significant success with increasingly large planning domains. [sent-18, score-0.221]

12 They formulate the plan optimization problem as a value iteration procedure, and estimate the cost/reward of applying a sequence of actions from a given belief point. [sent-19, score-0.775]

13 The value of each action sequence can be expressed as an α-vector, and a key step in many algorithms consists of evaluating many candidate α-vectors (set Γ) at each belief point (set B). [sent-20, score-0.711]
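To make this bottleneck concrete, here is a minimal NumPy sketch of the naive |B| × |Γ| evaluation step; the function name and array layout are ours, not the paper's:

```python
import numpy as np

def best_alphas_naive(B, Gamma):
    """Brute-force argmax over alpha-vectors for every belief point.

    B     : (n, |S|) array of belief points.
    Gamma : (m, |S|) array of candidate alpha-vectors.
    Returns the index of the best alpha for each belief, plus the
    number of point-to-vector comparisons performed (n * m) -- the
    quantity the metric-tree is meant to reduce.
    """
    scores = B @ Gamma.T              # every alpha . b dot product
    return np.argmax(scores, axis=1), scores.size
```

Point-based solvers repeat this step at every backup, which is why pruning these comparisons pays off.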

14 Recent work has shown that for these problems, one can significantly reduce the number of necessary comparisons by using appropriate metric data structures, such as KD-trees and ball-trees [3, 6, 12]. [sent-22, score-0.287]

15 This paper describes our algorithm for building and searching a metric-tree over belief points. [sent-24, score-0.649]

16 For example, when using trees for POMDPs, we move away from point-to-point search procedures for which the trees are typically used, and leverage metric constraints to prune point-to-vector comparisons. [sent-26, score-0.328]

17 We show how it is often possible to evaluate the usefulness of an α-vector over an entire sub-region of the belief simplex without explicitly evaluating it at each belief point in that sub-region. [sent-27, score-1.197]

18 While our new metric-tree approach offers significant potential for all point-based approaches, in this paper we apply it in the context of the PBVI algorithm [8], and show that it can effectively reduce computation without compromising plan quality. [sent-28, score-0.121]

19 An |S|-dimensional vector, bt , represents the agent’s belief about the state of the world at time t, and is expressed as a probability distribution over states. [sent-31, score-0.611]

20 This belief is updated after each time step—to reflect the latest pair (a_{t−1}, z_t)—using a Bayesian filter: b_t(s′) := c · O(s′, a_{t−1}, z_t) Σ_{s∈S} T(s, a_{t−1}, s′) b_{t−1}(s), where c is a normalizing constant. [sent-32, score-0.679]
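As a rough sketch of this filter in NumPy (the dense T/O array layout is our assumption, not from the paper):

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Bayesian belief update: b_t from b_{t-1}, action a, observation z.

    b : (|S|,) prior belief (probability distribution over states).
    T : (|A|, |S|, |S|) with T[a, s, s2] = P(s2 | s, a).
    O : (|A|, |S|, |Z|) with O[a, s2, z] = P(z | s2, a).
    """
    predicted = b @ T[a]                       # sum_s T(s, a, s') b(s)
    unnormalized = O[a, :, z] * predicted      # weight by observation likelihood
    return unnormalized / unnormalized.sum()   # division by the normalizer c
```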

21 The goal of POMDP planning is to find a sequence of actions maximizing the expected sum of rewards E[Σ_t γ^t R(s_t, a_t)], for all beliefs. [sent-33, score-0.263]

22 The corresponding value function can be formulated as a Bellman equation: V(b) = max_{a∈A} [R(b, a) + γ Σ_{b′∈B} T(b, a, b′) V(b′)]. By definition there exists an infinite number of belief points. [sent-34, score-0.620]

23 However when optimized exactly, the value function is always piecewise linear and convex in the belief (Fig. [sent-35, score-0.667]

24 Each α-vector represents an |S|-dimensional hyper-plane, and defines the value function over a bounded region of the belief: V_n(b) = max_{α∈V_n} Σ_{s∈S} α(s)·b(s). [sent-41, score-0.098]
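Evaluating that upper surface of hyper-planes is a single max over dot products; a one-line sketch:

```python
import numpy as np

def pwlc_value(b, alphas):
    """V_n(b) = max over alpha-vectors of alpha . b (piecewise-linear, convex)."""
    return float(np.max(alphas @ b))
```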

25 When performing exact value updates, the set of α-vectors can (and often does) grow exponentially with the planning horizon. [sent-42, score-0.292]

26 We leave out a full discussion of exact POMDP planning (see [5] for more) and focus instead on the much more tractable point-based approximate algorithm. [sent-44, score-0.256]

27 3 Point-based value iteration for POMDPs: The main motivation behind the point-based algorithm is to exploit the fact that most beliefs are never, or very rarely, encountered, and thus resources are better spent planning for those beliefs that are most likely to be reached. [sent-45, score-0.514]

28 Point-based value iteration algorithms on the other hand apply value backups only to a finite set of pre-selected (and likely to be encountered) belief points B = {b0 , b1 , . [sent-47, score-0.918]

29 As shown in Figure 1b, by maintaining a full α-vector for each belief point, we can preserve the piecewise linearity and convexity of the value function, and define a value function over the entire belief simplex. [sent-52, score-1.333]

30 This is an approximation, as some vectors may be missed, but by appropriately selecting points, we can bound the approximation error (see [8] for details). [sent-53, score-0.084]

31 Figure 1: (a) Value iteration with exact updates (V = {α0, α1, α2, α3}); (b) value updates over belief points b0, b1, b2, b3 (V = {α0, α1, α3}). [sent-54, score-0.092]

32 First, a set of belief points is selected, and second, a series of backup operations are applied over α-vectors for that set of points. [sent-57, score-0.757]

33 In practice, steps of value iteration and steps of belief set expansion can be repeatedly interleaved to produce an anytime algorithm that gradually trades off computation time against solution quality. [sent-58, score-0.692]

34 The question of how to best select belief points is somewhat orthogonal to the ideas in this paper and is discussed in detail in [8]. [sent-59, score-0.713]

35 We therefore focus on describing how to do point-based value backups, before showing how this step can be significantly accelerated by the use of appropriate metric data structures. [sent-60, score-0.174]

36 It also completely ignores the highly structured nature of the belief space. [sent-70, score-0.555]

37 Belief points exist in a metric space and there is much to be gained from exploiting this property. [sent-71, score-0.294]

38 For example, given the piecewise linearity and convexity of the value function, it is more likely that two nearby points will share similar values (and policies) than points that are far away. [sent-72, score-0.504]

39 Consequently it could be much more efficient to evaluate an α-vector over sets of nearby points, rather than by exhaustively looking at all the points separately. [sent-73, score-0.202]

40 In the next section, we describe a new type of metric-tree which structures data points based on a distance metric over the belief simplex. [sent-74, score-0.856]

41 We then show how this kind of tree can be used to efficiently evaluate α-vectors over sets of belief points (or belief regions). [sent-75, score-1.479]

42 4 Metric-trees for belief spaces: Metric data structures offer a way to organize large sets of data points according to distances between the points. [sent-76, score-0.749]

43 Instances of metric data structures such as KD-trees, ball-trees and metric-trees have been shown to be useful for a wide range of learning tasks (e. [sent-78, score-0.143]

44 It consists of a hierarchical tree built by recursively splitting the set of points into spatially tighter subsets, assuming only that the distance between points is a metric. [sent-82, score-0.622]

45 4.1 Building a metric-tree from belief points: Each node η in a metric-tree is represented by its center ηc, its radius ηr, and a set of points ηB that fall within its radius. [sent-84, score-1.097]

46 To recursively construct the tree—starting with node η and building children nodes η1 and η2—we first pick two candidate centers (one per child) at the extremes of η's region: ηc1 = argmax_{b∈ηB} D(ηc, b), and ηc2 = argmax_{b∈ηB} D(ηc1, b). [sent-85, score-0.392]

47 For the centers, the most common choice is the centroid of the points and this is what we use when building a tree over belief points. [sent-90, score-0.962]

48 For the distance metric, we select the max-norm: D(ηc , b) = ||ηc −b||∞ , which allows for fast searching as described in the next section. [sent-92, score-0.086]

49 While the radius determines the size of the region enclosed by each node, the choice of distance metric determines its shape (e. [sent-93, score-0.243]

50 with Euclidean distance, we would get hyper-balls of radius ηr). [sent-95, score-0.074]

51 In the case of the max-norm, each node defines an |S|-dimensional hyper-cube of side length 2·ηr. [sent-96, score-0.122]

52 Figure 2 shows how the first two levels of a tree are built, assuming a 3-state problem. [sent-97, score-0.211]
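A simplified sketch of this construction (centroid centers, max-norm radii, farthest-point splits); the class layout and the exact split rule are our reading of the text, not the authors' code:

```python
import numpy as np

class Node:
    def __init__(self, B):
        self.B = B                                # belief points in this node
        self.center = B.mean(axis=0)              # centroid, as in the text
        dists = np.max(np.abs(B - self.center), axis=1)   # max-norm distances
        self.radius = float(dists.max())          # node covers a cube of side 2r
        self.bmin = B.min(axis=0)                 # per-dimension min/max beliefs
        self.bmax = B.max(axis=0)                 # (the eta_min / eta_max stats)
        self.children = []

def build_tree(B, leaf_size=2):
    node = Node(B)
    if len(B) <= leaf_size:
        return node
    # Two candidate centers at the extremes of the node's region.
    c1 = B[np.argmax(np.max(np.abs(B - node.center), axis=1))]
    c2 = B[np.argmax(np.max(np.abs(B - c1), axis=1))]
    # Assign every point to the nearer candidate center (ties go to c1).
    to_c1 = np.max(np.abs(B - c1), axis=1) <= np.max(np.abs(B - c2), axis=1)
    if to_c1.all() or (~to_c1).all():             # degenerate split: stop here
        return node
    node.children = [build_tree(B[to_c1], leaf_size),
                     build_tree(B[~to_c1], leaf_size)]
    return node
```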

53 (d) Corresponding tree. While we need to compute the center and radius for each node to build the tree, there are additional statistics which we also store about each node. [sent-104, score-0.437]

54 These are specific to using trees in the context of belief-state planning, and are necessary to evaluate α vectors over regions of the belief simplex. [sent-105, score-0.706]

55 For a given node η containing data points ηB, we compute ηmin and ηmax, the vectors containing respectively the min and max belief in each dimension: ηmin(s) = min_{b∈ηB} b(s), ∀s ∈ S. [sent-106, score-1.099]

56 ηmax(s) = max_{b∈ηB} b(s), ∀s ∈ S. (8) 4.2 Searching over sub-regions of the simplex: Once the tree is built, it can be used for fast statistical queries. [sent-107, score-0.419]

57 In our case, the goal is to compute argmax_{α∈Γ^{a,z}} (α · b) for all belief points. [sent-108, score-0.555]

58 To do this, we consider the α vectors one at a time, and decide whether a new candidate αi is better than any of the previous vectors {α0 . [sent-109, score-0.143]

59 With the belief points organized in a tree, we can often assess this over sets of points by consulting a high-level node η, rather than by assessing this for each belief point separately. [sent-113, score-1.548]

60 There are four different situations we can encounter as we traverse the tree: first, there might be no single previous α-vector that is best for all belief points below the current node (Fig. [sent-115, score-0.887]

61 In this case we proceed to the children of the current node without performing any tests. [sent-117, score-0.185]

62 In the other three cases there is a single dominant alpha-vector at the current node; the cases are that the newest vector αi dominates it (Fig. [sent-118, score-0.113]

63 If we can prove that α i dominates or is dominated by the previous one, we can prune the search and avoid checking the current node’s children; otherwise we must check the children recursively. [sent-122, score-0.402]

64 We seek an efficient test to determine whether one vector, αi , dominates another, αj , over the belief points contained within a node. [sent-123, score-0.896]

65 The test must be conservative: it must never erroneously say that one vector dominates another. [sent-124, score-0.153]

66 It is acceptable for the test to miss some pruning opportunities—the consequence is an increase in run-time as we check more nodes than necessary—but this is best avoided if possible. [sent-125, score-0.179]

67 All positive would mean that αi dominates αj , all negative the reverse, and mixed positive and negative would mean that neither dominates the other. [sent-132, score-0.226]

68 Of course, this test renders the tree useless, since all points are checked individually. [sent-133, score-0.409]

69 Instead, we test whether ∆·b is positive or negative over a convex region R which includes all of the belief samples that belong to the current node. [sent-134, score-0.718]

70 The smaller the region, the more accurate our test will be; on the other hand, if the region is too complicated we won’t be able to carry out the test efficiently. [sent-135, score-0.142]

71 (Note that we can always test some region R by solving one linear program to find l = min_{b∈R} b·∆, another to find h = max_{b∈R} b·∆, and testing whether l < 0 < h. [sent-136, score-0.166]
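For a bounding-box region intersected with the probability simplex, these two linear programs have a closed-form greedy solution, so no LP solver is needed. A sketch, assuming the box actually intersects the simplex (Σ ηmin ≤ 1 ≤ Σ ηmax); the function names are ours:

```python
import numpy as np

def extreme_dot(delta, bmin, bmax, maximize=False):
    """min (or max) of delta . b over {bmin <= b <= bmax, sum(b) = 1}.

    Greedy, fractional-knapsack style: start at b = bmin, then spend the
    leftover probability mass on the coordinates with the smallest
    (largest, if maximizing) entries of delta first.
    """
    b = bmin.astype(float)
    slack = 1.0 - b.sum()                        # mass still to place
    order = np.argsort(-delta) if maximize else np.argsort(delta)
    for i in order:
        give = min(slack, bmax[i] - bmin[i])
        b[i] += give
        slack -= give
        if slack <= 0.0:
            break
    return float(delta @ b)

def dominance(alpha_i, alpha_j, bmin, bmax):
    """Conservative test over the node's region: 'i', 'j', or None (recurse)."""
    delta = alpha_i - alpha_j
    if extreme_dot(delta, bmin, bmax) >= 0.0:
        return 'i'                               # alpha_i wins everywhere in R
    if extreme_dot(delta, bmin, bmax, maximize=True) <= 0.0:
        return 'j'                               # alpha_j wins everywhere in R
    return None                                  # mixed signs: cannot prune
```

Because the bounds l and h are exact on this region, the test never claims dominance falsely; it can only miss pruning opportunities when the region is loose.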

72 ) Figure 4: Several possible convex regions over subsets of belief points, assuming a 3-state domain. [sent-138, score-0.626]

73 The simplest type is an axis-parallel bounding box (Fig. [sent-140, score-0.086]

74 We also tested the simplex defined by b ≥ ηmin and Σ_{s∈S} b(s) = 1 (Fig. [sent-143, score-0.087]

75 4b), as well as the simplex defined by b ≤ ηmax and Σ_{s∈S} b(s) = 1 (Fig. [sent-144, score-0.087]

76 The most effective test we discovered assumes R is the intersection of the bounding box ηmin ≤ b ≤ ηmax with the plane Σ_{s∈S} b(s) = 1 (Fig. [sent-146, score-0.126]

77 4a) we check each dimension independently, and for the simplices (Figs 4b, 4c) we check each corner exhaustively. [sent-149, score-0.128]

78 Empirical results show that checking the corners of regions (b) and (c) and taking the tightest bounds provides the fastest algorithm. [sent-156, score-0.078]

79 5 Results and Discussion: We have conducted a set of experiments to test the effectiveness of the tree structure in reducing computations. [sent-158, score-0.312]

80 5(a)-(f) we show the number of B × Γ (point-to-vector) comparisons required, with and without a tree, for different numbers of belief points. [sent-163, score-0.7]

81 5(g)-(h) we show the computation time (as a function of the number of belief points) required for two of the problems. [sent-165, score-0.555]

82 The Tree results (which count comparisons on both internal and leaf nodes) were generated by embedding the tree searching procedure described in Section 4. [sent-167, score-0.443]

85 Figure 5: Results of PBVI algorithm with and without metric-tree (legend: No Tree, Tree, Epsilon-Tree; axes: # comparisons vs. # belief points; panels include (g) SACI, |S|=12, and (h) Tag, |S|=870). [sent-180, score-0.713]

86 These early results show that, in various proportions, the tree can cut down on the number of comparisons. [sent-181, score-0.211]

87 The ε-tree is particularly effective at reducing the number of comparisons in some domains (e. [sent-183, score-0.178]

88 In keeping with other metric-tree applications, our results show that computational savings increase with the number of belief points. [sent-189, score-0.555]

89 What is more surprising is to see the trees paying off with so few data points (most applications of KD-trees start seeing benefits with 1000+ data points. [sent-190, score-0.23]

90 ) This may be partially attributed to the compactness of our convex test region (Fig. [sent-191, score-0.217]

91 4d), and to the fact that we do not search on split nodes (Fig. [sent-192, score-0.079]

92 3a); however, it is most likely due to the nature of our search problem: many α vectors are accepted/rejected before visiting any leaf nodes, which is different from typical metric-tree applications. [sent-193, score-0.141]

93 We are particularly encouraged to see trees having a noticeable effect with very few data points because, in some domains, good control policies can also be extracted with few data points. [sent-194, score-0.23]

94 We notice that the effect of using trees is negligible in some larger problems (e. [sent-195, score-0.136]

95 This is likely due to the intrinsic dimensionality of each problem. [sent-200, score-0.14]

96 While this suggests that our current algorithm is not as effective in problems with intrinsic high-dimensionality, a slightly different tree structure or search procedure may well help in those cases. [sent-202, score-0.353]

97 6 Conclusion: We have described a new type of metric-tree which can be used for sorting belief points and accelerating value updates in POMDPs. [sent-204, score-0.831]

98 Early experiments indicate that the tree structure, by appropriately pruning unnecessary α-vectors over large regions of the belief simplex, can accelerate planning for a range of problems. [sent-205, score-0.552]

99 3 The coffee domain is known to have an intrinsic dimensionality of 7 [10]. [sent-284, score-0.242]

100 We do not know the intrinsic dimensionality of the Tag domain, but many robot applications produce belief points that exist in sub-dimensional manifolds [11]. [sent-285, score-0.904]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('belief', 0.555), ('pomdp', 0.376), ('planning', 0.221), ('tree', 0.211), ('points', 0.158), ('tag', 0.154), ('comparisons', 0.145), ('node', 0.122), ('dominates', 0.113), ('pomdps', 0.113), ('metric', 0.107), ('coffee', 0.103), ('max', 0.091), ('saci', 0.089), ('simplex', 0.087), ('maxb', 0.078), ('pbvi', 0.078), ('radius', 0.074), ('intrinsic', 0.074), ('trees', 0.072), ('min', 0.067), ('beliefs', 0.067), ('check', 0.064), ('children', 0.063), ('centers', 0.063), ('region', 0.062), ('argmax', 0.059), ('iteration', 0.057), ('searching', 0.056), ('bt', 0.056), ('plan', 0.054), ('robot', 0.054), ('observable', 0.054), ('action', 0.054), ('box', 0.052), ('vn', 0.052), ('encounter', 0.052), ('child', 0.05), ('partially', 0.048), ('dominated', 0.047), ('secs', 0.047), ('pineau', 0.047), ('sorting', 0.047), ('appropriately', 0.045), ('piecewise', 0.045), ('anytime', 0.044), ('backup', 0.044), ('backups', 0.044), ('exhaustively', 0.044), ('actions', 0.042), ('convexity', 0.041), ('regions', 0.04), ('test', 0.04), ('nodes', 0.04), ('queries', 0.039), ('vectors', 0.039), ('search', 0.039), ('building', 0.038), ('planner', 0.038), ('prune', 0.038), ('checking', 0.038), ('bottleneck', 0.038), ('structures', 0.036), ('value', 0.036), ('attributed', 0.036), ('exact', 0.035), ('updates', 0.035), ('candidate', 0.035), ('reduce', 0.035), ('pruning', 0.035), ('negligible', 0.035), ('program', 0.034), ('dimensionality', 0.034), ('linearity', 0.034), ('gordon', 0.034), ('bounding', 0.034), ('ijcai', 0.034), ('zt', 0.034), ('horizon', 0.034), ('exploit', 0.034), ('reducing', 0.033), ('spatially', 0.032), ('offers', 0.032), ('built', 0.032), ('intelligence', 0.032), ('likely', 0.032), ('maintaining', 0.031), ('encountered', 0.031), ('leaf', 0.031), ('convex', 0.031), ('domain', 0.031), ('step', 0.031), ('applying', 0.031), ('recursively', 0.031), ('fast', 0.03), ('center', 0.03), ('whether', 0.03), ('exist', 0.029), ('problems', 0.029), ('effectiveness', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs

Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun

Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems. 1

2 0.40365198 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

Author: Georgios Theocharous, Leslie P. Kaelbling

Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.

3 0.25630203 42 nips-2003-Bounded Finite State Controllers

Author: Pascal Poupart, Craig Boutilier

Abstract: We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima).

4 0.16824514 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

Author: Amos J. Storkey

Abstract: Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficients. Furthermore efficient generalised belief propagation methods between clusters of four nodes enable the Fourier coefficients to be inferred and the missing data to be estimated in near to O(n log n) time, where n is the total of the given and missing data points. This method is compared with a number of common approaches such as setting missing data to zero or to interpolation. It is tested on generated data and for a Fourier analysis of a damaged audio signal. 1

5 0.15309238 189 nips-2003-Tree-structured Approximations by Expectation Propagation

Author: Yuan Qi, Tom Minka

Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1

6 0.14603689 32 nips-2003-Approximate Expectation Maximization

7 0.14366706 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification

8 0.14047423 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination

9 0.11843538 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

10 0.11649421 62 nips-2003-Envelope-based Planning in Relational MDPs

11 0.11220861 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

12 0.10905416 14 nips-2003-A Nonlinear Predictive State Representation

13 0.098512203 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees

14 0.094918169 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering

15 0.08901035 172 nips-2003-Semi-Supervised Learning with Trees

16 0.086980045 158 nips-2003-Policy Search by Dynamic Programming

17 0.084765874 117 nips-2003-Linear Response for Approximate Inference

18 0.081354119 169 nips-2003-Sample Propagation

19 0.080775134 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

20 0.080661982 111 nips-2003-Learning the k in k-means


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.267), (1, 0.156), (2, -0.204), (3, 0.233), (4, -0.072), (5, -0.221), (6, -0.111), (7, 0.026), (8, 0.182), (9, 0.118), (10, 0.076), (11, -0.078), (12, 0.093), (13, 0.096), (14, 0.013), (15, 0.011), (16, 0.105), (17, 0.115), (18, 0.209), (19, 0.083), (20, 0.002), (21, 0.013), (22, 0.016), (23, 0.023), (24, 0.258), (25, 0.112), (26, -0.127), (27, 0.062), (28, 0.013), (29, 0.058), (30, 0.002), (31, 0.094), (32, 0.024), (33, -0.06), (34, -0.046), (35, 0.031), (36, 0.156), (37, 0.004), (38, 0.1), (39, 0.075), (40, 0.06), (41, 0.009), (42, -0.071), (43, 0.023), (44, -0.012), (45, 0.043), (46, 0.036), (47, 0.034), (48, -0.001), (49, -0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98160267 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs

Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun

Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems. 1

2 0.86057943 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

Author: Georgios Theocharous, Leslie P. Kaelbling

Abstract: Recent research has demonstrated that useful POMDP solutions do not require consideration of the entire belief space. We extend this idea with the notion of temporal abstraction. We present and explore a new reinforcement learning algorithm over grid-points in belief space, which uses macro-actions and Monte Carlo updates of the Q-values. We apply the algorithm to a large scale robot navigation task and demonstrate that with temporal abstraction we can consider an even smaller part of the belief space, we can learn POMDP policies faster, and we can do information gathering more efficiently.

3 0.78147262 42 nips-2003-Bounded Finite State Controllers

Author: Pascal Poupart, Craig Boutilier

Abstract: We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima).

4 0.62110937 14 nips-2003-A Nonlinear Predictive State Representation

Author: Matthew R. Rudary, Satinder P. Singh

Abstract: Predictive state representations (PSRs) use predictions of a set of tests to represent the state of controlled dynamical systems. One reason why this representation is exciting as an alternative to partially observable Markov decision processes (POMDPs) is that PSR models of dynamical systems may be much more compact than POMDP models. Empirical work on PSRs to date has focused on linear PSRs, which have not allowed for compression relative to POMDPs. We introduce a new notion of tests which allows us to define a new type of PSR that is nonlinear in general and allows for exponential compression in some deterministic dynamical systems. These new tests, called e-tests, are related to the tests used by Rivest and Schapire [1] in their work with the diversity representation, but our PSR avoids some of the pitfalls of their representation—in particular, its potential to be exponentially larger than the equivalent POMDP. 1

5 0.58510172 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

Author: Amos J. Storkey

Abstract: Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficients. Furthermore efficient generalised belief propagation methods between clusters of four nodes enable the Fourier coefficients to be inferred and the missing data to be estimated in near to O(n log n) time, where n is the total of the given and missing data points. This method is compared with a number of common approaches such as setting missing data to zero or to interpolation. It is tested on generated data and for a Fourier analysis of a damaged audio signal. 1

6 0.56572491 32 nips-2003-Approximate Expectation Maximization

7 0.52607214 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification

8 0.4618232 2 nips-2003-ARA*: Anytime A* with Provable Bounds on Sub-Optimality

9 0.42514294 36 nips-2003-Auction Mechanism Design for Multi-Robot Coordination

10 0.40381646 189 nips-2003-Tree-structured Approximations by Expectation Propagation

11 0.3840864 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering

12 0.38204029 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

13 0.36587229 187 nips-2003-Training a Quantum Neural Network

14 0.36442918 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters

15 0.35556605 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

16 0.34825888 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

17 0.34492674 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays

18 0.33225456 158 nips-2003-Policy Search by Dynamic Programming

19 0.32517782 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

20 0.31273082 62 nips-2003-Envelope-based Planning in Relational MDPs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.469), (11, 0.017), (29, 0.011), (30, 0.019), (35, 0.057), (53, 0.076), (71, 0.047), (76, 0.033), (85, 0.1), (91, 0.079), (99, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95623404 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs

Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun

Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems.
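The core idea in the abstract above is that belief points live in a metric space, so groups of them can be bounded by a ball and pruned in bulk via the triangle inequality. A minimal sketch of that pruning step with one tree node (the paper prunes value-function backups; here we only prune distance computations, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
beliefs = rng.dirichlet(np.ones(4), size=200)   # 200 beliefs over 4 states (simplex points)

# One metric-tree node: center is the mean belief, radius covers all members.
center = beliefs.mean(axis=0)
radius = float(np.max(np.linalg.norm(beliefs - center, axis=1)))

def can_prune(query, best_dist):
    """Skip the whole group if even its closest possible member
    (center distance minus radius) is farther than the best found so far."""
    return np.linalg.norm(query - center) - radius > best_dist

query = rng.dirichlet(np.ones(4))
d_center = float(np.linalg.norm(query - center))

# By the triangle inequality, every member of the ball is at least
# d_center - radius away from the query, so this bound never overshoots.
lower_bound = d_center - radius
true_min = float(np.min(np.linalg.norm(beliefs - query, axis=1)))
print(lower_bound <= true_min)
```

A full metric-tree recursively splits the point set into nested balls, so most groups are rejected by this bound without touching their individual members.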

2 0.94845498 171 nips-2003-Semi-Definite Programming by Perceptron Learning

Author: Thore Graepel, Ralf Herbrich, Andriy Kharechko, John S. Shawe-Taylor

Abstract: We present a modified version of the perceptron learning algorithm (PLA) which solves semidefinite programs (SDPs) in polynomial time. The algorithm is based on the following three observations: (i) semidefinite programs are linear programs with infinitely many (linear) constraints; (ii) every linear program can be solved by a sequence of constraint satisfaction problems with linear constraints; (iii) in general, the perceptron learning algorithm solves a constraint satisfaction problem with linear constraints in finitely many updates. Combining the PLA with a probabilistic rescaling algorithm (which, on average, increases the size of the feasible region) results in a probabilistic algorithm for solving SDPs that runs in polynomial time. We present preliminary results which demonstrate that the algorithm works, but is not competitive with state-of-the-art interior point methods.
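Observation (iii) in the abstract above is the classical perceptron convergence result: the perceptron update solves a homogeneous linear feasibility problem (find w with a_i · w > 0 for all i) in finitely many updates whenever a feasible w exists. A minimal sketch of just that building block, on constraints generated to be feasible (the paper's SDP reduction and rescaling are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
w_true = rng.normal(size=5)
A = rng.normal(size=(100, 5))
A *= np.sign(A @ w_true)[:, None]      # flip rows so that A @ w_true > 0 (feasible by construction)

w = np.zeros(5)
updates = 0
while True:
    violated = np.flatnonzero(A @ w <= 0)
    if violated.size == 0:
        break                          # all 100 constraints satisfied
    w += A[violated[0]]                # perceptron update on the first violated constraint
    updates += 1

print(updates, bool(np.all(A @ w > 0)))
```

Because the constraint set has a positive margin around w_true, the loop terminates after a finite number of updates bounded by the classical Novikoff argument; the paper's contribution is wrapping this solver inside a rescaling scheme so the margin, and hence the update count, stays polynomially controlled for SDPs.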

3 0.9435454 53 nips-2003-Discriminating Deformable Shape Classes

Author: Salvador Ruiz-Correa, Linda G. Shapiro, Marina Meila, Gabriel Berson

Abstract: We present and empirically test a novel approach for categorizing 3-D free-form object shapes represented by range data. In contrast to traditional surface-signature based systems that use alignment to match specific objects, we adapted the newly introduced symbolic-signature representation to classify deformable shapes [10]. Our approach constructs an abstract description of shape classes using an ensemble of classifiers that learn object class parts and their corresponding geometrical relationships from a set of numeric and symbolic descriptors. We used our classification engine in a series of large scale discrimination experiments on two well-defined classes that share many common distinctive features. The experimental results suggest that our method outperforms traditional numeric signature-based methodologies.

4 0.87183857 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays

Author: Claudio Fanti, Marzia Polito, Pietro Perona

Abstract: Consider a number of moving points, where each point is attached to a joint of the human body and projected onto an image plane. Johansson showed that humans can effortlessly detect and recognize the presence of other humans from such displays. This is true even when some of the body points are missing (e.g. because of occlusion) and unrelated clutter points are added to the display. We are interested in replicating this ability in a machine. To this end, we present a labelling and detection scheme in a probabilistic framework. Our method is based on representing the joint probability density of positions and velocities of body points with a graphical model, and using Loopy Belief Propagation to calculate a likely interpretation of the scene. Furthermore, we introduce a global variable representing the body’s centroid. Experiments on one motion-captured sequence suggest that our scheme improves on the accuracy of a previous approach based on triangulated graphical models, especially when very few parts are visible. The improvement is due both to the more general graph structure we use and, more significantly, to the introduction of the centroid variable.

5 0.69924003 42 nips-2003-Bounded Finite State Controllers

Author: Pascal Poupart, Craig Boutilier

Abstract: We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic finite state controllers, combining several advantages of gradient ascent (efficiency, search through restricted controller space) and policy iteration (less vulnerability to local optima).

6 0.65302956 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

7 0.5939945 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

8 0.58426905 113 nips-2003-Learning with Local and Global Consistency

9 0.57929415 78 nips-2003-Gaussian Processes in Reinforcement Learning

10 0.57903492 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion

11 0.55550915 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters

12 0.55191541 81 nips-2003-Geometric Analysis of Constrained Curves

13 0.54870504 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection

14 0.54757327 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition

15 0.54744774 172 nips-2003-Semi-Supervised Learning with Trees

16 0.54371774 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

17 0.54268759 158 nips-2003-Policy Search by Dynamic Programming

18 0.53992677 48 nips-2003-Convex Methods for Transduction

19 0.53924072 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models

20 0.53695428 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting