nips nips2002 nips2002-82 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicholas Roy, Geoffrey J. Gordon
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, high-dimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
Reference: text
sentIndex sentText sentNum sentScore
1 The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. [sent-6, score-0.692]
2 However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. [sent-7, score-1.414]
3 We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. [sent-8, score-0.485]
4 We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, high-dimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. [sent-9, score-1.639]
5 We then plan directly on the low-dimensional belief features. [sent-10, score-0.455]
6 By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. [sent-11, score-0.229]
7 We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task. [sent-12, score-0.605]
8 Maintaining a full value function over the high-dimensional belief space entails finding the expected reward of every possible belief under the optimal policy. [sent-14, score-1.056]
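For readers who want the quantity being referred to made explicit, the value function over beliefs is the usual belief-MDP Bellman fixed point; the sketch below uses standard notation (reward r, discount gamma, observation set Z, Bayes filter tau) that is assumed rather than taken from this extraction.

```latex
% Belief-space Bellman equation (standard form; r, \gamma, Z, \tau are assumed notation)
V(b) = \max_{a} \Big[ \sum_{s} b(s)\, r(s,a)
     + \gamma \sum_{z \in Z} \Pr(z \mid b, a)\, V\big(\tau(b, a, z)\big) \Big]
```

Here tau(b, a, z) denotes the Bayes-updated belief after taking action a and observing z.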
9 For example, a mobile robot navigating in an office building is extremely unlikely to ever encounter a belief about its pose that resembles a checkerboard. [sent-16, score-1.067]
10 If the execution of a POMDP is viewed as a trajectory inside the belief space, trajectories for most large, real world POMDPs lie on low-dimensional manifolds embedded in the belief space. [sent-17, score-1.098]
11 So, POMDP algorithms that compute a value function over the full belief space do a lot of unnecessary work. [sent-18, score-0.485]
12 Additionally, real POMDPs frequently have the property that the belief probability distributions themselves are sparse. [sent-19, score-0.483]
13 Intuitively, mobile robots and other real world systems have local uncertainty (which can often be multi-modal), but rarely encounter global uncertainty. [sent-21, score-0.411]
14 Figure 1 depicts a mobile robot travelling down a corridor, and illustrates the sparsity of the belief space. [sent-22, score-0.93]
15 Figure 1: An example probability distribution of a mobile robot navigating in a hallway (map dimensions are 47m x 17m, with a grid cell resolution of 10cm). [sent-23, score-0.608]
16 The white areas are free space, states where the mobile robot could be. [sent-24, score-0.539]
17 The particles are located in states where the robot’s belief over its position is non-zero. [sent-26, score-0.648]
18 Although the distribution is multi-modal, it is still relatively compact: the majority of the states contain no particles and therefore have zero probability. [sent-27, score-0.146]
19 We will take advantage of these characteristics of POMDP beliefs by using a variant of a common dimensionality reduction technique, Principal Components Analysis (PCA). [sent-28, score-0.52]
20 PCA is well-suited to dimensionality reduction where the data lies near a linear manifold in the higher-dimensional space. [sent-29, score-0.21]
21 Unfortunately, POMDP belief manifolds are rarely linear; in particular, sparse beliefs are usually very non-linear. [sent-30, score-0.898]
22 However, we can employ a link function to transform the data into a space where it does lie near a linear manifold; the algorithm which does so (while also correctly handling the transformed residual errors) is called Exponential Family PCA (E-PCA). [sent-31, score-0.147]
23 E-PCA will allow us to find manifolds with only a handful of dimensions, even for belief spaces with thousands of dimensions. [sent-32, score-0.498]
24 Our algorithm begins with a set of beliefs from a POMDP. [sent-33, score-0.361]
25 It uses these beliefs to find a decomposition of belief space into a small number of belief features. [sent-34, score-1.301]
26 Finally, it plans over a low-dimensional space by discretizing the features and using standard value iteration to find a policy over the discrete beliefs. [sent-35, score-0.327]
27 The objective of the planning problem is to find a policy that maximises the expected sum of future (possibly discounted) rewards of the agent executing the policy. [sent-45, score-0.421]
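The symbols preceding this sentence in the source are unrecoverable equation residue; as a hedged restatement of the stated objective in standard notation (policy pi, discount factor gamma, reward r_t at time t; all symbols assumed, not recovered from the original equation):

```latex
% Planning objective (standard form; \pi, \gamma, r_t are assumed notation)
\pi^{*} = \arg\max_{\pi}\; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \;\middle|\; \pi \right]
```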
28 We suggest that a large part of the success of policy search is due to the fact that it focuses computation on relevant belief states. [sent-50, score-0.737]
29 A disadvantage of policy search, however, is that it can be data-inefficient: many policy search techniques have trouble reusing sample trajectories generated from old policies. [sent-51, score-0.594]
30 Our approach focuses computation on relevant belief states, but also allows us to use all relevant training data to estimate the effect of any policy. [sent-52, score-0.455]
31 Related research has developed heuristics which reduce the belief space representation. [sent-53, score-0.485]
32 In particular, entropy-based representations for heuristic control [8] and full value-function planning [9] have been tried with some success. [sent-54, score-0.16]
33 By performing principled dimensionality reduction of the belief space, our technique should be applicable to a wider range of problems. [sent-56, score-0.614]
34 (Section 3: Dimensionality Reduction.) Principal Component Analysis is one of the most popular and successful forms of dimensionality reduction [10]. [sent-57, score-0.159]
35 This particular loss function assumes that the data lie near a linear manifold, and that displacements from this manifold are symmetric and have the same variance everywhere. [sent-59, score-0.18]
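To make the squared-error view of PCA concrete, here is a minimal sketch of conventional PCA as a low-rank least-squares reconstruction; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def pca_reconstruct(B, n_bases):
    """Rank-`n_bases` least-squares reconstruction of a belief matrix B
    (rows = sampled beliefs, columns = states); equivalent to PCA after centering."""
    mean = B.mean(axis=0)
    U, s, Vt = np.linalg.svd(B - mean, full_matrices=False)
    low_rank = (U[:, :n_bases] * s[:n_bases]) @ Vt[:n_bases, :]
    return low_rank + mean

# Reconstruction error: np.sum((B - pca_reconstruct(B, k)) ** 2)
```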
36 Although the optimisation procedure for E-PCA may be more complicated than that for other models such as locally-linear models, it requires many fewer samples of the belief space. [sent-68, score-0.455]
37 For real world systems such as mobile robots, large sample sets may be difficult to acquire. [sent-69, score-0.328]
38 (Section 3.1: Exponential family PCA.) Exponential family Principal Component Analysis [1] (E-PCA) differs from conventional PCA by adding a link function, in analogy to generalised linear models, and modifying the loss function appropriately. [sent-71, score-0.395]
39 As long as we choose the link and loss functions to match each other, there will exist efficient algorithms for finding the two factor matrices given the data. [sent-72, score-0.194]
40 We can use any convex function to generate a matching pair of link and loss functions. [sent-74, score-0.222]
41 The loss functions themselves are only necessary for the analysis; our algorithm needs only the link functions and their derivatives. [sent-78, score-0.194]
42 So, we can pick the loss functions and differentiate to get the matching link functions; or, we can pick the link functions directly and not worry about the corresponding loss functions. [sent-79, score-0.446]
43 Each choice of link and loss functions results in a different model and therefore a potentially different decomposition of the data. [sent-80, score-0.194]
44 In our case the entries of the data matrix are the number of particles from a large sample which fell into a small bin, so a Poisson loss function is most appropriate. [sent-82, score-0.234]
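For the Poisson case this gives the following matching pair, written here as a hedged sketch in the spirit of exponential-family PCA [1]; Theta = UV is the low-rank parameter matrix, and the constant absorbs terms that do not depend on Theta.

```latex
% Poisson link and loss (sketch); \Theta = UV is the low-rank parameter matrix
f(\theta) = e^{\theta} \quad \text{(link)}, \qquad
L(X, \Theta) = \sum_{ij} \left( e^{\Theta_{ij}} - X_{ij}\,\Theta_{ij} \right) + \text{const}
```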
45 It is worth noting that using the Poisson loss for dimensionality reduction is related to Lee and Seung’s non-negative matrix factorization [14]. [sent-84, score-0.262]
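A minimal, illustrative sketch of E-PCA under the Poisson loss follows; it uses plain gradient descent rather than whatever optimiser the paper used, and every name and constant here is an assumption.

```python
import numpy as np

def epca_poisson(X, n_bases, iters=2000, lr=1e-4):
    """Factor X (n_beliefs x n_states, non-negative counts) as X ~= exp(U @ V),
    minimising the Poisson loss sum(exp(Theta) - X * Theta).  Illustrative only;
    the step size and iteration count will need tuning for real data."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    U = 0.01 * rng.standard_normal((n, n_bases))
    V = 0.01 * rng.standard_normal((n_bases, d))
    for _ in range(iters):
        Theta = U @ V
        G = np.exp(Theta) - X          # gradient of the loss w.r.t. Theta
        gU, gV = G @ V.T, U.T @ G
        U -= lr * gU
        V -= lr * gV
    return U, V

# A reconstructed belief is exp(U[i] @ V), normalised to sum to one.
```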
46 (Section 4: Augmented MDP.) Given the belief features acquired through E-PCA, it remains to learn a policy. [sent-91, score-0.483]
47 We do so by using the low-dimensional belief features to convert the POMDP into a tractable MDP. [sent-92, score-0.483]
48 Our conversion algorithm is a variant of the Augmented MDP, or Coastal Navigation algorithm [9], using belief features instead of entropy. [sent-93, score-0.483]
49 Collect sample beliefs; use E-PCA to generate low-dimensional belief features; convert the low-dimensional space into a discrete state space; and learn the belief transition probabilities and reward function. [sent-95, score-1.606]
50 Finally, perform value iteration on the new model, using the discrete states, transition probabilities, and reward function (a code sketch of this pipeline appears after Table 1 below). [sent-96, score-0.14]
51 Table 1: Algorithm for planning in low-dimensional belief space. [sent-103, score-0.526]
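A hedged sketch of the Table 1 pipeline as a whole; every helper named here (epca_poisson, discretize_features, estimate_model, value_iteration) and every parameter is a hypothetical stand-in rather than an API from the paper.

```python
def plan_in_belief_features(sample_beliefs, pomdp, n_bases, n_cells):
    """Top-level sketch of the planning pipeline in Table 1 (illustrative only)."""
    U, V = epca_poisson(sample_beliefs, n_bases)      # step 2: belief features
    cells = discretize_features(U, n_cells)           # step 3: grid / Voronoi cells
    T, R = estimate_model(cells, V, pomdp)            # step 4: transitions and rewards
    return value_iteration(T, R, gamma=0.95)          # step 5: policy over cells
```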
52 We can collect the beliefs in step 1 using some prior policy such as a random walk or a most-likely-state heuristic. [sent-104, score-0.627]
53 The state space can be discretized in a number of ways, such as laying a grid over the belief features or using distance to the closest training beliefs to divide feature space into Voronoi regions. [sent-107, score-0.973]
54 Thrun [16] has proposed nearest-neighbor discretization in high-dimensional belief space; we propose instead to use low-dimensional feature space, where neighbors should be more closely related. [sent-108, score-0.455]
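A small sketch of the Voronoi-style discretisation in low-dimensional feature space: a belief is assigned to the cell of its nearest training belief. The names are illustrative.

```python
import numpy as np

def nearest_cell(feature_vec, training_features):
    """Index of the closest training belief in E-PCA feature space.
    training_features: (n_training x n_bases) array of feature coordinates."""
    dists = np.linalg.norm(training_features - feature_vec, axis=1)
    return int(np.argmin(dists))
```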
55 We can compute the model reward function easily from the reconstructed beliefs. [sent-109, score-0.142]
56 To learn the transition function, we can sample states from the reconstructed beliefs, sample observations from those states, and incorporate those observations to produce new belief states. [sent-110, score-0.687]
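A hedged Monte Carlo sketch of that transition-learning step; sample_observation, bayes_update, project_to_features, and the training feature set are assumed helpers, not the paper's code.

```python
import numpy as np

def estimate_transitions(cell_beliefs, actions, training_features, n_samples=100):
    """Estimate T[i, a, j] = Pr(next cell j | cell i, action a) by sampling states
    and observations from each cell's (normalised) reconstructed belief."""
    n_cells = len(cell_beliefs)
    T = np.zeros((n_cells, len(actions), n_cells))
    for i, belief in enumerate(cell_beliefs):
        for a, action in enumerate(actions):
            for _ in range(n_samples):
                s = np.random.choice(len(belief), p=belief)    # sample a state
                z = sample_observation(s, action)              # assumed helper
                b_next = bayes_update(belief, action, z)       # assumed helper
                feats = project_to_features(b_next)            # assumed helper
                T[i, a, nearest_cell(feats, training_features)] += 1
            T[i, a] /= n_samples
    return T
```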
57 A second possibility is to use a model selection technique such as keeping a validation set of belief samples and picking the basis size with the best reconstruction quality. [sent-113, score-0.554]
58 (Section 5: Experimental Results.) We tested our approach on two models: a synthetic 40-state world with idealised actions and observations, and a large mobile robot navigation task. [sent-115, score-0.694]
59 For each problem, we compared E-PCA to conventional PCA for belief representation quality, and compared E-PCA to some heuristics for policy performance. [sent-116, score-0.758]
60 The reward is at a known position along the corridor; therefore, the agent needs to discover its orientation, move to the appropriate position, and declare it has arrived at the goal. [sent-121, score-0.261]
61 When the goal is declared the system resets (regardless of whether the agent is actually at the goal). [sent-122, score-0.138]
62 The observation and transition probabilities are given by von Mises distributions, an exponential family. [sent-124, score-0.231]
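For concreteness, a tiny sketch of drawing a noisy angular observation from a von Mises distribution with NumPy; the heading and concentration values are arbitrary illustrations, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
true_heading = np.pi / 2      # robot's actual orientation (illustrative)
kappa = 4.0                   # concentration: larger means less angular noise (assumed)
observed_heading = rng.vonmises(true_heading, kappa)
```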
63 Figure 2: Some sample beliefs from the two-dimensional problem, generated from roll-outs of the model (horizontal axis: state). [sent-135, score-0.41]
64 Notice that some beliefs are bimodal, whereas others are unimodal in one half or the other of the state space. [sent-136, score-0.442]
65 Figure 2 shows some sample beliefs from this model. [sent-137, score-0.41]
66 Notice that some of the beliefs are bimodal, but some beliefs have probability mass over half of the state space only—these unimodal beliefs follow the sense_orientation action. [sent-138, score-1.194]
67 Figure 3(a) shows the reconstruction performance of both the E-PCA approach and conventional PCA, plotting average KL-divergence between the sample belief and its reconstruction against the number of bases used for the reconstruction. [sent-139, score-0.973]
68 PCA minimises squared error, while E-PCA with the Poisson loss minimises unnormalised KL-divergence, so it is no surprise that E-PCA performs better. [sent-140, score-0.215]
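The reconstruction-quality measure used in these comparisons can be sketched as an unnormalised KL divergence; the small epsilon guarding against log of zero on sparse beliefs is an assumption.

```python
import numpy as np

def unnormalised_kl(b, b_hat, eps=1e-12):
    """Unnormalised KL divergence between a sampled belief b and its
    reconstruction b_hat; b_hat need not sum to one."""
    b, b_hat = np.asarray(b, float), np.asarray(b_hat, float)
    return float(np.sum(b * (np.log(b + eps) - np.log(b_hat + eps)) - b + b_hat))
```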
69 Both PCA and E-PCA reach near-zero error at 3 bases (E-PCA hits zero error, since an n-basis E-PCA can fit an n-parameter exponential family exactly). [sent-142, score-0.281]
70 (b) A comparison of policy performance using different numbers of bases, for 10000 trials. [sent-150, score-0.237]
71 In comparison, the Entropy heuristic does very poorly because it is unable to distinguish between a unimodal belief that has uncertainty about its position but not its orientation, and a bimodal belief that knows its position but not its orientation. [sent-155, score-1.259]
72 (Section 5.2: Mobile Robot Navigation.) Next we tried our algorithm on a mobile robot navigating in a corridor, as shown in figure 1. [sent-157, score-0.577]
73 As in the previous example, the robot can detect its position, but cannot determine its orientation until it reaches the lab door approximately halfway down the corridor. [sent-158, score-0.426]
74 The robot must navigate to within 10cm of the goal and declare the goal to receive the reward. [sent-159, score-0.396]
75 The total number of unoccupied cells is 8250, generating a POMDP with a belief space of 8250 dimensions. [sent-162, score-0.485]
76 Without loss of generality, we restricted the robot’s actions to forward and backward motion, and similarly simplified the observation model. [sent-163, score-0.136]
77 The reward structure of the problem strongly penalised declaring the goal when the robot was far removed from the goal state. [sent-164, score-0.471]
78 The initial set of beliefs was collected by a mobile robot navigating in the world, and then post-processed using a noisy sensor model. [sent-168, score-0.938]
79 In this particular environment, the laser data used for localisation normally gives very good localisation results; however, this will not be true for many real world environments [17]. [sent-169, score-0.201]
80 Figure 4 shows a sample robot trajectory using the policy learned using 5 basis functions. [sent-170, score-0.601]
81 Notice that the robot drives past the goal to the lab door in order to verify its orientation before returning to the goal. [sent-171, score-0.528]
82 If the robot had started at the other end of the corridor, its orientation would have become apparent on its way to the goal. [sent-172, score-0.339]
83 Figure 4: An example robot trajectory, using the policy learned with 5 basis functions (the figure marks the start state, start distribution, robot trajectory, and goal state). [sent-173, score-0.611]
84 Notice that the robot drives past the goal to the lab door to localise itself, before returning to the goal. [sent-176, score-0.464]
85 Figure 5(a) shows the reconstruction performance of both the E-PCA approach and conventional PCA, plotting average KL-divergence between the sample belief and its reconstruction against the number of bases used for the reconstruction. [sent-177, score-0.808]
86 (b) A comparison of policy performance using E-PCA, conventional PCA and the Maximum Likelihood heuristic, for 1,000 trials. [sent-182, score-0.303]
87 Figure 5(b) shows the average policy performance for the different techniques, using 5 bases. [sent-183, score-0.237]
88 (The number of bases was chosen based on the reconstruction quality of E-PCA; see [15] for further details.) [sent-184, score-0.271]
89 Again, the E-PCA approach outperformed the other techniques because it was able to model its belief accurately. [sent-185, score-0.481]
90 The Maximum-Likelihood heuristic could not distinguish orientations, and therefore regularly declared the goal in the wrong place. [sent-186, score-0.17]
91 The conventional PCA algorithm failed because it could not represent its belief accurately with only a few bases. [sent-187, score-0.521]
92 (Section 6: Conclusions.) We have demonstrated an algorithm for planning for Partially Observable Markov Decision Processes by taking advantage of particular kinds of belief space structure that are prevalent in real world domains. [sent-188, score-0.635]
93 In particular, we have shown this approach to work well on a small abstract problem, and also on an 8250-state mobile robot navigation task which is well beyond the capability of existing value function techniques. [sent-189, score-0.617]
94 The heuristic that we chose for dimensionality reduction was simply one of reconstruction error, as in equation 5: a reduction that minimises reconstruction error should allow near-optimal policies to be learned. [sent-190, score-0.676]
95 However, it may be possible to learn good policies with even fewer dimensions by taking advantage of transition probability structure, or cost function structure. [sent-191, score-0.136]
96 A loss based on KL divergence, as in equation (9), would lead to a dimensionality reduction that maximises predictability. [sent-194, score-0.215]
97 Similarly, the loss in equation (10), where the weighting is some heuristic cost function (such as one from a previous iteration of dimensionality reduction), would lead to a reduction that maximises the ability to differentiate states with different values. [sent-195, score-0.353]
98 PEGASUS: A policy search method for large MDPs and POMDPs. [sent-217, score-0.282]
99 Autonomous helicopter control using reinforcement learning policy search methods. [sent-231, score-0.282]
100 Probabilistic algorithms and the interactive museum tour-guide robot Minerva. [sent-297, score-0.275]
wordName wordTfidf (topN-words)
[('belief', 0.455), ('beliefs', 0.361), ('robot', 0.275), ('policy', 0.237), ('pomdp', 0.205), ('mobile', 0.2), ('pca', 0.186), ('bases', 0.172), ('pomdps', 0.157), ('corridor', 0.131), ('reward', 0.116), ('navigation', 0.104), ('loss', 0.103), ('mises', 0.102), ('navigating', 0.102), ('reconstruction', 0.099), ('policies', 0.092), ('link', 0.091), ('heuristic', 0.089), ('sebastian', 0.087), ('particles', 0.082), ('reduction', 0.082), ('von', 0.078), ('dimensionality', 0.077), ('observable', 0.071), ('planning', 0.071), ('conventional', 0.066), ('states', 0.064), ('orientation', 0.064), ('localisation', 0.061), ('kl', 0.059), ('exponential', 0.058), ('divergence', 0.058), ('agent', 0.057), ('maximises', 0.056), ('minimises', 0.056), ('reconstructions', 0.055), ('roy', 0.052), ('bimodal', 0.052), ('partially', 0.052), ('thrun', 0.052), ('manifold', 0.051), ('world', 0.051), ('family', 0.051), ('sample', 0.049), ('principal', 0.048), ('coastal', 0.047), ('position', 0.047), ('door', 0.047), ('search', 0.045), ('transition', 0.044), ('unimodal', 0.043), ('manifolds', 0.043), ('declare', 0.041), ('declared', 0.041), ('trajectory', 0.04), ('goal', 0.04), ('lab', 0.04), ('mdp', 0.04), ('rarely', 0.039), ('andrew', 0.039), ('robotics', 0.038), ('state', 0.038), ('pack', 0.037), ('bagnell', 0.037), ('nicholas', 0.035), ('kaelbling', 0.035), ('encounter', 0.035), ('notice', 0.034), ('actions', 0.033), ('plotting', 0.033), ('leslie', 0.033), ('generalised', 0.033), ('iteration', 0.032), ('returning', 0.031), ('drives', 0.031), ('geoffrey', 0.031), ('acting', 0.031), ('poisson', 0.031), ('thanks', 0.031), ('grid', 0.031), ('space', 0.03), ('nding', 0.03), ('robots', 0.03), ('differentiate', 0.03), ('gordon', 0.029), ('collect', 0.029), ('anthony', 0.029), ('uncertainty', 0.028), ('matching', 0.028), ('real', 0.028), ('features', 0.028), ('becker', 0.027), ('decision', 0.026), ('lie', 0.026), ('synthetic', 0.026), ('unable', 0.026), ('december', 0.026), ('reconstructed', 0.026), ('techniques', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
Author: Nicholas Roy, Geoffrey J. Gordon
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
2 0.34621143 96 nips-2002-Generalized² Linear² Models
Author: Geoffrey J. Gordon
Abstract: We introduce the Generalized 2 Linear 2 Model, a statistical estimator which combines features of nonlinear regression and factor analysis. A (GL)2M approximately decomposes a rectangular matrix X into a simpler representation j(g(A)h(B)). Here A and Bare low-rank matrices, while j, g, and h are link functions. (GL)2Ms include many useful models as special cases, including principal components analysis, exponential-family peA, the infomax formulation of independent components analysis, linear regression, and generalized linear models. They also include new and interesting special cases, one of which we describe below. We also present an iterative procedure which optimizes the parameters of a (GL)2M. This procedure reduces to well-known algorithms for some of the special cases listed above; for other special cases, it is new. 1
3 0.28194922 205 nips-2002-Value-Directed Compression of POMDPs
Author: Pascal Poupart, Craig Boutilier
Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.
4 0.22404993 3 nips-2002-A Convergent Form of Approximate Policy Iteration
Author: Theodore J. Perkins, Doina Precup
Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces -soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.
5 0.21716632 94 nips-2002-Fractional Belief Propagation
Author: Wim Wiegerinck, Tom Heskes
Abstract: We consider loopy belief propagation for approximate inference in probabilistic graphical models. A limitation of the standard algorithm is that clique marginals are computed as if there were no loops in the graph. To overcome this limitation, we introduce fractional belief propagation. Fractional belief propagation is formulated in terms of a family of approximate free energies, which includes the Bethe free energy and the naive mean-field free as special cases. Using the linear response correction of the clique marginals, the scale parameters can be tuned. Simulation results illustrate the potential merits of the approach.
6 0.20830357 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
7 0.20581409 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
8 0.19959189 169 nips-2002-Real-Time Particle Filters
9 0.1451124 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics
10 0.12206012 20 nips-2002-Adaptive Caching by Refetching
11 0.11545468 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives
12 0.11441427 134 nips-2002-Learning to Take Concurrent Actions
13 0.11011914 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming
14 0.1084324 199 nips-2002-Timing and Partial Observability in the Dopamine System
15 0.097924419 130 nips-2002-Learning in Zero-Sum Team Markov Games Using Factored Value Functions
16 0.095857725 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
17 0.090096101 137 nips-2002-Location Estimation with a Differential Update Network
18 0.085278727 128 nips-2002-Learning a Forward Model of a Reflex
19 0.078945607 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation
20 0.073672555 4 nips-2002-A Differential Semantics for Jointree Algorithms
topicId topicWeight
[(0, -0.219), (1, -0.028), (2, -0.425), (3, -0.024), (4, -0.02), (5, -0.006), (6, -0.027), (7, 0.208), (8, -0.118), (9, 0.113), (10, -0.288), (11, 0.059), (12, -0.172), (13, -0.068), (14, 0.098), (15, 0.081), (16, 0.042), (17, -0.029), (18, 0.051), (19, 0.065), (20, -0.047), (21, -0.109), (22, -0.03), (23, 0.088), (24, -0.032), (25, 0.008), (26, -0.224), (27, -0.13), (28, -0.051), (29, -0.005), (30, 0.118), (31, 0.078), (32, 0.011), (33, 0.011), (34, 0.006), (35, -0.104), (36, 0.086), (37, -0.012), (38, -0.126), (39, 0.053), (40, 0.07), (41, -0.006), (42, 0.071), (43, -0.065), (44, 0.024), (45, -0.052), (46, 0.003), (47, 0.079), (48, 0.056), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.97356623 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
Author: Nicholas Roy, Geoffrey J. Gordon
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
2 0.79968667 205 nips-2002-Value-Directed Compression of POMDPs
Author: Pascal Poupart, Craig Boutilier
Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.
3 0.75955129 96 nips-2002-Generalized² Linear² Models
Author: Geoffrey J. Gordon
Abstract: We introduce the Generalized 2 Linear 2 Model, a statistical estimator which combines features of nonlinear regression and factor analysis. A (GL)2M approximately decomposes a rectangular matrix X into a simpler representation j(g(A)h(B)). Here A and Bare low-rank matrices, while j, g, and h are link functions. (GL)2Ms include many useful models as special cases, including principal components analysis, exponential-family peA, the infomax formulation of independent components analysis, linear regression, and generalized linear models. They also include new and interesting special cases, one of which we describe below. We also present an iterative procedure which optimizes the parameters of a (GL)2M. This procedure reduces to well-known algorithms for some of the special cases listed above; for other special cases, it is new. 1
4 0.50110489 169 nips-2002-Real-Time Particle Filters
Author: Cody Kwok, Dieter Fox, Marina Meila
Abstract: Particle filters estimate the state of dynamical systems from sensor information. In many real time applications of particle filters, however, sensor information arrives at a significantly higher rate than the update rate of the filter. The prevalent approach to dealing with such situations is to update the particle filter as often as possible and to discard sensor information that cannot be processed in time. In this paper we present real-time particle filters, which make use of all sensor information even when the filter update rate is below the update rate of the sensors. This is achieved by representing posteriors as mixtures of sample sets, where each mixture component integrates one observation arriving during a filter update. The weights of the mixture components are set so as to minimize the approximation error introduced by the mixture representation. Thereby, our approach focuses computational resources (samples) on valuable sensor information. Experiments using data collected with a mobile robot show that our approach yields strong improvements over other approaches.
Author: Eric Allender, Sanjeev Arora, Michael Kearns, Cristopher Moore, Alexander Russell
Abstract: We establish a new hardness result that shows that the difficulty of planning in factored Markov decision processes is representational rather than just computational. More precisely, we give a fixed family of factored MDPs with linear rewards whose optimal policies and value functions simply cannot be represented succinctly in any standard parametric form. Previous hardness results indicated that computing good policies from the MDP parameters was difficult, but left open the possibility of succinct function approximation for any fixed factored MDP. Our result applies even to policies which yield a polynomially poor approximation to the optimal value, and highlights interesting connections with the complexity class of Arthur-Merlin games.
6 0.45899028 94 nips-2002-Fractional Belief Propagation
7 0.4519079 3 nips-2002-A Convergent Form of Approximate Policy Iteration
8 0.45089313 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
9 0.44127244 20 nips-2002-Adaptive Caching by Refetching
10 0.43980101 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
11 0.38347977 134 nips-2002-Learning to Take Concurrent Actions
12 0.37019944 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming
13 0.35886887 130 nips-2002-Learning in Zero-Sum Team Markov Games Using Factored Value Functions
14 0.3071011 4 nips-2002-A Differential Semantics for Jointree Algorithms
15 0.30529583 123 nips-2002-Learning Attractor Landscapes for Learning Motor Primitives
16 0.2998392 133 nips-2002-Learning to Perceive Transparency from the Statistics of Natural Scenes
17 0.26848668 199 nips-2002-Timing and Partial Observability in the Dopamine System
18 0.26451862 168 nips-2002-Real-Time Monitoring of Complex Industrial Processes with Particle Filters
19 0.26271859 185 nips-2002-Speeding up the Parti-Game Algorithm
20 0.26213992 150 nips-2002-Multiple Cause Vector Quantization
topicId topicWeight
[(3, 0.014), (11, 0.021), (12, 0.171), (14, 0.05), (23, 0.059), (42, 0.095), (54, 0.129), (55, 0.051), (57, 0.023), (62, 0.042), (67, 0.015), (68, 0.031), (73, 0.021), (74, 0.083), (83, 0.012), (92, 0.03), (98, 0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.86743379 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
Author: Nicholas Roy, Geoffrey J. Gordon
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
2 0.74233609 96 nips-2002-Generalized² Linear² Models
Author: Geoffrey J. Gordon
Abstract: We introduce the Generalized 2 Linear 2 Model, a statistical estimator which combines features of nonlinear regression and factor analysis. A (GL)2M approximately decomposes a rectangular matrix X into a simpler representation j(g(A)h(B)). Here A and Bare low-rank matrices, while j, g, and h are link functions. (GL)2Ms include many useful models as special cases, including principal components analysis, exponential-family peA, the infomax formulation of independent components analysis, linear regression, and generalized linear models. They also include new and interesting special cases, one of which we describe below. We also present an iterative procedure which optimizes the parameters of a (GL)2M. This procedure reduces to well-known algorithms for some of the special cases listed above; for other special cases, it is new. 1
3 0.73981434 205 nips-2002-Value-Directed Compression of POMDPs
Author: Pascal Poupart, Craig Boutilier
Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.
4 0.73373049 150 nips-2002-Multiple Cause Vector Quantization
Author: David A. Ross, Richard S. Zemel
Abstract: We propose a model that can learn parts-based representations of highdimensional data. Our key assumption is that the dimensions of the data can be separated into several disjoint subsets, or factors, which take on values independently of each other. We assume each factor has a small number of discrete states, and model it using a vector quantizer. The selected states of each factor represent the multiple causes of the input. Given a set of training examples, our model learns the association of data dimensions with factors, as well as the states of each VQ. Inference and learning are carried out efficiently via variational algorithms. We present applications of this model to problems in image decomposition, collaborative filtering, and text classification.
5 0.73250896 169 nips-2002-Real-Time Particle Filters
Author: Cody Kwok, Dieter Fox, Marina Meila
Abstract: Particle filters estimate the state of dynamical systems from sensor information. In many real time applications of particle filters, however, sensor information arrives at a significantly higher rate than the update rate of the filter. The prevalent approach to dealing with such situations is to update the particle filter as often as possible and to discard sensor information that cannot be processed in time. In this paper we present real-time particle filters, which make use of all sensor information even when the filter update rate is below the update rate of the sensors. This is achieved by representing posteriors as mixtures of sample sets, where each mixture component integrates one observation arriving during a filter update. The weights of the mixture components are set so as to minimize the approximation error introduced by the mixture representation. Thereby, our approach focuses computational resources (samples) on valuable sensor information. Experiments using data collected with a mobile robot show that our approach yields strong improvements over other approaches.
6 0.72406 3 nips-2002-A Convergent Form of Approximate Policy Iteration
7 0.72307444 83 nips-2002-Extracting Relevant Structures with Side Information
8 0.72022372 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
9 0.71307689 190 nips-2002-Stochastic Neighbor Embedding
10 0.71110284 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
11 0.71058124 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
12 0.70761228 137 nips-2002-Location Estimation with a Differential Update Network
13 0.70745867 74 nips-2002-Dynamic Structure Super-Resolution
14 0.70545357 2 nips-2002-A Bilinear Model for Sparse Coding
15 0.70416862 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
16 0.70323926 14 nips-2002-A Probabilistic Approach to Single Channel Blind Signal Separation
17 0.70307904 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
18 0.70084947 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
19 0.70073777 10 nips-2002-A Model for Learning Variance Components of Natural Images
20 0.70039916 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers