nips nips2002 nips2002-205 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pascal Poupart, Craig Boutilier
Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. [sent-5, score-0.498]
2 We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. [sent-6, score-1.174]
3 We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. [sent-7, score-0.535]
4 1 Introduction Partially observable Markov decision processes (POMDPs) provide a rich framework for modeling a wide range of sequential decision problems in the presence of uncertainty. [sent-9, score-0.184]
5 Unfortunately, the application of POMDPs to real world problems remains limited due to the intractability of current solution algorithms, in large part because of the exponential growth of state spaces with the number of relevant variables. [sent-10, score-0.174]
6 Ideally, we would like to mitigate this source of intractability by compressing the state space as much as possible without compromising decision quality. [sent-11, score-0.335]
7 Our aim in solving a POMDP is to maximize future reward based on our current beliefs about the world. [sent-12, score-0.178]
8 By compressing its belief state, an agent may lose relevant information, which results in suboptimal policy choice. [sent-13, score-0.382]
9 Thus an important aspect of belief state compression lies in distinguishing relevant information from that which can be safely discarded. [sent-14, score-0.54]
10 For example, approaches using bounded memory [8, 10] and state aggregation—either dynamic [2] or static [5, 9]—can be viewed in this light. [sent-16, score-0.189]
11 In this paper, we study the effect of static state-space compression on decision quality. [sent-17, score-0.349]
12 We first characterize lossless compressions—those that do not lead to any error in expected value—by deriving a set of conditions that guarantee decision quality will not be impaired. [sent-18, score-0.42]
13 This analysis leads to algorithms that find good compression schemes, including methods that exploit structure in the POMDP dynamics (as exhibited, e. [sent-20, score-0.32]
14 We derive a (somewhat loose) upper bound on the loss in decision quality when the conditions for lossless compression (of some required dimensionality) are not met. [sent-24, score-0.675]
15 Finally, we propose a simple optimization program to find linear lossy compressions that minimize this bound, and describe how structured POMDP models can be used to implement this scheme efficiently. [sent-25, score-0.790]
16 Policies and value functions for POMDPs are typically defined over belief space, where a belief state is a distribution over the state space capturing an agent’s knowledge about the current state of the world. [sent-31, score-0.574]
17 Note that a belief state and the reward function can be viewed respectively as row and column vectors whose dimension equals the number of states. [sent-34, score-0.444]
18 Solving a POMDP consists of finding an optimal policy mapping belief states to actions. [sent-38, score-0.28]
19 The value of a policy is the expected sum of discounted rewards, defined in Eq. (1). A number of techniques [11] based on value iteration or policy iteration can be used to compute optimal or approximately optimal policies for POMDPs. [sent-39, score-0.345]
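Equation (1) itself does not survive the extraction; a standard form of the value of a policy π, written here as a plausible reconstruction (assuming a discount factor γ, belief states b_t treated as row vectors, and reward vectors R^a treated as column vectors), is

$$ V_\pi(b) \;=\; \mathbb{E}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, b_t R^{\pi(b_t)} \;\middle|\; b_0 = b \,\right], \tag{1} $$

so that $b_t R^{a}$ is the expected immediate reward of taking action $a$ in belief state $b_t$.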
20 2 Conditional Independence and Additive Separability When our state space is defined by a set of variables, POMDPs can often be represented concisely in a factored way by specifying the transition, observation and reward functions using a dynamic Bayesian network (DBN). [sent-41, score-0.326]
21 These representations can often be exploited to solve POMDPs without state space enumeration [2]. [sent-43, score-0.225]
22 Recently, Pfeffer [13] showed that conditional independence combined with some form of additive separability can enable efficient inference in many DBNs. [sent-44, score-0.173]
23 Pfeffer shows how additive separability in the CPTs of a DBN can be exploited to identify families of self-sufficient variables. [sent-48, score-0.266]
24 A self-sufficient family consists of a set of subsets of variables such that the marginals of each subset are sufficient to predict the marginals of the same subsets at the next time step. [sent-49, score-0.334]
25 Hence, if we require the probabilities of a few variables, and can identify a self-sufficient family containing those variables, then we need only compute marginals over this family when monitoring belief state. [sent-50, score-0.341]
26 Figure 1: a) Functional flow of a POMDP (dotted arrows) and a compressed POMDP (solid arrows) where the next belief state is accurately predicted. [sent-58, score-0.654]
27 b) Functional flow of a POMDP (dotted arrows) and a compressed POMDP (solid arrows) where the next compressed belief state is accurately predicted. [sent-59, score-0.956]
28 A Krylov subspace is the smallest subspace that contains a given vector and is invariant with respect to a given matrix. [sent-66, score-0.338]
29 A basis for a Krylov subspace can easily be generated by repeatedly multiplying the vector by the matrix (i.e., taking v, Av, A²v, and so on). [sent-67, score-0.171]
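No code accompanies the text; the following is a minimal numerical sketch of that construction, assuming a hypothetical square matrix A and starting vector v (NumPy arrays), stopping once repeated multiplication yields no new direction:

```python
import numpy as np

def krylov_basis(A, v, tol=1e-10):
    """Sketch: orthonormal basis of span{v, Av, A^2 v, ...}.
    Stops when multiplying the latest basis vector by A adds no new
    direction, i.e. once the spanned subspace is invariant under A."""
    basis = []
    w = np.asarray(v, dtype=float)
    while True:
        # Gram-Schmidt: remove components already covered by the basis.
        for u in basis:
            w = w - (u @ w) * u
        norm = np.linalg.norm(w)
        if norm < tol:          # nothing new: the subspace is invariant
            break
        w = w / norm
        basis.append(w)
        w = A @ w               # candidate for the next basis direction
    return np.stack(basis, axis=1)   # columns form the Krylov basis
```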
30 In a DBN, families of self-sufficient variables naturally correspond to invariant subspaces. [sent-77, score-0.167]
31 Hence, when a family of variables is self-sufficient, the subspace of linear functions defined over the truth values of that family is invariant with respect to the transition dynamics. [sent-82, score-0.199] [sent-83, score-0.185]
33 3 Lossless Compressions If a compression of the state space of a POMDP allows us to accurately evaluate all policies, we say the compression is lossless, since we have sufficient information to select the optimal policy. [sent-86, score-0.619]
34 Let f be a compression function that maps each belief state b into some lower-dimensional compressed belief state f(b) (see Figure 1(a)). [sent-89, score-1.105]
35 We desire a compression such that f(b) corresponds to the smallest statistic sufficient for accurately predicting the current reward as well as the next belief state (since we can accurately predict all following rewards from it). [sent-93, score-0.881]
36 Figure 1(b) illustrates the resulting functional flow, where a compressed transition function directly maps one compressed belief state to the next compressed belief state. [sent-95, score-1.137]
37 Using Eq. 3, we can evaluate a policy with the compressed POMDP dynamics; once the compressed value function is found, we can recover the original value function from it. [sent-98, score-0.443]
38 3.1 Linear compressions We say f is a linear compression when f is a linear function, representable by some matrix. [sent-110, score-0.707]
39 In this case, the approximate transition and reward functions must also be linear; Eq. 3 can then be rewritten in matrix notation (Eq. 5). [sent-111, score-0.206]
40 In a linear compression, the compression matrix can be viewed as effecting a change of basis for the value function, with its columns defining a subspace in which the compressed value function lies. [sent-114, score-0.558]
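Eq. 5 does not survive the extraction; the usual form of these lossless conditions, stated here as a reconstruction under the assumption that the compression is $\tilde b = bF$ for some matrix $F$ (with $T^a$ the transition matrix and $R^a$ the reward column vector for action $a$), is

$$ R^{a} \;=\; F\,\tilde R^{a} \qquad\text{and}\qquad T^{a} F \;=\; F\,\tilde T^{a} \qquad \text{for every action } a. \tag{5} $$

Under these conditions any policy can be evaluated entirely in the compressed space, and the original value function is recovered as $V_\pi = F \tilde V_\pi$, since $b\,V_\pi = bF\,\tilde V_\pi = \tilde b\,\tilde V_\pi$.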
41 Furthermore, the rank of the compression matrix indicates the dimensionality of the compressed state space. [sent-115, score-0.444]
42 When Eq. 5 is satisfied, the columns of the compression matrix span a subspace that contains the reward vectors and that is invariant with respect to each transition matrix. [sent-117, score-0.241]
43 Eq. 5 says that a sufficient statistic must be able to “predict itself” at the next time step (hence the subspace is invariant), and that it must predict the current reward (hence the subspace contains the reward vectors). [sent-119, score-0.381]
44 It also requires that each transition matrix map the columns of the compression matrix back into their span, so the subspace is invariant with respect to each transition matrix. [sent-125, score-0.167]
45 Thus, the best linear lossless compression corresponds to the smallest invariant subspace that contains the reward vectors. [sent-127, score-0.747]
46 Using this fact, we can easily compute the best lossless linear compression by iteratively multiplying the reward vectors by each transition matrix until the Krylov basis is obtained. [sent-129, score-0.58]
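A minimal sketch of this Krylov-style procedure (an assumption about the implementation, not the authors' code) is given below; it seeds the basis with the reward vectors, closes it under multiplication by every transition matrix, and then projects the model onto the resulting subspace:

```python
import numpy as np

def lossless_linear_compression(T_list, R_list, tol=1e-10):
    """Sketch: build an orthonormal matrix F whose columns span the
    smallest subspace containing every reward vector R^a and invariant
    under every transition matrix T^a, then project the model onto it.
    T_list: list of (n, n) transition matrices, one per action;
    R_list: list of length-n reward vectors, one per action."""
    cols = []                      # orthonormal basis columns found so far

    def add(w):
        # Gram-Schmidt against the current basis; keep only new directions.
        for u in cols:
            w = w - (u @ w) * u
        norm = np.linalg.norm(w)
        if norm > tol:
            cols.append(w / norm)
            return True
        return False

    frontier = [np.asarray(r, dtype=float) for r in R_list]
    while frontier:
        new_frontier = []
        for w in frontier:
            if add(w):
                # Expand only genuinely new directions by every T^a.
                new_frontier.extend(T @ cols[-1] for T in T_list)
        frontier = new_frontier

    F = np.stack(cols, axis=1)     # n x k, k = compressed dimension
    # With orthonormal columns, F.T acts as F's pseudo-inverse, and
    # T^a F = F T~^a and R^a = F R~^a hold exactly because the span
    # is invariant and contains every R^a.
    T_tilde = [F.T @ T @ F for T in T_list]
    R_tilde = [F.T @ r for r in R_list]
    return F, T_tilde, R_tilde
```

Any policy can then be evaluated, and the POMDP solved, entirely in the k-dimensional space defined by T_tilde and R_tilde.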
47 Finally, we can solve the POMDP in the compressed state space by using the compressed transition and reward functions. [sent-132, score-0.448]
48 The practical need to avoid state enumeration is a key motivation for POMDP compression. [sent-140, score-0.162]
49 However, the complexity of the search for a good compression must also be independent of the state space size. [sent-141, score-0.348]
50 We consider several ways in which a compact POMDP specification can be exploited to construct a linear compression without state enumeration. [sent-143, score-0.455]
51 If transition, observation and reward functions are represented using DBNs and structured CPTs (e.g., decision trees or algebraic decision diagrams), then the matrix operations required by the Krylov algorithm can be implemented effectively [1, 7]. [sent-145, score-0.207] [sent-147, score-0.184]
53 Although this approach can offer substantial savings, the DTs or ADDs that represent the basis vectors of the Krylov subspace may still be much larger than the dimensionality of the compressed state space and the original DBN specifications. [sent-148, score-0.587]
54 Alternatively, families of self-sufficient variables corresponding to invariant subspaces can be identified by exploiting additive separability. [sent-149, score-0.284]
55 The corresponding subspace is invariant with respect to each transition matrix and necessarily contains the reward vectors. [sent-151, score-0.206]
56 Assuming a tractable self-sufficient family is found, a compact basis can then be constructed by using all indicator functions for each subset of variables in this family. [sent-152, score-0.211]
57 This approach allows us to quickly identify a good compression by a simple inspection of the additive separability structure of the DBN. [sent-155, score-0.449]
58 The resulting compression is not necessarily optimal; however, it is the best among those corresponding to some such family. [sent-156, score-0.229]
59 It is important to note that the dynamics and reward of the compressed POMDP can be constructed easily. [sent-157, score-0.446]
60 Pfeffer [13] notes that observations tend to reduce the amount of additive separability present in a DBN, thereby increasing the size of self-sufficient families. [sent-160, score-0.173]
61 Therefore, we should point out that lossless compressions of POMDPs that exploit self-sufficiency and offer an acceptable degree of compression may not exist. [sent-161, score-0.953]
62 Hence lossy compressions are likely to be required in many cases. [sent-162, score-0.591]
63 Finally, we ask whether the existence of lossless compressions requires some form of structure in the POMDP. [sent-164, score-0.681]
64 Suppose a transition matrix and a reward vector are chosen uniformly at random. [sent-166, score-0.17]
65 The odds that the reward vector falls into a proper invariant subspace of the transition matrix are essentially zero, since there are infinitely more vectors in the full space than in all the proper invariant subspaces put together. [sent-167, score-0.336]
66 We have described how context-specific independence and additive separability can be exploited to identify some linear lossless compressions. [sent-169, score-0.523]
67 However they do not guarantee that the optimal compression will be found, so it remains an open question whether other types of structure could be used in similar ways. [sent-170, score-0.277]
68 4 Lossy compressions Since we cannot generally find effective lossless compressions, we also consider lossy compressions. [sent-172, score-0.844]
69 We propose a simple approach to find linear lossy compressions that “almost satisfy” Eq. 5. [sent-173, score-0.627]
70 Table 1 outlines a simple optimization program to find lossy compressions that minimize a weighted sum of the two max-norm residual errors in Eq. 5. [sent-175, score-0.685]
71 Table 1: Optimization program for linear lossy compressions. [sent-180, score-0.691]
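The contents of Table 1 are not legible in this dump; a plausible reconstruction of such a program, under the same $F$, $\tilde T^a$, $\tilde R^a$ notation assumed above and with hypothetical weights $w_R$, $w_T$ on the two residuals, is

$$ \min_{F,\ \{\tilde T^a\},\ \{\tilde R^a\}} \;\; w_R\,\epsilon_R + w_T\,\epsilon_T \quad\text{s.t.}\quad \bigl\|R^a - F \tilde R^a\bigr\|_\infty \le \epsilon_R, \;\; \bigl\|T^a F - F \tilde T^a\bigr\|_\infty \le \epsilon_T \;\;\text{for all } a, $$

i.e. the degree to which the lossless conditions of Eq. 5 are violated is minimized under a weighted max-norm objective.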
72 However, it is possible to solve this optimization program by solving a series of LPs (linear programs). [sent-184, score-0.157]
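The LPs themselves are not reproduced here; as a rough stand-in (an assumption, using least squares instead of the paper's max-norm LPs, and showing only the subproblem in which F is held fixed), the following sketch fits the compressed model and reports the residuals that enter the bound discussed next:

```python
import numpy as np

def fit_compressed_model(T_list, R_list, F):
    """Rough stand-in (assumption): for a fixed compression matrix F,
    fit compressed dynamics/rewards by least squares and report the
    residuals.  The paper instead minimizes max-norm residuals with a
    series of LPs, alternating over F and the compressed model."""
    T_tilde, R_tilde = [], []
    eps_T = eps_R = 0.0
    for T, r in zip(T_list, R_list):
        # Best T~^a with  T^a F ~= F T~^a  (least squares in T~^a).
        Tt, *_ = np.linalg.lstsq(F, T @ F, rcond=None)
        # Best R~^a with  R^a ~= F R~^a.
        rt, *_ = np.linalg.lstsq(F, r, rcond=None)
        T_tilde.append(Tt)
        R_tilde.append(rt)
        eps_T = max(eps_T, np.max(np.abs(T @ F - F @ Tt)))
        eps_R = max(eps_R, np.max(np.abs(r - F @ rt)))
    return T_tilde, R_tilde, eps_T, eps_R
```

The series of LPs mentioned above presumably alternates between this direction and the corresponding subproblem in F until the residuals stop improving.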
73 4.1 Max-norm Error Bound The quality of the compression resulting from this program depends on the two weights. [sent-188, score-0.342]
74 Ideally, we would like to set the weights in a way that reflects the loss in decision quality associated with compressing the state space. [sent-189, score-0.366]
75 If we can bound the error of evaluating any policy using the compressed POMDP, then the difference in expected total return between the policy that is best with respect to the compressed POMDP and the true optimal policy is bounded as well. [sent-190, score-0.526] [sent-193, score-0.414]
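The exact bound does not survive extraction; the standard argument of this shape, stated as a plausible reconstruction (assuming $\delta$ bounds the policy-evaluation error of the compressed model uniformly over policies and beliefs), gives

$$ \text{if } \sup_{\pi,\,b}\bigl|\tilde V_\pi(f(b)) - V_\pi(b)\bigr| \le \delta, \quad\text{then}\quad V_{\pi^*}(b) - V_{\tilde\pi}(b) \;\le\; 2\delta, $$

where $\tilde\pi$ is the policy that is optimal for the compressed POMDP and $\pi^*$ is the true optimal policy.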
77 Here the weight on the transition residual is typically much greater than the weight on the reward residual because of a discount-dependent factor, which means that the transition error has a much higher impact on the loss in decision quality than the reward error. [sent-203, score-0.232]
78 Intuitively, this makes sense because the error in predicting the next compressed belief state may compound over time, so we should set the weight on the transition residual significantly higher than the weight on the reward residual. [sent-204, score-0.653]
79 4.2 Structured Compressions As with lossless compressions, solving the program in Table 1 may be intractable due to the size of the state space. [sent-207, score-0.353]
80 We describe several techniques that allow one to exploit problem structure to find an acceptable lossy compression without state space enumeration. [sent-209, score-0.62]
81 One approach is related to the basis function model proposed in [4], in which we restrict the compression to functions over some small set of factors (subsets of state variables). [sent-210, score-0.153]
82 To deal with the constraints, we can exploit the structure imposed on the compression and the DBN structure to reduce the number of constraints to something (in many cases) polynomial in the number of state variables. [sent-214, score-0.227]
83 By searching within a restricted set of structured compressions and by exploiting DBN structure, it is possible to efficiently solve the optimization program in Table 1. [sent-217, score-0.644]
84 Let a collection of subsets of the state variables be chosen, and let a function be defined over each subset and the compressed state space. [sent-222, score-0.475]
85 Note that the number of variables is dependent only on the size of the subsets and the compressed state space. [sent-227, score-0.421]
86 Furthermore, this form of additive separability lends itself to the same compact constraint generation techniques mentioned above. [sent-228, score-0.262]
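As a reconstruction (the subset and function labels $C_i$ and $f_i$ are hypothetical, not taken from the text), this restriction amounts to compressions of the additive form

$$ f(b) \;=\; \sum_{i} f_i\!\left(b_{C_i}\right), $$

where $b_{C_i}$ denotes the marginal of the belief state over the variables in subset $C_i$ and each $f_i$ is linear, so the number of free parameters scales with the subset sizes rather than with the full state space.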
87 Finally, the (discrete) search for decent subsets can be interleaved with optimization of the compression mapping for fixed subsets. [sent-229, score-0.313]
88 Experiments with the structured techniques of Section 4.2 are necessary to assess the degree of compression achievable with large, realistic problems. [sent-238, score-0.229]
89 The 32-dimensional belief space can be compressed without any loss to a 7-dimensional subspace using the Krylov subspace algorithm described in Section 3. [sent-239, score-0.74]
90 The average loss (relative to the optimal policy) is reported when a policy computed using varying degrees of compression is executed for a fixed number of stages. [sent-246, score-0.341]
91 The loss is sampled from 100,000 random initial belief states, averaged over 10 runs. [sent-247, score-0.22]
92 In contrast, the average loss of a random policy is much higher. [sent-249, score-0.164]
93 6 Concluding Remarks We have presented an in-depth theoretical analysis of the impact of static compressions on decision quality. [sent-250, score-0.565]
94 We derived a set of conditions that guarantee compression does not impair decision quality, leading to interesting mathematical properties for linear compressions that allow us to exploit structure in the POMDP dynamics. [sent-251, score-0.851]
95 Figure 2: Average loss for various lossy compressions. We also proposed an optimization program to search for good lossy compressions. [sent-255, score-0.922]
96 Preliminary results suggest that significant compression can be achieved with little impact on decision quality. [sent-256, score-0.36]
97 It would be interesting to carry out a similar analysis in terms of information theory (instead of linear algebra) since the problem of identifying information in a belief state relevant to predicting future rewards can be modeled naturally using information theoretic concepts [6]. [sent-258, score-0.494]
98 Dynamic compressions could also be analyzed since, as we solve a POMDP, the set of reasonable policies shrinks, allowing greater compression. [sent-259, score-0.5]
99 Computing optimal policies for partially observable decision processes using compact representations. [sent-269, score-0.194]
100 Hidden state and reinforcement learning with instance-based state identification. [sent-338, score-0.238]
wordName wordTfidf (topN-words)
[('compressions', 0.406), ('pomdp', 0.371), ('compressed', 0.302), ('lossless', 0.253), ('compression', 0.229), ('krylov', 0.195), ('dbn', 0.186), ('lossy', 0.185), ('wt', 0.185), ('pomdps', 0.173), ('belief', 0.168), ('vt', 0.15), ('state', 0.119), ('separability', 0.115), ('reward', 0.115), ('policy', 0.112), ('subspace', 0.109), ('invariant', 0.097), ('decision', 0.092), ('boutilier', 0.072), ('factored', 0.069), ('structured', 0.069), ('policies', 0.067), ('program', 0.064), ('poupart', 0.058), ('additive', 0.058), ('transition', 0.055), ('compressing', 0.054), ('subsets', 0.054), ('rewards', 0.054), ('family', 0.052), ('loss', 0.052), ('pfeffer', 0.051), ('quality', 0.049), ('guestrin', 0.046), ('marginals', 0.044), ('enumeration', 0.043), ('viewed', 0.042), ('accurately', 0.042), ('predicting', 0.041), ('toronto', 0.04), ('exploit', 0.04), ('arrows', 0.039), ('impact', 0.039), ('cient', 0.039), ('givan', 0.039), ('mitigate', 0.039), ('patrascu', 0.039), ('uvud', 0.039), ('variables', 0.038), ('wa', 0.036), ('exploited', 0.036), ('linear', 0.036), ('solving', 0.036), ('columns', 0.035), ('compact', 0.035), ('adjusts', 0.034), ('basis', 0.034), ('seattle', 0.033), ('subspaces', 0.033), ('families', 0.032), ('separable', 0.032), ('lp', 0.032), ('cpts', 0.031), ('intractability', 0.031), ('edmonton', 0.031), ('suf', 0.03), ('optimization', 0.03), ('dynamics', 0.029), ('bottleneck', 0.029), ('sx', 0.029), ('multiplying', 0.028), ('ow', 0.028), ('static', 0.028), ('solve', 0.027), ('wh', 0.027), ('future', 0.027), ('entries', 0.026), ('ensures', 0.026), ('schuurmans', 0.026), ('eq', 0.026), ('guarantee', 0.026), ('exploiting', 0.026), ('identify', 0.025), ('predict', 0.025), ('vancouver', 0.025), ('acceptable', 0.025), ('multiply', 0.025), ('cr', 0.025), ('iterative', 0.025), ('concepts', 0.025), ('relevant', 0.024), ('constraints', 0.024), ('agent', 0.024), ('dimensionality', 0.023), ('smallest', 0.023), ('next', 0.023), ('table', 0.023), ('observation', 0.023), ('structure', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 205 nips-2002-Value-Directed Compression of POMDPs
Author: Pascal Poupart, Craig Boutilier
Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.
2 0.28194922 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
Author: Nicholas Roy, Geoffrey J. Gordon
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
3 0.13431652 3 nips-2002-A Convergent Form of Approximate Policy Iteration
Author: Theodore J. Perkins, Doina Precup
Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.
4 0.12473553 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics
Author: Eric Allender, Sanjeev Arora, Michael Kearns, Cristopher Moore, Alexander Russell
Abstract: We establish a new hardness result that shows that the difficulty of planning in factored Markov decision processes is representational rather than just computational. More precisely, we give a fixed family of factored MDPs with linear rewards whose optimal policies and value functions simply cannot be represented succinctly in any standard parametric form. Previous hardness results indicated that computing good policies from the MDP parameters was difficult, but left open the possibility of succinct function approximation for any fixed factored MDP. Our result applies even to policies which yield a polynomially poor approximation to the optimal value, and highlights interesting connections with the complexity class of Arthur-Merlin games.
5 0.10629837 199 nips-2002-Timing and Partial Observability in the Dopamine System
Author: Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky
Abstract: According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.
6 0.10133452 94 nips-2002-Fractional Belief Propagation
7 0.098683752 96 nips-2002-Generalized² Linear² Models
8 0.091843784 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
9 0.083607189 130 nips-2002-Learning in Zero-Sum Team Markov Games Using Factored Value Functions
10 0.083042122 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming
11 0.081852682 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
12 0.075987458 159 nips-2002-Optimality of Reinforcement Learning Algorithms with Linear Function Approximation
13 0.070723213 134 nips-2002-Learning to Take Concurrent Actions
14 0.070441015 61 nips-2002-Convergent Combinations of Reinforcement Learning with Linear Function Approximation
15 0.067111365 166 nips-2002-Rate Distortion Function in the Spin Glass State: A Toy Model
16 0.065543242 20 nips-2002-Adaptive Caching by Refetching
17 0.056693479 165 nips-2002-Ranking with Large Margin Principle: Two Approaches
18 0.051185865 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
19 0.049094819 169 nips-2002-Real-Time Particle Filters
20 0.047853358 194 nips-2002-The Decision List Machine
topicId topicWeight
[(0, -0.171), (1, -0.029), (2, -0.264), (3, -0.049), (4, -0.004), (5, -0.035), (6, -0.026), (7, 0.068), (8, -0.096), (9, 0.022), (10, -0.107), (11, -0.034), (12, -0.125), (13, -0.012), (14, 0.051), (15, -0.009), (16, 0.043), (17, -0.017), (18, 0.01), (19, 0.002), (20, 0.003), (21, -0.132), (22, -0.078), (23, -0.015), (24, -0.041), (25, -0.003), (26, -0.138), (27, -0.107), (28, 0.029), (29, 0.085), (30, 0.125), (31, 0.06), (32, 0.089), (33, 0.059), (34, 0.016), (35, -0.149), (36, 0.023), (37, -0.005), (38, -0.14), (39, -0.041), (40, 0.031), (41, 0.015), (42, 0.07), (43, -0.045), (44, -0.032), (45, -0.1), (46, -0.013), (47, 0.097), (48, 0.078), (49, -0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.94648862 205 nips-2002-Value-Directed Compression of POMDPs
Author: Pascal Poupart, Craig Boutilier
Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.
2 0.82498646 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
Author: Nicholas Roy, Geoffrey J. Gordon
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
3 0.6329481 96 nips-2002-Generalized² Linear² Models
Author: Geoffrey J. Gordon
Abstract: We introduce the Generalized² Linear² Model, a statistical estimator which combines features of nonlinear regression and factor analysis. A (GL)²M approximately decomposes a rectangular matrix X into a simpler representation j(g(A)h(B)). Here A and B are low-rank matrices, while j, g, and h are link functions. (GL)²Ms include many useful models as special cases, including principal components analysis, exponential-family PCA, the infomax formulation of independent components analysis, linear regression, and generalized linear models. They also include new and interesting special cases, one of which we describe below. We also present an iterative procedure which optimizes the parameters of a (GL)²M. This procedure reduces to well-known algorithms for some of the special cases listed above; for other special cases, it is new.
4 0.61197793 13 nips-2002-A Note on the Representational Incompatibility of Function Approximation and Factored Dynamics
Author: Eric Allender, Sanjeev Arora, Michael Kearns, Cristopher Moore, Alexander Russell
Abstract: We establish a new hardness result that shows that the difficulty of planning in factored Markov decision processes is representational rather than just computational. More precisely, we give a fixed family of factored MDPs with linear rewards whose optimal policies and value functions simply cannot be represented succinctly in any standard parametric form. Previous hardness results indicated that computing good policies from the MDP parameters was difficult, but left open the possibility of succinct function approximation for any fixed factored MDP. Our result applies even to policies which yield a polynomially poor approximation to the optimal value, and highlights interesting connections with the complexity class of Arthur-Merlin games.
5 0.48730972 3 nips-2002-A Convergent Form of Approximate Policy Iteration
Author: Theodore J. Perkins, Doina Precup
Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.
6 0.46320361 20 nips-2002-Adaptive Caching by Refetching
7 0.44526842 134 nips-2002-Learning to Take Concurrent Actions
8 0.42405978 199 nips-2002-Timing and Partial Observability in the Dopamine System
9 0.41614422 130 nips-2002-Learning in Zero-Sum Team Markov Games Using Factored Value Functions
10 0.39394867 33 nips-2002-Approximate Linear Programming for Average-Cost Dynamic Programming
11 0.3352038 150 nips-2002-Multiple Cause Vector Quantization
12 0.3330062 94 nips-2002-Fractional Belief Propagation
13 0.32526064 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
14 0.31843379 4 nips-2002-A Differential Semantics for Jointree Algorithms
15 0.31674954 107 nips-2002-Identity Uncertainty and Citation Matching
16 0.29798031 155 nips-2002-Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach
17 0.26979327 83 nips-2002-Extracting Relevant Structures with Side Information
18 0.26384398 71 nips-2002-Dopamine Induced Bistability Enhances Signal Processing in Spiny Neurons
19 0.25764871 169 nips-2002-Real-Time Particle Filters
20 0.2453813 104 nips-2002-How the Poverty of the Stimulus Solves the Poverty of the Stimulus
topicId topicWeight
[(11, 0.02), (14, 0.019), (23, 0.036), (42, 0.078), (54, 0.14), (55, 0.049), (57, 0.026), (62, 0.274), (67, 0.027), (68, 0.025), (73, 0.021), (74, 0.069), (92, 0.051), (98, 0.088)]
simIndex simValue paperId paperTitle
same-paper 1 0.81406385 205 nips-2002-Value-Directed Compression of POMDPs
Author: Pascal Poupart, Craig Boutilier
Abstract: We examine the problem of generating state-space compressions of POMDPs in a way that minimally impacts decision quality. We analyze the impact of compressions on decision quality, observing that compressions that allow accurate policy evaluation (prediction of expected future reward) will not affect decision quality. We derive a set of sufficient conditions that ensure accurate prediction in this respect, illustrate interesting mathematical properties these confer on lossless linear compressions, and use these to derive an iterative procedure for finding good linear lossy compressions. We also elaborate on how structured representations of a POMDP can be used to find such compressions.
2 0.78963816 199 nips-2002-Timing and Partial Observability in the Dopamine System
Author: Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky
Abstract: According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.
3 0.64175832 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
Author: Nicholas Roy, Geoffrey J. Gordon
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
4 0.59472847 3 nips-2002-A Convergent Form of Approximate Policy Iteration
Author: Theodore J. Perkins, Doina Precup
Abstract: We study a new, model-free form of approximate policy iteration which uses Sarsa updates with linear state-action value function approximation for policy evaluation, and a “policy improvement operator” to generate a new policy based on the learned state-action values. We prove that if the policy improvement operator produces ε-soft policies and is Lipschitz continuous in the action values, with a constant that is not too large, then the approximate policy iteration algorithm converges to a unique solution from any initial policy. To our knowledge, this is the first convergence result for any form of approximate policy iteration under similar computational-resource assumptions.
5 0.59466583 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
Author: Peter Sykacek, Stephen J. Roberts
Abstract: We propose in this paper a probabilistic approach for adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques. We regard the parameters of the classifier as latent states in a first order Markov process and propose an algorithm which can be regarded as a variational generalization of standard Kalman filtering. The variational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of inferring competitive classifiers both in stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.
6 0.59311545 169 nips-2002-Real-Time Particle Filters
7 0.59298027 189 nips-2002-Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
8 0.59244549 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
9 0.59212214 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
10 0.59200573 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
11 0.5901857 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
12 0.589724 190 nips-2002-Stochastic Neighbor Embedding
13 0.58963311 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
14 0.58926338 10 nips-2002-A Model for Learning Variance Components of Natural Images
15 0.5880962 27 nips-2002-An Impossibility Theorem for Clustering
16 0.58778811 14 nips-2002-A Probabilistic Approach to Single Channel Blind Signal Separation
17 0.58776498 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
18 0.58743083 53 nips-2002-Clustering with the Fisher Score
19 0.58528912 46 nips-2002-Boosting Density Estimation
20 0.58370161 203 nips-2002-Using Tarjan's Red Rule for Fast Dependency Tree Construction