nips nips2007 nips2007-215 knowledge-graph by maker-knowledge-mining

215 nips-2007-What makes some POMDP problems easy to approximate?


Source: pdf

Author: Wee S. Lee, Nan Rong, Daniel J. Hsu

Abstract: Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms’ success often observed in the experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. However, given a suitable set of points that “cover” an optimal reachable space well, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice, e.g., fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. [sent-4, score-2.11]

2 We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. [sent-5, score-1.564]

3 However, given a suitable set of points that “cover” an optimal reachable space well, an approximate solution can be computed in polynomial time. [sent-6, score-0.566]

4 The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice, e. [sent-7, score-0.586]

5 , fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices. [sent-9, score-0.544]

6 1 Introduction Computing an optimal policy for a partially observable Markov decision process (POMDP) is an intractable problem [10, 9]. [sent-10, score-0.415]

7 Intuitively, the intractability is due to the “curse of dimensionality”: the belief space B used in solving a POMDP typically has dimensionality equal to |S|, the number of states in the POMDP, and therefore the size of B grows exponentially with |S|. [sent-11, score-0.418]

8 However, in recent years, point-based POMDP algorithms have made impressive progress in computing approximate solutions by sampling the belief space: POMDPs with hundreds of states have been solved in a matter of seconds [14, 4]. [sent-13, score-0.411]

9 Our work is motivated by a benchmark problem called Tag [11], in which a robot needs to search and tag a moving target that tends to move away from it. [sent-16, score-0.477]

10 The joint state of the robot and target positions is thus only partially observable. [sent-22, score-0.391]

11 The problem has 870 states in total, resulting in a belief space of 870 dimensions. [sent-23, score-0.375]

12 Surprisingly, only two years later, another point-based algorithm [14] computed an approximate solution to Tag, a problem with an 870-dimensional belief space, in less than a minute! [sent-26, score-0.338]

13 One important feature that underlies the success of many point-based algorithms is that they only explore a subset R(b0 ) ⊆ B, usually called the reachable space from b0 . [sent-27, score-0.345]

14 The reachable space R(b0 ) contains all points reachable from a given initial belief point b0 ∈ B under arbitrary sequences of actions and observations. [sent-28, score-1.104]
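
To make the construction of R(b0 ) concrete, the following Python sketch enumerates beliefs reachable within a fixed depth by exact belief tracking. It is a minimal illustration assuming tabular models T[a] (state-transition matrices) and Z[a] (observation matrices); all names here are ours, not the paper's.

    import numpy as np

    def belief_update(b, T, Z, a, o):
        # b'(s') is proportional to Z[a][o, s'] * sum_s T[a][s, s'] * b(s)
        w = Z[a][o] * (b @ T[a])
        return w / w.sum()          # assumes observation o has positive probability

    def reachable_beliefs(b0, T, Z, depth):
        # Expand the belief tree breadth-first; returns a superset of the
        # depth-bounded reachable space (duplicates are not merged here).
        frontier, seen = [b0], [b0]
        for _ in range(depth):
            nxt = []
            for b in frontier:
                for a in range(len(T)):
                    pred = b @ T[a]
                    for o in range(Z[a].shape[0]):
                        w = Z[a][o] * pred
                        if w.sum() > 1e-12:     # skip zero-probability observations
                            nxt.append(w / w.sum())
            frontier = nxt
            seen.extend(nxt)
        return seen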

15 One may then speculate that the reason for point-based algorithms’ good performance on Tag is that its reachable space R(b0 ) has much lower dimensionality than B. [sent-29, score-0.388]

16 In this paper, we propose to use the covering number as an alternative measure of the complexity of POMDP planning ( Section 4). [sent-32, score-0.506]

17 Intuitively, the covering number of a space is the minimum number of balls of a given size needed to cover the space fully. [sent-33, score-0.563]

18 We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of R(b0 ). [sent-34, score-0.543]

19 The covering number also reveals that the belief space for Tag behaves more like the union of some 29-dimensional spaces than an 870-dimensional space, as the robot’s position is fully observed. [sent-35, score-0.875]

20 Therefore, Tag is probably not as hard as it was thought to be, and the covering number captures the complexity of the Tag problem better than the dimensionality of the belief space (the number of states) or the dimensionality of the reachable space. [sent-36, score-1.144]

21 We further ask whether it is possible to compute an approximate solution efficiently under the weaker condition of having a small covering number for an optimal reachable space R∗ (b0 ), which contains only points in B reachable from b0 under an optimal policy. [sent-37, score-1.179]

22 Together, the negative and the positive results indicate that using sampling to approximate an optimal reachable space, and not just the reachable space, may be a promising approach in practice. [sent-41, score-0.718]

23 The covering number highlights several properties that reduce the complexity of POMDP planning in practice, and it helps to quantify their effects (Section 5). [sent-44, score-0.586]

24 Highly informative observations usually result in beliefs with sparse support and substantially reduce the covering number. [sent-45, score-0.684]

25 For example, fully observed state variables reduce the covering number by a doubly exponential factor. [sent-46, score-0.528]

26 Interestingly, smooth beliefs, usually a result of imperfect actions and uninformative observations, also reduce the covering number. [sent-47, score-0.558]

27 In addition, state-transition matrices with special structures, such as circulant matrices [1], restrict the space of reachable beliefs and reduce the covering number correspondingly. [sent-48, score-1.232]

28 It has been shown that finding an optimal policy over the entire belief space for a finite-horizon POMDP is PSPACE-complete [10] and that finding an optimal policy over an infinite horizon is undecidable [9]. [sent-50, score-0.749]

29 The approximation errors of some point-based algorithms have been analyzed [11, 14], but these analyses do not address the general question of when an approximately optimal policy can be computed in polynomial time. [sent-54, score-0.333]

30 The reward function R gives the agent a real-valued reward R(s, a) if it takes action a in state s, and the goal of the agent is to maximize its expected total reward by choosing a suitable sequence of actions. [sent-65, score-0.377]

31 A POMDP solution is a policy π that specifies the action π(b) for every belief b. [sent-69, score-0.508]

32 A policy π induces a value function V π that specifies the value V π (b) of every belief b under π. [sent-71, score-0.443]

33 The optimal value function V ∗ satisfies the following Lipschitz condition: Lemma 1 For any two belief points b and b', if ||b − b'||_1 ≤ δ, then |V ∗ (b) − V ∗ (b')| ≤ (Rmax/(1−γ)) δ. [sent-73, score-0.415]
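
One standard way to see Lemma 1, assuming (as holds for discounted POMDPs) that V ∗ is a supremum of linear functions ("alpha-vectors") whose entries are bounded by Rmax/(1−γ); this derivation is our own reconstruction, not quoted from the paper:

    % V* is piecewise-linear and convex: a supremum of alpha-vectors with bounded entries.
    V^*(b) = \sup_{\alpha \in \Gamma} \alpha \cdot b,
        \qquad \|\alpha\|_\infty \le \frac{R_{\max}}{1-\gamma}
    % Taking \alpha_b that (nearly) attains the supremum at b:
    V^*(b) - V^*(b') \le \alpha_b \cdot (b - b')
        \le \|\alpha_b\|_\infty \, \|b - b'\|_1
        \le \frac{R_{\max}}{1-\gamma}\,\delta
    % Exchanging b and b' gives |V^*(b) - V^*(b')| \le \frac{R_{\max}}{1-\gamma}\,\delta.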

34 It provides the basis for approximating the value at a belief point by the values of other belief points nearby. [sent-76, score-0.659]

35 To find an approximately optimal policy, point-based algorithms explore only the reachable belief space R(b0 ) from a given initial belief point b0 . [sent-77, score-1.101]

36 Strictly speaking, these algorithms compute only a policy over R(b0 ), rather than the entire belief space B. (Figure 1 shows the belief tree rooted at b0 .) [sent-78, score-0.363]

38 We can view the exploration of R(b0 ) as searching a belief tree TR rooted at b0 (Figure 1). [sent-80, score-0.363]

39 After obtaining enough belief points from R(b0 ), point-based algorithms perform backup operations over them to compute an approximately optimal value function. [sent-85, score-0.544]

40 4 The Covering Number and the Complexity of POMDP Planning Our first goal is to show that if the covering number of a reachable space R(b0 ) is small, then an approximately optimal policy in R(b0 ) can be computed efficiently. [sent-86, score-0.966]

41 We start with the definition of the covering number: Definition 1 Given a metric space X, a δ-cover of a set B ⊆ X is a set of points C ⊆ X such that for every point b ∈ B, there is a point c ∈ C with ||b − c|| < δ. [sent-87, score-0.419]

42 Intuitively, the covering number is equal to the minimum number of balls of radius δ needed to cover the set B. [sent-90, score-0.475]

43 For any set B, the following relationship holds between packing and covering numbers. [sent-99, score-0.525]
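
The packing–covering connection is easy to exploit algorithmically: scanning points and keeping each one that is at least δ away from every kept point yields a set that is simultaneously a δ-packing and a δ-cover of the scanned points. A minimal sketch (our own illustration, not the paper's code):

    import numpy as np

    def greedy_cover(points, delta):
        # Keep a point only if it is at least delta from every kept center (L1 norm).
        # Kept centers are pairwise >= delta apart (a delta-packing), and every
        # scanned point lies within delta of some center (a delta-cover).
        centers = []
        for p in points:
            if all(np.abs(p - c).sum() >= delta for c in centers):
                centers.append(p)
        return centers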

44 It shows that for any point b0 ∈ B, if the covering number of R(b0 ) grows polynomially with the parameters of interest, then a good approximation of the value at b0 can be computed in polynomial time. [sent-102, score-0.462]

45 It performs a depth-first search on a depth-bounded belief tree and uses approximate memoization to avoid unnecessarily computing the values of very similar beliefs. [sent-107, score-0.415]

46 Intuitively, to achieve a polynomial time algorithm, we bound the height of the tree by exploiting the discount factor and bound the width of the tree by exploiting the covering number. [sent-108, score-0.519]

47 We perform the depth-first search recursively on a belief tree TR that has root b0 and height h, while maintaining a δ-packing of R(b0 ) at every level of TR . [sent-109, score-0.444]

48 Suppose that the search encounters a new belief node b at level i of TR . [sent-110, score-0.426]

49 When the search returns, we perform a backup operation to compute V (b) and add b to the packing at level i. [sent-113, score-0.348]
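
A minimal sketch of this search in Python, with per-level packings serving as an approximate memo table (tabular models as above; the reuse tolerance and bookkeeping details are our simplifications of the Theorem 1 algorithm):

    import numpy as np

    def approx_value(b, level, h, packings, T, Z, R, gamma, delta):
        if level == h:
            return 0.0
        for bp, vp in packings[level]:          # approximate memoization
            if np.abs(b - bp).sum() <= delta:
                return vp
        best = -np.inf
        for a in range(len(T)):
            pred = b @ T[a]                     # P(s' | b, a)
            q = float(b @ R[:, a])              # expected immediate reward
            for o in range(Z[a].shape[0]):
                w = Z[a][o] * pred
                po = w.sum()                    # P(o | b, a)
                if po > 1e-12:
                    q += gamma * po * approx_value(w / po, level + 1, h,
                                                   packings, T, Z, R, gamma, delta)
            best = max(best, q)
        packings[level].append((b, best))       # grow the packing at this level
        return best

    # Usage: packings = [[] for _ in range(h)]; v0 = approx_value(b0, 0, h, packings, T, Z, R, gamma, delta)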

50 For each node b in the packings, the algorithm expands it by calculating the beliefs and the corresponding values for all its children and performing a backup operation at b to compute V (b). [sent-128, score-0.395]

51 It takes O(|S|^2 ) time to calculate the belief at a child node. [sent-129, score-0.361]

52 We assume that |S|, |A|, and |O| are constant to focus on the dependency on the covering number, and the above expression then becomes O(h C(δ/2)^2 ), where C(δ/2) is the δ/2-covering number. [sent-134, score-0.375]

53 Suppose that at each belief point reachable from b0 , we perform such an on-line search for action selection. [sent-138, score-0.718]

54 It first runs the algorithm in Theorem 1 to obtain an initial packing Pi for each level i and estimate the values of belief points in Pi . [sent-146, score-0.572]

55 The process repeats until no more new belief points are discovered within a set of simulations. [sent-149, score-0.352]

56 We show that if the set of simulations is sufficiently large, then the probability that in any future run of the policy, we encounter new belief points not covered by the final set of packings can be made arbitrarily small. [sent-150, score-0.435]

57 Both theorems above assume that R(b0 ) has a small covering number to guarantee efficient computation. [sent-155, score-0.413]

58 To relax this assumption, we may require only that the covering number for an optimal reachable space R∗ (b0 ) is small, as R∗ (b0 ) contains only points reachable under an optimal policy and can be much smaller than R(b0 ). [sent-156, score-1.328]

59 The main idea is to show that a Hamiltonian cycle exists in a given graph if and only if an approximation to V ∗ (b0 ), with a suitably chosen error, can be computed for a POMDP whose optimal reachable space R∗ (b0 ) has a small covering number. [sent-159, score-0.84]

60 Theorem 3 Given constant ε > 0, computing an approximation V (b0 ) of V ∗ (b0 ), with error |V (b0 ) − V ∗ (b0 )| ≤ ε |V ∗ (b0 )|, is NP-hard, even if the covering number of R∗ (b0 ) is polynomial-sized. [sent-161, score-0.404]

61 On the other hand, if an oracle provides us a proper cover of an optimal reachable space R∗ (b0 ), then a good approximation of V ∗ (b0 ) can be found efficiently. [sent-165, score-0.531]

62 5 Bounding the Covering Number The covering number highlights several properties that reduce the complexity of POMDP planning in practice. [sent-171, score-0.586]

63 We describe them below and show how they affect the covering number. [sent-172, score-0.375]

64 If d' of these variables are fully observed, then for every such belief point, its vector representation contains at most m = k^(d−d') non-zero elements out of k^d elements in total. [sent-175, score-0.36]

65 For a given initial belief b0 , the belief vectors with the same non-zero pattern form a subspace in R(b0 ), and R(b0 ) is a union of these subspaces. [sent-176, score-0.717]

66 We can compute a δ-cover for each subspace by discretizing each non-zero element of the belief vectors to an accuracy of δ/m, and the size of the resulting δ-cover is at most (m/δ)^m . [sent-177, score-0.335]

67 So the δ-covering number of R(b0 ) is at most k^d' (m/δ)^m = k^d' (k^(d−d')/δ)^(k^(d−d')). The fully observed variables thus give a doubly exponential reduction in the covering number: they reduce the exponent by a factor of k^d' at the cost of a multiplicative factor of k^d' . [sent-180, score-0.454]
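
To see the scale of this reduction, the bound can be evaluated in log form; a short sketch (our illustration; the Tag-style numbers below assume two state variables with k = 29 values each, one of them fully observed):

    import math

    def log10_cover_bound(k, d, d_obs, delta):
        # log10 of k^d_obs * (k^(d - d_obs) / delta)^(k^(d - d_obs))
        m = k ** (d - d_obs)        # non-zero entries per belief vector
        return d_obs * math.log10(k) + m * math.log10(m / delta)

    # log10_cover_bound(29, 2, 1, 0.1)  ->  about 73   (robot position observed)
    # log10_cover_bound(29, 2, 0, 0.1)  ->  about 3300 (nothing observed)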

68 If d' state variables are fully observed, then for any belief point b0 , the δ-covering number of the reachable belief space R(b0 ) is at most k^d' (k^(d−d')/δ)^(k^(d−d')). [sent-182, score-1.056]

69 The robot and the target can occupy any position in an environment modeled as a grid of 29 cells. [sent-185, score-0.405]

70 So, there are 29 × 29 + 29 = 870 states in total, and the belief space B is 870-dimensional. [sent-187, score-0.375]

71 Indeed, for Tag, any reachable belief space R(b0 ) is effectively a union of two sets. [sent-190, score-0.695]

72 Clearly, the covering number captures the underlying complexity of R(b0 ) more accurately than the dimensionality of R(b0 ). [sent-193, score-0.449]

73 For example, in the Tag problem, the state is known exactly if the robot and the target are in the same position, leaving only a single non-zero element in the belief vector. [sent-198, score-0.632]

74 If the beliefs are always sparse, we can exploit the sparsity to bound the covering number. [sent-200, score-0.558]

75 Otherwise, sparsity may still give a hint that the covering number is smaller than what would be suggested by the dimensionality of the belief space. [sent-201, score-0.725]

76 By exploiting the non-zero patterns of belief vectors in a way similar to that in Section 5.1, we obtain the following bound. [sent-202, score-0.335]

77 If every belief in B can be represented as a vector with at most m non-zero elements, then the δ-covering number of B is O(n^m (m/δ)^m ), where n is the dimension of the belief vectors. [sent-204, score-0.333]

78 When beliefs are smooth, e.g., when their Fourier representations are sparse, the covering number is also small. [sent-209, score-0.375]

79 Assume that every belief b ∈ B can be represented as a linear combination of m basis vectors such that the magnitudes of both the elements of the basis vectors and the coefficients representing b are bounded by a constant C. [sent-212, score-0.419]

80 The δ-covering number of B is O((2C^2 mn/δ)^m ) when the basis vectors are real-valued, and O((4C^2 mn/δ)^(2m) ) when they are complex-valued. [sent-213, score-0.403]

81 A mobile robot scout needs to navigate from a known start position to a goal position in a large environment modeled as a grid. [sent-218, score-0.469]

82 The robot can take four actions to move in the {N, S, E, W } directions, but has imperfect control. [sent-220, score-0.332]

83 At each grid cell, the robot moves to the intended cell with probability 1 − p and moves diagonally to the two cells adjacent to the intended one with probability 0.5p each. [sent-222, score-0.387]

84 Under our assumptions, the state-transition functions representing robot actions are invariant over the grid cells and can thus be represented by circulant matrices [1]. [sent-225, score-0.605]

85 In the context of POMDPs, if applying a state-transition matrix to a belief b corresponds to convolution with a suitable distribution, then the state-transition matrix is circulant. [sent-227, score-0.331]

86 In our example, this means that given a set of robot moves, we can apply them in any order and the resulting belief on the robot’s position is the same. [sent-230, score-0.594]
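
A small NumPy check of this order-independence on a 1-D ring of cells (the kernels below are made-up move distributions, not from the paper):

    import numpy as np

    def circulant(kernel):
        # Row i is the cyclic shift of the kernel: P(j | i) = kernel[(j - i) mod n].
        return np.array([np.roll(kernel, i) for i in range(len(kernel))])

    east  = circulant(np.array([0.1, 0.8, 0.0, 0, 0, 0, 0, 0.1]))   # mostly one step right
    north = circulant(np.array([0.2, 0.0, 0.7, 0, 0, 0, 0, 0.1]))   # mostly two steps right

    b = np.zeros(8); b[0] = 1.0                                     # robot starts at cell 0
    assert np.allclose(b @ east @ north, b @ north @ east)          # moves commute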

87 This greatly reduces the number of possible beliefs and correspondingly the covering number in open-loop POMDPs, where there are no observations involved. [sent-231, score-0.587]

88 Proposition 4 Suppose that all state-transition matrices representing actions are circulant and that each matrix has at most m eigenvalues whose magnitudes are greater than ζ, with 0 < ζ < 1. [sent-232, score-0.348]

89 Then the δ-covering number of the reachable belief space R(b0 ) is O((8mn/δ)^(2m) ). In our example, suppose that the robot scout makes a sequence of moves and needs to decide when to take occasional observations along the way to localize itself. [sent-234, score-1.093]

90 To bound the covering number, we divide the sequence of moves into subsequences such that each subsequence starts with an observation and ends right before the next observation. [sent-235, score-0.54]

91 In each subsequence, the robot starts at a specific belief and moves without additional observations. [sent-236, score-0.604]

92 Furthermore, since all the observations are highly informative, we assume that the initial beliefs of all subsequences can be represented as vectors with at most m non-zero elements. [sent-238, score-0.364]

93 In mobile robot navigation, obstacles or boundaries in the environment often cause difficulties. [sent-246, score-0.336]

94 6 Conclusion We propose the covering number as a measure of the complexity of POMDP planning. [sent-248, score-0.406]

95 We believe that for point-based algorithms, the covering number captures the difficulty of computing approximate solutions to POMDPs better than other commonly used measures, such as the number of states. [sent-249, score-0.406]

96 The covering number highlights several interesting properties that reduce the complexity of POMDP planning, and quantifies their effects. [sent-250, score-0.486]

97 Using the covering number, we have shown several results that help to identify the main difficulty of POMDP planning using point-based algorithms. [sent-251, score-0.475]

98 These results indicate that a promising approach in practice is to approximate an optimal reachable space through sampling. [sent-252, score-0.461]

99 Accelerating point-based POMDP algorithms through successive approximations of the optimal reachable space. [sent-283, score-0.364]

100 On the undecidability of probabilistic planning and infinite-horizon partially observable Markov decision problems. [sent-319, score-0.316]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pomdp', 0.434), ('covering', 0.375), ('belief', 0.307), ('reachable', 0.301), ('robot', 0.234), ('rmax', 0.225), ('circulant', 0.189), ('beliefs', 0.183), ('tag', 0.151), ('packing', 0.15), ('policy', 0.136), ('pomdps', 0.127), ('observable', 0.107), ('planning', 0.1), ('backup', 0.082), ('actions', 0.074), ('cover', 0.07), ('agent', 0.068), ('partially', 0.066), ('subsequences', 0.066), ('action', 0.065), ('moves', 0.063), ('optimal', 0.063), ('children', 0.061), ('tr', 0.058), ('polynomial', 0.058), ('hamiltonian', 0.057), ('packings', 0.057), ('scout', 0.057), ('matrices', 0.055), ('child', 0.054), ('fully', 0.053), ('position', 0.053), ('proposition', 0.053), ('highlights', 0.05), ('target', 0.047), ('approximately', 0.047), ('singapore', 0.045), ('search', 0.045), ('points', 0.045), ('sparse', 0.044), ('space', 0.044), ('environment', 0.044), ('state', 0.044), ('decision', 0.043), ('union', 0.043), ('dimensionality', 0.043), ('pi', 0.04), ('tagged', 0.04), ('theorems', 0.038), ('level', 0.038), ('pbvi', 0.038), ('piecewiselinear', 0.038), ('markov', 0.037), ('reward', 0.036), ('subsequence', 0.036), ('node', 0.036), ('mn', 0.033), ('operation', 0.033), ('nm', 0.032), ('initial', 0.032), ('tree', 0.032), ('approximate', 0.031), ('complexity', 0.031), ('smooth', 0.031), ('reduce', 0.03), ('magnitudes', 0.03), ('kaelbling', 0.03), ('recurrence', 0.03), ('obstacles', 0.03), ('balls', 0.03), ('hc', 0.03), ('approximation', 0.029), ('observations', 0.029), ('lemma', 0.028), ('cycle', 0.028), ('mobile', 0.028), ('vectors', 0.028), ('culty', 0.028), ('grid', 0.027), ('hundreds', 0.027), ('hsu', 0.026), ('doubly', 0.026), ('encounter', 0.026), ('represented', 0.026), ('suppose', 0.025), ('rooted', 0.024), ('imperfect', 0.024), ('uninformative', 0.024), ('proper', 0.024), ('states', 0.024), ('suitable', 0.024), ('theorem', 0.024), ('hardness', 0.023), ('intelligence', 0.023), ('informative', 0.023), ('height', 0.022), ('promising', 0.022), ('seconds', 0.022), ('intuitively', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 215 nips-2007-What makes some POMDP problems easy to approximate?

Author: Wee S. Lee, Nan Rong, Daniel J. Hsu

Abstract: Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms’ success often observed in the experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. However, given a suitable set of points that “cover” an optimal reachable space well, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice, e.g., fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices. 1

2 0.38002518 30 nips-2007-Bayes-Adaptive POMDPs

Author: Stephane Ross, Brahim Chaib-draa, Joelle Pineau

Abstract: Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). Our goal is to extend these ideas to the more general Partially Observable MDP (POMDP) framework, where the state is a hidden variable. To address this problem, we introduce a new mathematical model, the Bayes-Adaptive POMDP. This new model allows us to (1) improve knowledge of the POMDP domain through interaction with the environment, and (2) plan optimal sequences of actions which can tradeoff between improving the model, identifying the state, and gathering reward. We show how the model can be finitely approximated while preserving the value function. We describe approximations for belief tracking and planning in this model. Empirical results on two domains show that the model estimate and agent’s return improve over time, as the agent learns better model estimates. 1

3 0.31984788 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs

Author: Stephane Ross, Joelle Pineau, Brahim Chaib-draa

Abstract: Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error. We provide a general theorem showing that these search heuristics are admissible, and lead to complete and ǫ-optimal algorithms. This is, to the best of our knowledge, the strongest theoretical result available for online POMDP solution methods. We also provide empirical evidence showing that our approach is also practical, and can find (provably) near-optimal solutions in reasonable time. 1

4 0.13558713 162 nips-2007-Random Sampling of States in Dynamic Programming

Author: Chris Atkeson, Benjamin Stephens

Abstract: We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function. Our focus is on finding steady state policies for deterministic time invariant discrete time control problems with continuous states and actions often found in robotics. In this paper we show that we can now solve problems we couldn’t solve previously. 1

5 0.13077877 86 nips-2007-Exponential Family Predictive Representations of State

Author: David Wingate, Satinder S. Baveja

Abstract: In order to represent state in controlled, partially observable, stochastic dynamical systems, some sort of sufficient statistic for history is necessary. Predictive representations of state (PSRs) capture state as statistics of the future. We introduce a new model of such systems called the “Exponential family PSR,” which defines as state the time-varying parameters of an exponential family distribution which models n sequential observations in the future. This choice of state representation explicitly connects PSRs to state-of-the-art probabilistic modeling, which allows us to take advantage of current efforts in high-dimensional density estimation, and in particular, graphical models and maximum entropy models. We present a parameter learning algorithm based on maximum likelihood, and we show how a variety of current approximate inference methods apply. We evaluate the quality of our model with reinforcement learning by directly evaluating the control performance of the model. 1

6 0.12742266 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

7 0.11634382 91 nips-2007-Fitted Q-iteration in continuous action-space MDPs

8 0.11462118 98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

9 0.10624776 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

10 0.09810292 5 nips-2007-A Game-Theoretic Approach to Apprenticeship Learning

11 0.097888403 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

12 0.097143218 102 nips-2007-Incremental Natural Actor-Critic Algorithms

13 0.081138797 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

14 0.080769792 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

15 0.066650175 52 nips-2007-Competition Adds Complexity

16 0.062808312 185 nips-2007-Stable Dual Dynamic Programming

17 0.061346509 124 nips-2007-Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning

18 0.058634181 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

19 0.052209731 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling

20 0.051423654 203 nips-2007-The rat as particle filter


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.204), (1, -0.269), (2, 0.029), (3, -0.09), (4, -0.181), (5, -0.004), (6, -0.008), (7, 0.036), (8, 0.162), (9, -0.018), (10, -0.031), (11, 0.098), (12, 0.028), (13, -0.134), (14, -0.063), (15, -0.204), (16, 0.181), (17, -0.183), (18, 0.169), (19, -0.083), (20, -0.031), (21, -0.231), (22, 0.004), (23, 0.155), (24, -0.218), (25, 0.13), (26, 0.028), (27, 0.09), (28, -0.048), (29, -0.065), (30, 0.003), (31, 0.072), (32, 0.077), (33, -0.034), (34, 0.078), (35, -0.0), (36, 0.049), (37, -0.019), (38, 0.064), (39, -0.033), (40, 0.076), (41, -0.009), (42, -0.029), (43, 0.01), (44, 0.0), (45, -0.041), (46, -0.049), (47, 0.003), (48, -0.015), (49, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97202122 215 nips-2007-What makes some POMDP problems easy to approximate?

Author: Wee S. Lee, Nan Rong, Daniel J. Hsu

Abstract: Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms’ success often observed in the experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. However, given a suitable set of points that “cover” an optimal reachable space well, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice, e.g., fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices. 1

2 0.92540926 30 nips-2007-Bayes-Adaptive POMDPs

Author: Stephane Ross, Brahim Chaib-draa, Joelle Pineau

Abstract: Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforcement learning. However most investigations of Bayesian reinforcement learning to date focus on the standard Markov Decision Processes (MDPs). Our goal is to extend these ideas to the more general Partially Observable MDP (POMDP) framework, where the state is a hidden variable. To address this problem, we introduce a new mathematical model, the Bayes-Adaptive POMDP. This new model allows us to (1) improve knowledge of the POMDP domain through interaction with the environment, and (2) plan optimal sequences of actions which can tradeoff between improving the model, identifying the state, and gathering reward. We show how the model can be finitely approximated while preserving the value function. We describe approximations for belief tracking and planning in this model. Empirical results on two domains show that the model estimate and agent’s return improve over time, as the agent learns better model estimates. 1

3 0.86268002 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs

Author: Stephane Ross, Joelle Pineau, Brahim Chaib-draa

Abstract: Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error. We provide a general theorem showing that these search heuristics are admissible, and lead to complete and ǫ-optimal algorithms. This is, to the best of our knowledge, the strongest theoretical result available for online POMDP solution methods. We also provide empirical evidence showing that our approach is also practical, and can find (provably) near-optimal solutions in reasonable time. 1

4 0.53300774 52 nips-2007-Competition Adds Complexity

Author: Judy Goldsmith, Martin Mundhenk

Abstract: It is known that determinining whether a DEC-POMDP, namely, a cooperative partially observable stochastic game (POSG), has a cooperative strategy with positive expected reward is complete for NEXP. It was not known until now how cooperation affected that complexity. We show that, for competitive POSGs, the complexity of determining whether one team has a positive-expected-reward strategy is complete for NEXPNP .

5 0.45236221 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process

Author: Charlie Frogner, Avi Pfeffer

Abstract: Dynamic Bayesian networks are structured representations of stochastic processes. Despite their structure, exact inference in DBNs is generally intractable. One approach to approximate inference involves grouping the variables in the process into smaller factors and keeping independent beliefs over these factors. In this paper we present several techniques for decomposing a dynamic Bayesian network automatically to enable factored inference. We examine a number of features of a DBN that capture different types of dependencies that will cause error in factored inference. An empirical comparison shows that the most useful of these is a heuristic that estimates the mutual information introduced between factors by one step of belief propagation. In addition to features computed over entire factors, for efficiency we explored scores computed over pairs of variables. We present search methods that use these features, pairwise and not, to find a factorization, and we compare their results on several datasets. Automatic factorization extends the applicability of factored inference to large, complex models that are undesirable to factor by hand. Moreover, tests on real DBNs show that automatic factorization can achieve significantly lower error in some cases. 1

6 0.4201425 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

7 0.41174805 162 nips-2007-Random Sampling of States in Dynamic Programming

8 0.38967094 86 nips-2007-Exponential Family Predictive Representations of State

9 0.36023173 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning

10 0.34518394 171 nips-2007-Scan Strategies for Meteorological Radars

11 0.3397536 91 nips-2007-Fitted Q-iteration in continuous action-space MDPs

12 0.32972059 98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion

13 0.29164752 185 nips-2007-Stable Dual Dynamic Programming

14 0.26861769 5 nips-2007-A Game-Theoretic Approach to Apprenticeship Learning

15 0.25681058 151 nips-2007-Optimistic Linear Programming gives Logarithmic Regret for Irreducible MDPs

16 0.24829787 174 nips-2007-Selecting Observations against Adversarial Objectives

17 0.2454275 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

18 0.24524637 124 nips-2007-Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning

19 0.24309528 77 nips-2007-Efficient Inference for Distributions on Permutations

20 0.23694962 187 nips-2007-Structured Learning with Approximate Inference


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.069), (9, 0.072), (13, 0.033), (16, 0.017), (21, 0.047), (31, 0.033), (34, 0.027), (35, 0.051), (47, 0.117), (83, 0.124), (85, 0.036), (88, 0.24), (90, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79002589 166 nips-2007-Regularized Boost for Semi-Supervised Learning

Author: Ke Chen, Shihai Wang

Abstract: Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smoothness regularizer to semi-supervised boosting algorithms based on the universal optimization framework of margin cost functionals. Our regularizer is applicable to existing semi-supervised boosting algorithms to improve their generalization and speed up their training. Comparative results on synthetic, benchmark and real world tasks demonstrate the effectiveness of our local smoothness regularizer. We discuss relevant issues and relate our regularizer to previous work. 1

2 0.7887032 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking

Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola

Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1

same-paper 3 0.76769465 215 nips-2007-What makes some POMDP problems easy to approximate?

Author: Wee S. Lee, Nan Rong, Daniel J. Hsu

Abstract: Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms’ success often observed in the experiments. We show that an approximately optimal POMDP solution can be computed in time polynomial in the covering number of a reachable belief space, which is the subset of the belief space reachable from a given belief point. We also show that under the weaker condition of having a small covering number for an optimal reachable space, which is the subset of the belief space reachable under an optimal policy, computing an approximately optimal solution is NP-hard. However, given a suitable set of points that “cover” an optimal reachable space well, an approximate solution can be computed in polynomial time. The covering number highlights several interesting properties that reduce the complexity of POMDP planning in practice, e.g., fully observed state variables, beliefs with sparse support, smooth beliefs, and circulant state-transition matrices. 1

4 0.66095281 96 nips-2007-Heterogeneous Component Analysis

Author: Shigeyuki Oba, Motoaki Kawanabe, Klaus-Robert Müller, Shin Ishii

Abstract: In bioinformatics it is often desirable to combine data from various measurement sources and thus structured feature vectors are to be analyzed that possess different intrinsic blocking characteristics (e.g., different patterns of missing values, observation noise levels, effective intrinsic dimensionalities). We propose a new machine learning tool, heterogeneous component analysis (HCA), for feature extraction in order to better understand the factors that underlie such complex structured heterogeneous data. HCA is a linear block-wise sparse Bayesian PCA based not only on a probabilistic model with block-wise residual variance terms but also on a Bayesian treatment of a block-wise sparse factor-loading matrix. We study various algorithms that implement our HCA concept extracting sparse heterogeneous structure by obtaining common components for the blocks and specific components within each block. Simulations on toy and bioinformatics data underline the usefulness of the proposed structured matrix factorization concept. 1

5 0.6541366 158 nips-2007-Probabilistic Matrix Factorization

Author: Andriy Mnih, Ruslan Salakhutdinov

Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, that is nearly 7% better than the score of Netflix’s own system.

6 0.62797934 30 nips-2007-Bayes-Adaptive POMDPs

7 0.61697358 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

8 0.6147272 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images

9 0.61256951 63 nips-2007-Convex Relaxations of Latent Variable Training

10 0.60599029 115 nips-2007-Learning the 2-D Topology of Images

11 0.60355967 156 nips-2007-Predictive Matrix-Variate t Models

12 0.60253096 86 nips-2007-Exponential Family Predictive Representations of State

13 0.60238892 168 nips-2007-Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods

14 0.60237026 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs

15 0.60212588 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

16 0.60059679 134 nips-2007-Multi-Task Learning via Conic Programming

17 0.59676141 18 nips-2007-A probabilistic model for generating realistic lip movements from speech

18 0.59552914 34 nips-2007-Bayesian Policy Learning with Trans-Dimensional MCMC

19 0.59512818 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

20 0.59432936 148 nips-2007-Online Linear Regression and Its Application to Model-Based Reinforcement Learning