
331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs


Source: pdf

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially observable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially observable Markov decision processes (POMDPs) when a set of initial belief states is known. [sent-13, score-0.276]

2 However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. [sent-14, score-0.482]

3 Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. [sent-15, score-0.581]

4 In recent years, point-based value iteration methods (PBVI) [5, 10, 11, 7] have proved extremely successful at scaling (approximately) optimal POMDP solutions to large state spaces when a set of initial belief states is known. [sent-18, score-0.299]

5 While PBVI has been extended to both continuous state and continuous observation spaces, no prior work has tackled both jointly without sampling. [sent-19, score-0.507]

6 While restricted to discrete states, [2] provides an important insight that we exploit in this work: only a finite number of partitionings of the observation space are required to distinguish between the optimal conditional policies over a finite set of belief states. [sent-21, score-0.451]

7 We propose two major contributions: First, we extend symbolic dynamic programming for continuous state MDPs [9] to POMDPs with discrete observations, arbitrary continuous reward, and transitions with discrete noise (i.e., finite mixtures of deterministic transitions) … [sent-22, score-0.917]

8 2 Hybrid POMDP Model: A hybrid (discrete and continuous) partially observable MDP (H-POMDP) is a tuple ⟨S, A, O, T, R, Z, γ, h⟩. [sent-26, score-0.074]

9 States S are given by the vector (ds, xs) = (ds1, …, dsn, xs1, …, xsm) where each dsi ∈ {0, 1} (1 ≤ i ≤ n) is boolean and each xsj ∈ R (1 ≤ j ≤ m) is continuous. [sent-27, score-0.523] [sent-33, score-0.178]

11 We assume a finite, discrete action space A = {a1, …}. [sent-34, score-0.106]

12 Observations O are given by the vector (do, xo) = (do1, …, dop, xo1, …, xoq) where each doi ∈ {0, 1} (1 ≤ i ≤ p) is boolean and each xoj ∈ R (1 ≤ j ≤ q) is continuous. [sent-38, score-0.175] [sent-44, score-0.104]

14 A discount factor γ, 0 ≤ γ ≤ 1, is used to discount rewards t time steps into the future by γ^t. [sent-46, score-0.022]

15 Observation probabilities over continuous variables p(xoj|xs, ds, a) only occur in the case of continuous observations and are required to be piecewise constant (a mixture of uniform distributions); the same restriction holds for belief state representations. [sent-51, score-0.713]

16 The reward R(d, x, a) may be an arbitrary (nonlinear) piecewise function in the case of discrete observations and a piecewise constant function in the case of continuous observations. [sent-52, score-0.441]
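
To fix notation, here is a minimal sketch of such an H-POMDP as a plain container. All field names are our own illustrative choices; none of this reproduces the authors' XADD-based implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

# Minimal, illustrative container for the H-POMDP tuple
# <S, A, O, T, R, Z, gamma, h>. Names are hypothetical.
@dataclass
class HybridPOMDP:
    n_bool_state: int        # boolean state variables d_s1..d_sn
    n_cont_state: int        # continuous state variables x_s1..x_sm
    actions: List[str]       # finite, discrete action space A
    n_bool_obs: int          # boolean observation variables d_o1..d_op
    n_cont_obs: int          # continuous observation variables x_o1..x_oq
    transition: Callable     # T: p(s' | s, a), deterministic with discrete noise
    observation: Callable    # Z: p(o | s', a), piecewise constant if continuous
    reward: Callable         # R(d, x, a), piecewise function
    gamma: float = 0.95      # discount factor, 0 <= gamma <= 1
    horizon: int = 6         # decision horizon h
```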

17 Example (Power Plant) [1] The steam generation system of a power plant evaporates feed-water under restricted pressure and temperature conditions to turn a steam turbine. [sent-54, score-0.587]

18 A reward is obtained when electricity is generated from the turbine and the steam pressure and temperature are within safe ranges. [sent-55, score-0.434]

19 Mixing water and steam makes the respective pressure and temperature observations po ∈ R and to ∈ R of the underlying states ps ∈ R and ts ∈ R highly uncertain. [sent-56, score-0.989]

20 Actions A = {open, close} control temperature and pressure by means of a pressure valve. [sent-57, score-0.374]

21 We initially present two H-POMDP variants labeled 1D-Power Plant using a single temperature state variable ts . [sent-58, score-0.388]

22 Figure 1: (left) Example conditional plan β^h for discrete observations; (right) example α-function for β^h over state b ∈ {0, 1}, x ∈ R in decision diagram form: the true (1) branch is solid, the false (0) branch is dashed. [sent-62, score-0.198]

23 … p(o = high|ts, a = open) = p(o = high|ts, a = close) = { ts ≤ 15 : 0.…, … } (4) [sent-65, score-0.22]

24 Cont. 1D-Power Plant variant where we define an observation space with a single continuous variable to, uniformly distributed on an interval of 10 units centered at ts. [sent-69, score-0.49]

25 p(to|ts, a = open) = U(to; ts − 5, ts + 5) = { (to > ts − 5) ∧ (to < ts + 5) : 0.1 ; (to ≤ ts − 5) ∨ (to ≥ ts + 5) : 0 } (5) [sent-70, score-0.88]

26 While simple, we note no prior method could perform exact point-based backups for either problem. [sent-71, score-0.554]
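
As a sanity check on Eq (5), the following one-function sketch evaluates this piecewise-constant density; the function name and width parameter are ours, for illustration only.

```python
def p_obs_given_state(t_o: float, t_s: float, width: float = 10.0) -> float:
    """Uniform observation density U(t_o; t_s - width/2, t_s + width/2),
    i.e., Eq (5): density 1/width inside the interval, 0 outside."""
    half = width / 2.0
    return 1.0 / width if (t_s - half) < t_o < (t_s + half) else 0.0

# e.g., p_obs_given_state(12.0, t_s=10.0) returns 0.1
```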

27 3 Value Iteration for Hybrid POMDPs: In an H-POMDP, the agent does not directly observe the states and thus must maintain a belief state b(xs, ds) = p(xs, ds). [sent-72, score-0.276]

28 For a given belief state b = b(xs ,ds ), a POMDP policy π can be represented by a tree corresponding to a conditional plan β. [sent-73, score-0.312]

29 An h-step conditional plan β^h can be defined recursively in terms of (h − 1)-step conditional plans as shown in Fig. 1 (left). [sent-74, score-0.024]

30 Our goal is to find a policy π that maximizes the value function, defined as the sum of expected discounted rewards over horizon h starting from initial belief state b: Vπ^h(b) = Eπ[ Σ_{t=0}^{h} γ^t · rt | b0 = b ] (6) where rt is the reward obtained at time t and b0 is the belief state at t = 0. [sent-76, score-0.712]

31 For finite h and belief state b, the optimal policy π is given by an h-step conditional plan β h . [sent-77, score-0.312]

32 The Γ^h in this optimal h-stage-to-go value function can be computed via Monahan's dynamic programming approach to value iteration (VI) [4]. [sent-80, score-0.063]

33 The idea is straightforward and the main modification needed to Monahan’s VI approach in Algorithm 1 is the loop from lines 23–25 where only α-functions optimal at some belief state are retained for subsequent iterations. [sent-86, score-0.27]
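
A sketch of that retention loop follows, assuming discretized α-vectors and beliefs as numpy arrays; the paper keeps both symbolic, so this is purely to fix ideas.

```python
import numpy as np

def retain_optimal_alphas(alphas, beliefs):
    """Keep only alpha-functions optimal at some belief point
    (the retention loop of lines 23-25 in Algorithm 1).
    alphas, beliefs: lists of numpy vectors over a discretized state space."""
    kept = set()
    for b in beliefs:
        values = [float(np.dot(b, a)) for a in alphas]  # value of each alpha at b
        kept.add(int(np.argmax(values)))                # retain the maximizer
    return [alphas[i] for i in sorted(kept)]
```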

34 In the case of continuous observation variables (q > 0), we will need to derive a relevant set of observations on line 10, a key contribution of this work as described in Section 4. [sent-87, score-0.414]

35 Otherwise, if the observations are only discrete (q = 0), then a finite set of observations is already known, along with the observation function as given in Eq (2). [sent-89, score-0.425]

36 If used for PBVI, an efficient direct backup computation of the optimal α-function for belief state bi should be used in line 17; it is linear in the number of observations [5, 11] and obviates the need for lines 23–25. [sent-91, score-0.547]
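
A sketch of such a direct backup under the same discretized-vector assumption; the tensors T, Z, R and their layout below are our own illustrative conventions, not the paper's symbolic representation.

```python
import numpy as np

def point_based_backup(b, actions, observations, T, Z, R, alphas, gamma):
    """Direct point-based backup of the optimal alpha-function at belief b
    (cf. line 17 of Algorithm 1), linear in the number of observations.
    Conventions assumed here: T[a][s, s'] = p(s'|s,a), Z[a][o][s'] = p(o|s',a),
    R[a][s] = reward; b, alphas are numpy vectors over discretized states."""
    best_alpha, best_val = None, -np.inf
    for a in actions:
        alpha_a = R[a].astype(float).copy()
        for o in observations:
            # back-project each alpha through (a, o), keep the best one at b
            candidates = [gamma * (T[a] @ (Z[a][o] * ap)) for ap in alphas]
            alpha_a += max(candidates, key=lambda g: float(b @ g))
        if float(b @ alpha_a) > best_val:
            best_alpha, best_val = alpha_a, float(b @ alpha_a)
    return best_alpha
```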

37 Nonetheless, PBVI has been quite successful in practice without exhaustive enumeration of all reachable beliefs [5, 10, 11, 7], which motivates our use of PBVI in this work. [sent-94, score-0.081]

38 4 Symbolic Dynamic Programming: In this section we take a symbolic dynamic programming (SDP) approach to implementing VI and PBVI as defined in the last section. [sent-95, score-0.291]

39 To do this, we need only show that all required operations can be computed efficiently and in closed-form, which we do next, building on SDP for MDPs [9]. [sent-96, score-0.034]

40 The φi are disjoint logical formulae defined over xs, ds and/or xo, do with logical (∧, ∨, ¬) combinations of boolean variables and inequalities (≥, >, ≤, <) over continuous variables. [sent-105, score-0.96]

41 For discrete observation H-POMDPs, the fi and inequalities may use any function (e.g., sin(x1) > log(x2) · x3); for continuous observations, they are restricted to linear inequalities and linear or piecewise constant fi as described in Section 2. [sent-106, score-0.237] [sent-108, score-0.231]

43 Unary operations such as scalar multiplication c · f (for some constant c ∈ R) or negation −f on case statements simply apply the operation to each case partition fi (1 ≤ i ≤ k). [sent-109, score-0.137]

44 A binary operation on two case statements takes the cross-product of the logical partitions of each case statement and performs the corresponding operation on the resulting paired partitions. [sent-110, score-0.169]

45 The cross-sum ⊕ of two cases is defined as follows: { φ1 : f1 ; φ2 : f2 } ⊕ { ψ1 : g1 ; ψ2 : g2 } = { φ1 ∧ ψ1 : f1 + g1 ; φ1 ∧ ψ2 : f1 + g2 ; φ2 ∧ ψ1 : f2 + g1 ; φ2 ∧ ψ2 : f2 + g2 }. Likewise ⊖ and ⊗ are defined by subtracting or multiplying partition values. [sent-111, score-0.027]

46 Inconsistent partitions can be discarded when they are irrelevant to the function value. [sent-112, score-0.091]
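
The case representation and its binary operations can be sketched concretely as follows, with conditions rendered as Python predicates rather than symbolic formulae; this representation is an assumption made purely for illustration.

```python
import operator
from itertools import product

# A case statement as a list of (condition, value) pairs, where conditions
# are predicates over a state x and values are functions of x.
def binary_case_op(case_f, case_g, op):
    """Cross-product of partitions: for each pair (phi_i, f_i), (psi_j, g_j),
    emit (phi_i AND psi_j, op(f_i, g_j)). Inconsistent conjunctions would be
    pruned symbolically; here they simply never fire at runtime."""
    return [
        (lambda x, p=phi, q=psi: p(x) and q(x),
         lambda x, f=f, g=g: op(f(x), g(x)))
        for (phi, f), (psi, g) in product(case_f, case_g)
    ]

cross_sum  = lambda cf, cg: binary_case_op(cf, cg, operator.add)  # cross-sum
cross_sub  = lambda cf, cg: binary_case_op(cf, cg, operator.sub)  # cross-subtract
cross_prod = lambda cf, cg: binary_case_op(cf, cg, operator.mul)  # cross-product
```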

47 A symbolic case maximization is defined as below: casemax( { φ1 : f1 ; φ2 : f2 } , { ψ1 : g1 ; ψ2 : g2 } ) = { φ1 ∧ ψ1 ∧ f1 > g1 : f1 ; φ1 ∧ ψ1 ∧ f1 ≤ g1 : g1 ; φ1 ∧ ψ2 ∧ f1 > g2 : f1 ; φ1 ∧ ψ2 ∧ f1 ≤ g2 : g2 ; … }. [sent-113, score-0.315]
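
The casemax operator admits the same toy rendering; again, the predicate-based representation is ours, not the paper's symbolic one.

```python
from itertools import product

def casemax(case_f, case_g):
    """Case maximization, rendered concretely: each paired partition
    (phi_i AND psi_j) splits into the region where f_i > g_j (value f_i)
    and the region where f_i <= g_j (value g_j)."""
    result = []
    for (phi, f), (psi, g) in product(case_f, case_g):
        result.append((lambda x, p=phi, q=psi, f=f, g=g:
                       p(x) and q(x) and f(x) > g(x), f))
        result.append((lambda x, p=phi, q=psi, f=f, g=g:
                       p(x) and q(x) and f(x) <= g(x), g))
    return result
```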

48 When the integrand contains a Dirac δ (e.g., the transition in Eq (3)), then the integral is equivalent to a symbolic substitution and can be applied to any case statement (cf. …). [sent-121, score-0.253]

49 Case operations yield a combinatorial explosion in size if naïvely implemented, hence we use the extended algebraic decision diagram (XADD) data structure [9], as shown in Figure 1 (right), to compactly represent case statements and efficiently support the above case operations with them. [sent-124, score-0.144]
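
A drastically simplified sketch of an XADD-like node follows; real XADDs add canonical decision ordering, node reduction, and symbolic operations [9], so everything below is illustrative only.

```python
# Simplified XADD-like node: internal nodes test a decision,
# leaves hold expressions; a cache reuses previously built nodes
# so shared substructure is stored once (simplified hash-consing).
class XADDNode:
    _cache = {}

    def __new__(cls, decision, high=None, low=None, leaf=None):
        key = (decision, id(high), id(low), leaf)
        if key not in cls._cache:
            node = super().__new__(cls)
            node.decision, node.high, node.low, node.leaf = decision, high, low, leaf
            cls._cache[key] = node
        return cls._cache[key]

    def evaluate(self, assignment):
        """Follow the true/false branch of each decision down to a leaf."""
        if self.leaf is not None:
            return self.leaf(assignment)
        branch = self.high if self.decision(assignment) else self.low
        return branch.evaluate(assignment)

# e.g., leaf: XADDNode(None, leaf=lambda v: 150.0)
#       node: XADDNode(lambda v: v["x"] >= 11, high=hi_node, low=lo_node)
```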

50 4.2 VI for Hybrid State and Discrete Observations: For H-POMDPs with only discrete observations o ∈ O and observation function p(o|xs, ds, a) as in the form of Eq (4), we introduce a symbolic version of Monahan's VI algorithm. [sent-126, score-0.532]

51 In brief, we note that all VI operations needed in Section 3 apply directly to H-POMDPs, e.g., … [sent-127, score-0.034]

52 (closing steps of Algorithm 2, GenRelObs) foreach ok ∈ O do // let φok be the partition constraints for observation ok ∈ Oh: p(Oh = ok|xs, ds, a) := ∫xo Σdo p(xo, do|xs, ds, a) · I[φok] dxo; end; return ⟨Oh, p(Oh|xs, ds, a)⟩. [sent-132, score-0.8]

53 Figure 2 caption (fragment): (left) … 1D-Power Plant; (right) derived observation partitions for b2 at h = 2. [sent-141, score-0.222]

54 Crucially, we note that since the continuous transition CPFs p(xsj′|xs, ds, ds′, a) are deterministic and hence defined with Dirac δ's (e.g., …). [sent-142, score-0.176]

55 In short, nothing additional is required for PBVI on H-POMDPs in this case — the key insight is simply that α-functions are now represented by case statements and can “grow” with the horizon as they partition the state space more and more finely. [sent-146, score-0.293]

56 4.3 PBVI for Hybrid State and Hybrid Observations: In general, it would be impossible to apply standard VI to H-POMDPs with continuous observations since the number of observations is infinite. [sent-148, score-0.357]

57 However, building on ideas in [2], in the case of PBVI, it is possible to derive a finite set of continuous observation partitions that permit exact point-based backups at a belief point. [sent-149, score-0.62]

58 This additional operation (GenRelObs) appears on line 10 of PBVI in Algorithm 1 in the case of continuous observations and is formally defined in Algorithm 2. [sent-150, score-0.271]

59 To demonstrate the generation of relevant continuous observation partitions, we use the second iteration of the Cont. 1D-Power Plant along with two belief points represented as uniform distributions: b1 : U(ts; 2, 6) and b2 : U(ts; 6, 11), as shown in Figure 2 (left). [sent-151, score-0.328] [sent-153, score-0.145]

61 … = { (ts − 10 < to < ts) : 0.1 ; ¬(ts − 10 < to < ts) : 0 }. We now need the α-vectors as a function of the observation space for a particular belief state, thus next we marginalize out xs, ds in lines 5–7. [sent-157, score-1.046]

62 … = 0.5·to − 10. Both δ1^open(to) and δ1^close(to) are drawn graphically in Figure 2 (right). [sent-168, score-0.119]
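
A numeric stand-in for that marginalization step may help fix ideas; the grid-based integration below is our own illustration, whereas the paper performs the integral in closed form symbolically.

```python
import numpy as np

def delta_of_obs(t_o, belief_pdf, obs_pdf, alpha, ts_grid):
    """Numeric stand-in for lines 5-7 of GenRelObs:
    delta(t_o) = integral over t_s of b(t_s) * p(t_o|t_s) * alpha(t_s).
    Approximated by a simple Riemann sum on ts_grid, for illustration."""
    dx = ts_grid[1] - ts_grid[0]
    total = sum(belief_pdf(ts) * obs_pdf(t_o, ts) * alpha(ts) for ts in ts_grid)
    return total * dx

# Toy inputs echoing the running example (the alpha is a placeholder):
b2 = lambda ts: 0.2 if 6 < ts < 11 else 0.0                # U(ts; 6, 11)
obs = lambda to, ts: 0.1 if ts - 5 < to < ts + 5 else 0.0  # cf. Eq (5)
alpha = lambda ts: ts                                      # toy alpha-function
delta_at_8 = delta_of_obs(8.0, b2, obs, alpha, np.linspace(0, 20, 2001))
```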

63 These observation-dependent δ's divide the observation space into regions, each of which can yield the optimal policy according to the belief state b2. [sent-169, score-0.419]

64 Following [2], we need to find the optimal boundaries or partitions of the observation space; in their work, numerical solutions are proposed to find these boundaries in one dimension (multiple observations are handled through an independence assumption). [sent-170, score-0.331]

65 Instead, here we leverage the symbolic power of the casemax operator defined in Section 4.1 to find all the partitions where each potentially correlated, multivariate observation δ is optimal. [sent-171, score-0.345] [sent-172, score-0.222]

67 For the two δ's above, the following partitions of the observation space are derived by the casemax operator in line 9: casemax( δ1^close(to), δ1^open(to) ) yields a case statement whose partitions are each labeled o1 or o2; one such partition is (14 < to ≤ 18) : 0.5·to − 10. [sent-173, score-0.515]

68 Here we have labeled with o1 the observations where δ1^close is maximal and with o2 the observations where δ1^open is maximal. [sent-182, score-0.313]

69 This would associate with o1 the partition constraint φo1 ≡ (5.1 < to ≤ 8) ∨ (8 < to ≤ 14) ∨ (14 < to ≤ 18) and with o2 the partition constraint φo2 ≡ (4 < to ≤ 5) ∨ (5 < to ≤ 5.1). [sent-184, score-0.027] [sent-185, score-0.027]

71 Taking into account the 0 partitions and the 1D nature of this example, we can further simplify φo1 ≡ (to > 5.1) … [sent-186, score-0.091]

72 Given these relevant observation partitions, our final task in lines 10-12 is to compute the probabilities of each observation partition φok. [sent-189, score-0.372]

73 This is simply done by marginalizing over the observation function p(Oh|xs, ds, a) within each region defined by φok (achieved by multiplying by an indicator function I[φok] over these constraints). [sent-190, score-0.131]
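
The same numeric stand-in applies to this final step; the partition predicates below echo the simplified φo1, φo2 from the text, and the Riemann sum replaces the paper's symbolic integration purely for illustration.

```python
import numpy as np

def partition_probability(phi_ok, obs_pdf, t_s, to_grid):
    """Numeric stand-in for lines 10-12 of GenRelObs: p(o_k|t_s, a) is the
    observation density integrated over the region where I[phi_ok] holds."""
    dx = to_grid[1] - to_grid[0]
    return dx * sum(obs_pdf(to, t_s) for to in to_grid if phi_ok(to))

# Simplified partitions from the running example (illustrative):
phi_o1 = lambda to: to > 5.1
phi_o2 = lambda to: to <= 5.1
obs = lambda to, ts: 0.1 if ts - 5 < to < ts + 5 else 0.0
p_o1 = partition_probability(phi_o1, obs, t_s=8.0,
                             to_grid=np.linspace(0, 20, 2001))
```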

74 In summary, in this section we have shown how we can extend the exact dynamic programming algorithm for the continuous state, discrete observation POMDP setting from Section 4.2 … [sent-194, score-0.447]

75 Next we present some empirical results for 1- and 2-dimensional continuous state and observation spaces. [sent-196, score-0.368]

76 5 Empirical Results: We evaluated our continuous POMDP solution using XADDs on the 1D-Power Plant example and another variant of this problem with two variables, described below.³ [sent-197, score-0.139]

77 2D-Power Plant: We consider the more complex model of the power plant, similar to [1], where the pressure inside the water tank must be controlled to avoid mixing water into the steam (leading to explosion of the tank). [sent-198, score-0.548]

78 We model an observable pressure reading po as a function of the underlying pressure state ps . [sent-199, score-0.659]

79 Again we have two actions for opening and closing a pressure valve. [sent-200, score-0.197]

80 The close action has transition p(ps′|ps, a = close) = δ( ps′ − { (ps + 10 > 20) : 20 ; ¬(ps + 10 > 20) : ps + 10 } ) and p(ts′|ts, a = close) = δ( ts′ − (ts + 10) ). ³Full problem specifications and Java code to reproduce these experiments are available online in Google Code: http://code.… [sent-201, score-0.668]

81 Figure 3: (left) time (ms) vs. horizon and (right) number of nodes vs. horizon for the Power Plant problems, comparing 1 state & 1 observ var against 2 state & 2 observ vars. [sent-204, score-0.714]

82 For the open reward function, we assume that there is always a small constant penalty (-1) since no electricity is produced. [sent-208, score-0.156]

83 Observations are distributed uniformly within a region depending on their underlying state: p(to|ts) = { (ts + 80 < to < ts + 105) : 0.04 ; ¬(ts + 80 < to < ts + 105) : 0 } and p(po|ps) = { (ps < po < ps + 10) : 0.1 ; ¬(ps < po < ps + 10) : 0 }. [sent-209, score-0.44]

84 Finally for PBVI, we define two uniform beliefs as follows: b1 : U[ts; 90, 100] ∗ U[ps; 0, 10] and b2 : U[ts; 90, 130] ∗ U[ps; 10, 30]. In Figure 3, a time and space analysis of the two versions of Power Plant has been performed for up to horizon h = 6. [sent-211, score-0.353]

85 Figure 3 shows that the computation time required per iteration generally increases since more complex α-functions lead to a larger number of observation partitions and thus a more expensive backup operation. [sent-214, score-0.321]

86 6 Conclusion: We presented the first exact symbolic operations for PBVI in an expressive subset of H-POMDPs with continuous state and observations. [sent-216, score-0.513]

87 Unlike related work that extends to the continuous state and observation setting [6], we do not approach the problem by sampling. [sent-217, score-0.368]

88 Rather, following [2], the key contribution of this work was to define a discrete set of observation partitions on the multivariate continuous observation space via symbolic maximization techniques and derive the related probabilities using symbolic integration. [sent-218, score-1.021]

89 An important avenue for future work is to extend these techniques to the case of continuous state, observation, and action H-POMDPs. [sent-219, score-0.169]

90 Solving POMDPs with continuous or large discrete observation spaces. [sent-226, score-0.438]

91 Survey of partially observable Markov decision processes: Theory, models, and algorithms. [sent-237, score-0.031]

92 Closing the gap: Improved bounds on optimal POMDP solutions. [sent-258, score-0.089]

93 Symbolic variable elimination for discrete and continuous graphical models. [sent-261, score-0.215]

94 Symbolic dynamic programming for discrete and continuous state MDPs. [sent-264, score-0.388]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xs', 0.523), ('pbvi', 0.398), ('ts', 0.22), ('symbolic', 0.216), ('xo', 0.175), ('ps', 0.172), ('plant', 0.169), ('pressure', 0.152), ('belief', 0.145), ('continuous', 0.139), ('observation', 0.131), ('ok', 0.122), ('observations', 0.109), ('oh', 0.107), ('bv', 0.104), ('dxs', 0.102), ('foreach', 0.101), ('casemax', 0.099), ('state', 0.098), ('pomdps', 0.092), ('bi', 0.092), ('partitions', 0.091), ('horizon', 0.09), ('pomdp', 0.089), ('backups', 0.088), ('steam', 0.083), ('xsj', 0.083), ('backup', 0.076), ('discrete', 0.076), ('temperature', 0.07), ('reward', 0.069), ('dsi', 0.066), ('observ', 0.066), ('oi', 0.064), ('eq', 0.063), ('piecewise', 0.062), ('ds', 0.059), ('open', 0.058), ('po', 0.054), ('statements', 0.053), ('genrelobs', 0.05), ('monahan', 0.05), ('policy', 0.045), ('reachable', 0.044), ('xoj', 0.044), ('hybrid', 0.043), ('sanner', 0.04), ('dynamic', 0.04), ('close', 0.037), ('vi', 0.037), ('transition', 0.037), ('beliefs', 0.037), ('programming', 0.035), ('relevant', 0.035), ('operations', 0.034), ('nicta', 0.033), ('fraunhofer', 0.033), ('valve', 0.033), ('xadd', 0.033), ('states', 0.033), ('scott', 0.032), ('logical', 0.032), ('doi', 0.031), ('water', 0.031), ('safe', 0.031), ('observable', 0.031), ('power', 0.03), ('inequalities', 0.03), ('action', 0.03), ('sdp', 0.03), ('transitions', 0.029), ('partitionings', 0.029), ('electricity', 0.029), ('spaan', 0.029), ('waterloo', 0.029), ('tank', 0.029), ('anu', 0.029), ('vars', 0.029), ('boolean', 0.029), ('canberra', 0.027), ('substitutions', 0.027), ('bonn', 0.027), ('lines', 0.027), ('partition', 0.027), ('exact', 0.026), ('poupart', 0.025), ('insight', 0.025), ('mdps', 0.024), ('plan', 0.024), ('graphically', 0.024), ('iteration', 0.023), ('operation', 0.023), ('explosion', 0.023), ('nite', 0.023), ('actions', 0.023), ('jair', 0.022), ('closing', 0.022), ('pascal', 0.022), ('rewards', 0.022), ('probabilities', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially observable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces.

2 0.27807015 147 nips-2012-Graphical Models via Generalized Linear Models

Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar

Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1

3 0.2426524 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

Author: Jason Pacheco, Erik B. Sudderth

Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1

4 0.12499874 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1

5 0.1185086 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

Author: Dongho Kim, Kee-eung Kim, Pascal Poupart

Abstract: In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected longterm total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in which we can naturally encode exploration requirements using the cost function. We extend BEETLE, a model-based BRL method, for learning in the environment with cost constraints. We demonstrate the cost-sensitive exploration behaviour in a number of simulated problems. 1

6 0.094964027 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

7 0.082506314 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

8 0.076649547 344 nips-2012-Timely Object Recognition

9 0.073956423 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

10 0.073505431 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

11 0.071763165 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

12 0.067293368 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

13 0.064683586 38 nips-2012-Algorithms for Learning Markov Field Policies

14 0.064631812 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

15 0.063810177 348 nips-2012-Tractable Objectives for Robust Policy Optimization

16 0.06307523 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

17 0.06256628 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

18 0.061178941 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

19 0.059818134 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

20 0.05829259 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.135), (1, -0.12), (2, 0.023), (3, -0.022), (4, -0.068), (5, 0.006), (6, -0.005), (7, -0.119), (8, -0.108), (9, -0.008), (10, -0.032), (11, -0.008), (12, 0.045), (13, 0.067), (14, -0.074), (15, -0.058), (16, 0.011), (17, -0.037), (18, -0.106), (19, 0.13), (20, 0.069), (21, -0.091), (22, -0.225), (23, -0.045), (24, 0.068), (25, 0.121), (26, 0.205), (27, 0.056), (28, -0.041), (29, 0.049), (30, -0.007), (31, 0.076), (32, 0.141), (33, -0.008), (34, 0.027), (35, 0.149), (36, 0.076), (37, 0.117), (38, -0.004), (39, 0.037), (40, 0.085), (41, 0.012), (42, 0.112), (43, 0.033), (44, 0.015), (45, -0.119), (46, 0.087), (47, -0.065), (48, -0.002), (49, 0.074)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96516865 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially observable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces.

2 0.75507098 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

Author: Jason Pacheco, Erik B. Sudderth

Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1

3 0.71170229 147 nips-2012-Graphical Models via Generalized Linear Models

Author: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar

Abstract: Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models. 1

4 0.46039903 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

Author: Dongho Kim, Kee-eung Kim, Pascal Poupart

Abstract: In this paper, we consider Bayesian reinforcement learning (BRL) where actions incur costs in addition to rewards, and thus exploration has to be constrained in terms of the expected total cost while learning to maximize the expected longterm total reward. In order to formalize cost-sensitive exploration, we use the constrained Markov decision process (CMDP) as the model of the environment, in which we can naturally encode exploration requirements using the cost function. We extend BEETLE, a model-based BRL method, for learning in the environment with cost constraints. We demonstrate the cost-sensitive exploration behaviour in a number of simulated problems. 1

5 0.42713225 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

Author: Po-ling Loh, Martin J. Wainwright

Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1

6 0.39948991 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

7 0.39155969 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

8 0.38603437 350 nips-2012-Trajectory-Based Short-Sighted Probabilistic Planning

9 0.37659389 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

10 0.3558026 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

11 0.3500967 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

12 0.34872779 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

13 0.34021917 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

14 0.32375023 31 nips-2012-Action-Model Based Multi-agent Plan Recognition

15 0.31454566 14 nips-2012-A P300 BCI for the Masses: Prior Information Enables Instant Unsupervised Spelling

16 0.30601314 288 nips-2012-Rational inference of relative preferences

17 0.29661408 161 nips-2012-Interpreting prediction markets: a stochastic approach

18 0.28693449 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

19 0.27727515 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization

20 0.2766422 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.018), (21, 0.011), (27, 0.012), (38, 0.075), (42, 0.014), (54, 0.516), (55, 0.015), (74, 0.028), (76, 0.092), (80, 0.074), (92, 0.04)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87340355 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

Author: Zahra Zamani, Scott Sanner, Pascal Poupart, Kristian Kersting

Abstract: Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially observable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key insight is that while there may be an infinite number of observations, there are only a finite number of continuous observation partitionings that are relevant for optimal decision-making when a finite, fixed set of reachable belief states is considered. To this end, we make two important contributions: (1) we show how previous exact symbolic dynamic programming solutions for continuous state MDPs can be generalized to continuous state POMDPs with discrete observations, and (2) we show how recently developed symbolic integration methods allow this solution to be extended to PBVI for continuous state and observation POMDPs with potentially correlated, multivariate continuous observation spaces.

2 0.71582949 177 nips-2012-Learning Invariant Representations of Molecules for Atomization Energy Prediction

Author: Grégoire Montavon, Katja Hansen, Siamac Fazli, Matthias Rupp, Franziska Biegler, Andreas Ziehe, Alexandre Tkatchenko, Anatole V. Lilienfeld, Klaus-Robert Müller

Abstract: The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy. 1

3 0.67801708 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

Author: Paul Vernaza, Drew Bagnell

Abstract: Maximum entropy (MaxEnt) modeling is a popular choice for sequence analysis in applications such as natural language processing, where the sequences are embedded in discrete, tractably-sized spaces. We consider the problem of applying MaxEnt to distributions over paths in continuous spaces of high dimensionality— a problem for which inference is generally intractable. Our main contribution is to show that this intractability can be avoided as long as the constrained features possess a certain kind of low dimensional structure. In this case, we show that the associated partition function is symmetric and that this symmetry can be exploited to compute the partition function efficiently in a compressed form. Empirical results are given showing an application of our method to learning models of high-dimensional human motion capture data. 1

4 0.65608883 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

Author: Zhirong Yang, Tele Hao, Onur Dikmen, Xi Chen, Erkki Oja

Abstract: Nonnegative Matrix Factorization (NMF) is a promising relaxation technique for clustering analysis. However, conventional NMF methods that directly approximate the pairwise similarities using the least square error often yield mediocre performance for data in curved manifolds because they can capture only the immediate similarities between data samples. Here we propose a new NMF clustering method which replaces the approximated matrix with its smoothed version using random walk. Our method can thus accommodate farther relationships between data samples. Furthermore, we introduce a novel regularization in the proposed objective function in order to improve over spectral clustering. The new learning objective is optimized by a multiplicative Majorization-Minimization algorithm with a scalable implementation for learning the factorizing matrix. Extensive experimental results on real-world datasets show that our method has strong performance in terms of cluster purity. 1

5 0.63765556 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

6 0.62376702 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

7 0.516505 88 nips-2012-Cost-Sensitive Exploration in Bayesian Reinforcement Learning

8 0.51241422 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

9 0.47699496 259 nips-2012-Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

10 0.4693231 108 nips-2012-Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

11 0.44522002 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

12 0.43928462 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model

13 0.43922892 38 nips-2012-Algorithms for Learning Markov Field Policies

14 0.4362562 245 nips-2012-Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions

15 0.42778766 122 nips-2012-Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress

16 0.42744222 51 nips-2012-Bayesian Hierarchical Reinforcement Learning

17 0.42686766 353 nips-2012-Transferring Expectations in Model-based Reinforcement Learning

18 0.42627725 160 nips-2012-Imitation Learning by Coaching

19 0.42343712 348 nips-2012-Tractable Objectives for Robust Policy Optimization

20 0.41971388 364 nips-2012-Weighted Likelihood Policy Search with Model Selection