nips nips2009 nips2009-53 knowledge-graph by maker-knowledge-mining

53 nips-2009-Complexity of Decentralized Control: Special Cases


Source: pdf

Author: Martin Allen, Shlomo Zilberstein

Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. [sent-5, score-0.656]

2 1 Introduction: Decentralized and partially observable stochastic decision and planning problems are very common, comprising anything from strategic games of chance to robotic space exploration. [sent-8, score-0.274]

3 In such domains, multiple agents act under uncertainty about both their environment and the plans and actions of others. [sent-9, score-0.447]

4 These problems can be represented as decentralized partially observable Markov decision processes (Dec-POMDPs), or the equivalent, partially observable stochastic games (POSGs), allowing for precise formulation of solution concepts and success criteria. [sent-10, score-0.74]

5 The complexity of finite-horizon Dec-POMDPs goes down—from NEXP to NP—when agents interact only via a joint reward structure, and are otherwise independent. [sent-16, score-0.561]

6 Further, we consider two other Dec-POMDP sub-classes from the literature: (a) domains where local agent sub-problems are independent except for a (relatively small) number of event-based interactions, and (b) those where agents only interact by influencing the set of currently available actions. [sent-19, score-0.936]

7 These results provide further impetus to devise new tools for the analysis and classification of problem difficulty in decentralized problem solving. [sent-22, score-0.407]

8 2 Basic definitions: The cooperative, decentralized partially observable Markov decision process (Dec-POMDP) is a highly general and powerful framework, capable of representing a wide range of real-world problem domains. [sent-23, score-0.583]

9 P(s, a1, ..., an, s′) is the probability of going from state s to state s′, given joint action a1, ..., an. [sent-31, score-0.269]

10 R(s, a1, ..., an) is the reward obtained for performing joint action a1, ..., an in state s. [sent-53, score-0.255]
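
To make these components concrete, here is a minimal sketch (in Python, not from the paper) of how the pieces of the Dec-POMDP tuple named above, i.e. states, joint actions, transition probabilities P, and rewards R, could be stored for a toy two-agent problem; all names and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
JointAction = Tuple[str, ...]  # one local action per agent

@dataclass
class DecPOMDP:
    """Toy container for the Dec-POMDP components named above (hypothetical layout)."""
    states: List[State]                                        # S
    joint_actions: List[JointAction]                           # A1 x ... x An
    P: Dict[Tuple[State, JointAction], Dict[State, float]]     # P(s' | s, a1..an)
    R: Dict[Tuple[State, JointAction], float]                  # R(s, a1..an)

# A two-state, two-agent example: joint action ("go", "go") usually moves s0 -> s1.
example = DecPOMDP(
    states=["s0", "s1"],
    joint_actions=[("go", "go"), ("go", "stay")],
    P={("s0", ("go", "go")): {"s1": 0.9, "s0": 0.1},
       ("s0", ("go", "stay")): {"s0": 1.0}},
    R={("s0", ("go", "go")): 1.0, ("s0", ("go", "stay")): 0.0},
)
# each transition distribution must sum to one
assert abs(sum(example.P[("s0", ("go", "go"))].values()) - 1.0) < 1e-9
```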

11 The most important sub-instance of the Dec-POMDP model is the decentralized MDP (Dec-MDP), where the joint observation tells us everything we need to know about the system state. [sent-57, score-0.509]

12 A decentralized Markov decision process (Dec-MDP) is a Dec-POMDP that is jointly fully observable. [sent-59, score-0.499]

13 In a Dec-MDP, then, the sum total of the individual agent observations provides a complete picture of the state of the environment. [sent-70, score-0.536]

14 It is important to note, however, that this does not mean that any individual agent actually possesses this information. [sent-71, score-0.459]

15 Dec-MDPs are still fully decentralized in general, and individual agents cannot count on access to the global state when choosing actions. [sent-72, score-0.861]

16 A local policy for an agent αi is a mapping from sequences of that agent’s observations, oi = (oi^1, ..., oi^t), to local actions ai ∈ Ai. [sent-74, score-0.717]

17 A joint policy for n agents is a collection of local policies, one per agent, π = ⟨π1, ..., πn⟩. [sent-78, score-0.483]

18 A solution method for a decentralized problem seeks to find some joint policy that maximizes expected value given the starting state (or distribution over states) of the problem. [sent-82, score-0.596]
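
As a rough illustration of these definitions (a sketch under invented names, not code from the paper), a local policy can be stored as a mapping from an agent's private observation sequence to an action, and a joint policy as one such mapping per agent:

```python
from typing import Dict, List, Tuple

ObsSeq = Tuple[str, ...]          # an agent's private observation history
LocalPolicy = Dict[ObsSeq, str]   # maps observation sequences to local actions

# Two agents, horizon 2: each acts on what it alone has observed so far.
pi_1: LocalPolicy = {(): "listen", ("noise-left",): "open-right"}
pi_2: LocalPolicy = {(): "listen", ("noise-left",): "wait"}
joint_policy: List[LocalPolicy] = [pi_1, pi_2]

def joint_action(obs_histories: List[ObsSeq]) -> Tuple[str, ...]:
    """Each agent chooses from its own observation history only;
    no agent gets to consult the global state."""
    return tuple(pi[h] for pi, h in zip(joint_policy, obs_histories))

print(joint_action([("noise-left",), ("noise-left",)]))  # ('open-right', 'wait')
```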

19 For complexity purposes, the decision version of the Dec-(PO)MDP problem is to determine whether there exists some joint policy with value at least k. [sent-83, score-0.226]

20 A tiling is a mapping of board locations to tile-types, t : {0, ..., n − 1} × {0, ..., n − 1} → L; such a tiling is consistent just in case (i) the origin location of the board receives tile-type 0 (t(0, 0) = tile0); and (ii) all adjacent tile assignments are compatible: (∀x, y) ⟨t(x, y), t(x + 1, y)⟩ ∈ H & ⟨t(x, y), t(x, y + 1)⟩ ∈ V. [sent-95, score-0.251] [sent-101, score-0.282]
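
The consistency condition lends itself to a direct check. A small sketch follows, with the board size, tile types, and compatibility relations H and V invented for illustration:

```python
def is_consistent(t, n, H, V, tile0=0):
    """Check the TILING consistency conditions: the origin constraint plus
    horizontal (H) and vertical (V) adjacency compatibility."""
    if t[(0, 0)] != tile0:
        return False
    for x in range(n):
        for y in range(n):
            if x + 1 < n and (t[(x, y)], t[(x + 1, y)]) not in H:
                return False
            if y + 1 < n and (t[(x, y)], t[(x, y + 1)]) not in V:
                return False
    return True

# 2x2 board, two tile types; only "0 next to 1" is allowed in either direction.
H = V = {(0, 1), (1, 0)}
tiling = {(0, 0): 0, (1, 0): 1, (0, 1): 1, (1, 1): 0}
print(is_consistent(tiling, 2, H, V))  # True
```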

22 The reduction transforms a given instance of TILING into a 2-agent Dec-MDP, where each agent is queried about some location in the grid, and must answer with a tile to be placed there. [sent-105, score-0.494]

23 By careful design of the query and response mechanism, it is ensured that a policy with non-negative value exists only if the agents already have a consistent tiling, thus showing the Dec-MDP to be as hard as TILING. [sent-106, score-0.359]

24 4 Factored Dec-POMDPs and independence: In general, the state transitions, observations, and rewards in a Dec-POMDP can involve probabilistic dependencies between agents. [sent-109, score-0.389]

25 [6, 7] have thus studied problems in which the global state-space consists of the product of local states, so that each agent has its own individual state-space. [sent-112, score-0.632]

26 A Dec-POMDP can then be transition independent, observation independent, or reward independent, as the local effects given by each corresponding function are independent of one another. [sent-113, score-0.376]

27 A factored, n-agent Dec-POMDP is a Dec-POMDP such that the system state can be factored into n + 1 distinct components, so that S = S0 × S1 × · · · × Sn, and no state-variable appears in any Si, Sj, i ≠ j. [sent-115, score-0.26]

28 As with the local (agent-specific) actions, ai, and observations, oi, in the general Dec-POMDP definition, we now refer to the local state, ŝi ∈ Si × S0, namely that portion of the overall state-space that is either specific to agent αi (si ∈ Si), or shared among all agents (s0 ∈ S0). [sent-116, score-1.14]

29 We use the notation s−i for the sequence of all state-components except that for agent αi: s−i = (s0, s1, ..., si−1, si+1, ..., sn). [sent-117, score-0.431]

30 A factored, n-agent Dec-POMDP is transition independent iff the state-transition function can be separated into n + 1 distinct transition functions P0, ..., Pn, where, for any next state s′i ∈ Si, P(s′i | (s0, ..., sn), (a1, ..., an), s′−i) = P0(s′0 | s0) if i = 0, and Pi(s′i | si, ai, s′0) otherwise. [sent-125, score-0.221] [sent-128, score-0.236] [sent-134, score-0.275]

33 In other words, the next local state of each agent is independent of the local states of all others, given its previous local state and local action, and the external system features (S0). [sent-135, score-1.106]
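
The practical content of transition independence is that a joint successor state can be sampled one component at a time. The following sketch assumes illustrative dictionaries for P0 and the Pi; it is not the paper's construction:

```python
import random

def draw(dist):
    """Sample an outcome from a {outcome: probability} dict."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, probs)[0]

def sample_next_state(s, joint_action, P0, P_local):
    """Transition independence: the shared feature s0 evolves on its own, and each
    agent's next local state depends only on its own state, its own action, and s0'."""
    s0, locals_ = s[0], s[1:]
    s0_next = draw(P0[s0])
    next_locals = tuple(draw(P_local[i][(si, ai, s0_next)])
                        for i, (si, ai) in enumerate(zip(locals_, joint_action)))
    return (s0_next,) + next_locals

# Tiny two-agent example: s0 never changes; each agent finishes w.p. 0.8 on "work".
P0 = {"calm": {"calm": 1.0}}
P_local = [
    {("idle", "work", "calm"): {"done": 0.8, "idle": 0.2}},
    {("idle", "work", "calm"): {"done": 0.8, "idle": 0.2}},
]
print(sample_next_state(("calm", "idle", "idle"), ("work", "work"), P0, P_local))
```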

34 A factored, n-agent Dec-POMDP is observation independent iff the joint observation function can be separated into local observation functions O1, ..., On, where, for any local observation oi ∈ Ωi, O(oi | (a1, ..., an), (s′0, ..., s′n), o−i) = Oi(oi | ai, ŝ′i). In such cases, the probability of an agent’s individual observations is a function of their own local states and actions alone, independent of the states of others, and of what those others do or observe. [sent-140, score-0.25] [sent-146, score-0.794]

36 A factored, n-agent Dec-POMDP is reward independent iff the joint reward function can be represented by local reward functions R1, ..., Rn such that R((s0, ..., sn), (a1, ..., an)) = f(R1(ŝ1, a1), ..., Rn(ŝn, an)), and Ri(ŝi, ai) ≥ Ri(ŝ′i, a′i) ⇔ f(R1, ..., Ri(ŝi, ai), ..., Rn) ≥ f(R1, ..., Ri(ŝ′i, a′i), ..., Rn). That is, joint reward is a function of the local rewards, constrained so that we maximize global reward if and only if we maximize local rewards. [sent-148, score-0.648] [sent-160, score-0.232] [sent-172, score-0.569]
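
One simple monotone choice of f is a plain sum of local rewards, which satisfies the maximize-locally-iff-maximize-globally condition above. A brief sketch, with the local reward functions invented for illustration:

```python
def joint_reward(local_states, local_actions, local_R):
    """Reward independence with f = sum: the global reward is the sum of each
    agent's local reward, so improving any one local reward (holding the others
    fixed) improves the global reward, and vice versa."""
    return sum(R_i(s_i, a_i)
               for R_i, s_i, a_i in zip(local_R, local_states, local_actions))

# Two agents with independent local rewards.
R1 = lambda s, a: 1.0 if (s, a) == ("at-goal", "stay") else 0.0
R2 = lambda s, a: 0.5 if a == "charge" else 0.0
print(joint_reward(("at-goal", "low-battery"), ("stay", "charge"), [R1, R2]))  # 1.5
```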

39 It is important to note that each definition applies equally to Dec-MDPs; in such cases, joint full observability of the overall state is often accompanied by full observability at the local level. [sent-180, score-0.577]

40 A factored, n-agent Dec-MDP is locally fully observable iff an agent’s local observation uniquely determines its local state: ∀oi ∈ Ωi, ∃ŝi : P(ŝi | oi) = 1. [sent-182, score-0.519]

41 Local full observability is not equivalent to independence of observations. [sent-183, score-0.237]

42 In particular, a problem may be locally fully observable without being observation independent (since agents may simply observe outcomes of non-independent joint actions). [sent-184, score-0.507]

43 Shared rewards alone lead to reduced complexity: It is easy to see that if a Dec-MDP (or Dec-POMDP) has all three forms of independence given by Definitions 5–7, it can be decomposed into n separate problems, where each agent αi works solely within the local sub-environment Si × S0. [sent-187, score-0.823]

44 Thus we see that in some respects, transition and observation independence are fundamental to the reduction of worst-case complexity from NEXP to NP. [sent-209, score-0.269]

45 When only the rewards depend upon the actions of both agents, the problems become easier; however, when the situation is reversed, the general problem remains NEXP-hard. [sent-210, score-0.425]

46 The structure of rewards, while obviously key to the nature of the optimal (or otherwise) solution, is not as vital—even if agents can separate their individual reward-functions, making them entirely independent, other dependencies can still make the problem extremely complex. [sent-212, score-0.381]

47 We therefore turn to two other interesting special-case Dec-MDP frameworks, in which independent reward functions are accompanied by restricted degrees of transition- and observation-based interaction. [sent-213, score-0.232]

48 Such events can be thought of as occasions upon which that agent takes the given action to generate the associated state transition. [sent-220, score-0.764]

49 Dependencies are then introduced in the form of relationships between one agent’s possible actions in given states and another agent’s primitive events. [sent-221, score-0.419]

50 While no precise worst-case complexity results have been previously proven, the authors do point out that the class of problems has an upper-bound deterministic complexity that is exponential in the size of the state space, |S|, and doubly exponential in the number of defined interactions. [sent-222, score-0.234]

51 A history for an agent αi in a factored, n-agent Dec-POMDP D is a sequence of possible local states and actions, beginning in the agent’s initial state: Φi = [ŝi^0, ai^0, ŝi^1, ai^1, ...]. [sent-228, score-0.675]

52 A primitive event e = (ŝi, ai, ŝ′i) for an agent αi is a triple representing a transition between two local states, given some action ai ∈ Ai. [sent-237, score-1.314]

53 A primitive event e occurs in the history Φi, written Φi ⊨ e, if and only if the triple e is a sub-sequence of the sequence Φi. [sent-242, score-0.367]

54 An event E occurs in the history Φi, written Φi ⊨ E, if and only if some component occurs in that history: ∃e ∈ E : Φi ⊨ e. [sent-243, score-0.206]

55 If the historical sequence of state-action transitions that the agent encounters contains any one of those particular transitions, then the history satisfies the overall event. [sent-246, score-0.577]
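
A small sketch of this occurrence test, treating a history as an alternating state-action list and a primitive event as a (state, action, next-state) triple; the example names are invented:

```python
def primitive_event_occurs(history, e):
    """A primitive event (s, a, s_next) occurs iff it appears as a consecutive
    state-action-state triple somewhere in the history."""
    # states sit at even indices, actions at odd indices
    for t in range(0, len(history) - 2, 2):
        if (history[t], history[t + 1], history[t + 2]) == e:
            return True
    return False

def event_occurs(history, E):
    """A (compound) event occurs iff any of its primitive events occurs."""
    return any(primitive_event_occurs(history, e) for e in E)

phi = ["idle", "work", "idle", "work", "done"]     # s0, a0, s1, a1, s2
E = {("idle", "work", "done"), ("idle", "rest", "idle")}
print(event_occurs(phi, E))  # True: the second transition matches
```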

56 Events can thus be used, for example, to represent such things as taking a particular action in any one of a number of states over time, or taking one of several actions at some particular state. [sent-247, score-0.299]

57 A primitive event e is proper if it occurs at most once in any given history. [sent-250, score-0.33]

58 An event E is proper if it consists of proper primitive events that are mutually exclusive, in that no two of them both occur in any history: ∀Φi ¬∃x, y : (x ≠ y) ∧ (ex ∈ E) ∧ (ey ∈ E) ∧ (Φi ⊨ ex) ∧ (Φi ⊨ ey). [sent-252, score-0.482]

59 Proper primitive events can be used, for instance, to represent actions that take place at particular times (building the time into the local state ŝi ∈ e). [sent-253, score-0.828]

60 Since any given point in time can only occur once in any history, the events involving such time-steps will be proper by default. [sent-254, score-0.205]

61 A proper event E can then be formed by collecting all the primitive events involving some single time-step, or by taking all possible primitive events involving an unrepeatable action. [sent-255, score-0.799]

62 Local full observability: each agent αi can determine its own portion of the state-space, ŝi ∈ S0 × Si, exactly. [sent-261, score-0.624]

63 Interactions between agents are given in terms of a set of dependencies between certain state-action transitions for one agent, and events featuring transitions involving the other agent. [sent-264, score-0.623]

64 Thus, if a history contains one of the primitive events from the latter set, this can have some direct effect upon the transition-model for the first agent, introducing probabilistic transition-dependencies. [sent-265, score-0.464]

65 Such a dependency is thus a collection of possible actions that agent αj can take in one of its local states, each of which depends upon whether the other agent αi has made one of the state-transitions in its own set of primitive events. [sent-268, score-1.437]

66 Such structures can be used to model, for instance, cases where one agent cannot successfully complete some task until the other agent has completed an enabling sub-task, or where the precise outcome depends upon the groundwork laid by the other agent. [sent-269, score-0.966]

67 A dependency d^k_ij = ⟨E^k_i, D^k_j⟩ is satisfied when the current history for enabling agent αi contains the relevant event: Φi ⊨ E^k_i. [sent-271, score-0.654]

68 For any state-action pair ⟨ŝj, aj⟩, we define a Boolean indicator variable b_{ŝj aj}, which is true if and only if some dependency that contains the pair is satisfied: b_{ŝj aj} = 1 if (∃ d^k_ij = ⟨E^k_i, D^k_j⟩) ⟨ŝj, aj⟩ ∈ D^k_j & Φi ⊨ E^k_i, and 0 otherwise. [sent-272, score-0.691]

69 The transition function for our Dec-MDP is factored into two functions, P1 and P2, each defining the distribution over next possible local states: Pi(ŝ′i | ŝi, ai, b_{ŝi ai}). [sent-275, score-0.756]

70 We can thus write Pi(ŝi, ai, b_{ŝi ai}, ŝ′i) for this transition probability. [sent-276, score-0.498]

71 s ˆ s ˆ ˆ ˆ When agents take some action in a state for which dependencies exist, they observe whether or not the related events have occurred; that is, after taking any action aj in state sj , they can observe the state of indicator variable bsj aj . [sent-277, score-1.144]

72 Factored, finite-horizon, n-agent Dec-MDPs with local full observability, independent rewards, and event-driven interactions are NEXP-complete. [sent-280, score-0.225]

73 Local reward independence, which was not present in the original problem, is ensured by using event dependencies to affect future rewards of the other agent. [sent-284, score-0.442]

74 Thus, local immediate rewards remain dependent only upon the actions of the individual agent, but the state in which that agent finds itself (and so the options available to its reward function) can depend upon events involving the other agent. [sent-285, score-1.411]

75 The model can be extended to n agents with little real difficulty. [sent-288, score-0.27]

76 A factored, finite-horizon, n-agent Dec-MDP with local full observability, independent rewards, and event-driven interactions is solvable in nondeterministic polynomial time (NP) if the number of dependencies is O(log |S|), where S is the state-set of the problem. [sent-297, score-0.339]

77 First, NEXP-completeness of the event-based case, even with independent rewards and local full observability (Theorem 5), means that many interesting problems are potentially intractable. [sent-307, score-0.484]

78 Second, isolating where complexity is lower can help determine what task structures and agent interrelationships lead to intractability. [sent-310, score-0.493]

79 Such problems are not wholly decoupled, however, as the actions available to each agent at any point depend upon the global system state. [sent-316, score-0.823]

80 Thus, agents interact by making choices that restrict or broaden the range of actions available to others. [sent-317, score-0.514]

81 • Each Bi : S → 2^Ai is a mapping from global states of the system to some set of available actions for each agent αi. [sent-320, score-0.754]

82 • Ri : (S0 × Si) → ℝ is a local reward function for agent αi. [sent-324, score-0.101]

83 We let the global reward function be the sum of local rewards. [sent-325, score-0.28]

84 Note that there need be no observations in such a problem; given local full observability, each agent observes only its local states. [sent-326, score-0.667]

85 Furthermore, it is presumed that each agent can observe its own available actions in any state; a local policy is thus a mapping from local states to available actions. [sent-327, score-0.981]
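
A brief sketch of this action-set interaction model, with a single shared resource standing in for the kind of coupling described here (compare the rowboat example quoted further on); the state layout and action names are invented:

```python
def B(i, global_state):
    """Available actions for agent i as a function of the *global* state: a single
    shared boat can only be rowed by whichever agent shares its bank."""
    boat, bank_1, bank_2 = global_state
    mine = (bank_1, bank_2)[i]
    return ("wait", "fish") + (("row-across",) if mine == boat else ())

state = ("left-bank", "left-bank", "right-bank")
print(B(0, state))  # ('wait', 'fish', 'row-across') -- agent 1 shares the boat's bank
print(B(1, state))  # ('wait', 'fish')               -- agent 2 does not

# A local policy may only choose among the currently available actions.
def local_policy(i, local_state, available):
    return "row-across" if "row-across" in available else "wait"

print(local_policy(0, "left-bank", B(0, state)))  # 'row-across'
```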

86 Again, we “record” actions of each agent in the state-space of the other, ensuring purely local rewards and local full observability. [sent-337, score-0.997]

87 That is, we add special state-dependent actions that, based on their availability (or lack thereof), affect each agent’s local reward. [sent-339, score-0.278]

88 Discussion of the result: Guo and Lesser [16, 14] were able to show that deciding whether a decentralized problem with state-based actions had an equilibrium solution with value greater than k was NP-hard. [sent-344, score-0.584]

89 Such decentralized problems indeed appear to be quite simple in structure, requiring wholly independent rewards and action-transitions, so that agents can only interact with one another via choices that affect which actions are available. [sent-349, score-1.146]

90 (A typical example involves two persons acting completely independently of one another, except for the existence of a single rowboat, used for crossing a stream; if either agent uses the rowboat to get to the other side, then that action is no longer available to the other.) [sent-350, score-0.56]

91 At the same time, however, our results show that the same structures can be intractable in the worst case, establishing that even seemingly simple interactions between agents can lead to prohibitively high complexity in decentralized problems. [sent-352, score-0.838]

92 6 Conclusions: This work addresses a number of existing models for decentralized problem-solving. [sent-353, score-0.407]

93 In each case, the models restrict agent interaction in some way, in order to produce a special sub-case of the general Dec-POMDP problem. [sent-354, score-0.431]

94 It has been known for some time that systems where agents act entirely independently, but share rewards, have reduced worst-case complexity. [sent-355, score-0.27]

95 This fact, combined with results showing many other decentralized problem models to be equivalent to the general Dec-POMDP model, or strictly harder [17], reveals the essential difficulty of optimal planning in decentralized settings. [sent-358, score-0.904]

96 Not all decentralized domains are going to be intractable, and indeed the event-based and action-set models have been shown to yield to specialized solution methods in many cases, enabling us to solve interesting instances in reasonable amounts of time. [sent-361, score-0.48]

97 When the number of action-dependencies is small, or there are few ways that agents can affect available action-sets, it may well be possible to provide optimal solutions effectively. [sent-362, score-0.296]

98 The complexity of decentralized control of Markov decision processes. [sent-373, score-0.521]

99 The complexity of decentralized control of Markov decision processes. [sent-377, score-0.521]

100 Formal models and algorithms for decentralized decision making under uncertainty. [sent-451, score-0.459]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('agent', 0.431), ('decentralized', 0.407), ('agents', 0.27), ('tiling', 0.204), ('primitive', 0.187), ('actions', 0.177), ('si', 0.159), ('factored', 0.157), ('rewards', 0.153), ('shlomo', 0.141), ('reward', 0.14), ('observability', 0.127), ('events', 0.127), ('oi', 0.121), ('ai', 0.116), ('local', 0.101), ('lesser', 0.094), ('aj', 0.09), ('history', 0.088), ('observable', 0.085), ('dependencies', 0.083), ('becker', 0.078), ('state', 0.077), ('independence', 0.076), ('victor', 0.072), ('nexp', 0.072), ('bernstein', 0.071), ('transition', 0.071), ('sn', 0.067), ('action', 0.067), ('dj', 0.066), ('event', 0.066), ('sj', 0.065), ('planning', 0.065), ('policy', 0.064), ('autonomous', 0.064), ('cooperative', 0.063), ('zilberstein', 0.063), ('upon', 0.062), ('complexity', 0.062), ('transitions', 0.058), ('guo', 0.056), ('states', 0.055), ('supplemental', 0.054), ('interactions', 0.054), ('anyuan', 0.054), ('bsj', 0.054), ('claudia', 0.054), ('raphen', 0.054), ('decision', 0.052), ('proper', 0.051), ('ei', 0.051), ('materials', 0.048), ('dependency', 0.048), ('joint', 0.048), ('board', 0.047), ('establishing', 0.045), ('dk', 0.045), ('nition', 0.044), ('iff', 0.043), ('enabling', 0.042), ('pi', 0.041), ('interact', 0.041), ('multiagent', 0.04), ('fully', 0.04), ('partially', 0.039), ('global', 0.039), ('massachusetts', 0.037), ('independent', 0.036), ('bsi', 0.036), ('decker', 0.036), ('goldsmith', 0.036), ('posgs', 0.036), ('rowboat', 0.036), ('taems', 0.036), ('full', 0.034), ('amherst', 0.034), ('problems', 0.033), ('reduction', 0.032), ('domains', 0.031), ('solvable', 0.031), ('tile', 0.031), ('ri', 0.031), ('bound', 0.031), ('theorem', 0.031), ('np', 0.029), ('wholly', 0.029), ('vital', 0.029), ('accompanied', 0.029), ('observation', 0.028), ('individual', 0.028), ('neil', 0.027), ('restricted', 0.027), ('involving', 0.027), ('available', 0.026), ('system', 0.026), ('occurs', 0.026), ('martin', 0.025), ('nitions', 0.025), ('showing', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 53 nips-2009-Complexity of Decentralized Control: Special Cases

Author: Martin Allen, Shlomo Zilberstein

Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1

2 0.38516298 242 nips-2009-The Infinite Partially Observable Markov Decision Process

Author: Finale Doshi-velez

Abstract: The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems. 1

3 0.3727735 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference

Author: Tomer Ullman, Chris Baker, Owen Macindoe, Owain Evans, Noah Goodman, Joshua B. Tenenbaum

Abstract: Everyday social interactions are heavily influenced by our snap judgments about others’ goals. Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e.g., that one agent is ‘helping’ or ‘hindering’ another’s attempt to get up a hill or open a box. We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. We also present behavioral evidence in support of this model over a simpler, perceptual cue-based alternative. 1

4 0.21580371 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining

Author: George Konidaris, Andre S. Barreto

Abstract: We introduce a skill discovery method for reinforcement learning in continuous domains that constructs chains of skills leading to an end-of-task reward. We demonstrate experimentally that it creates appropriate skills and achieves performance benefits in a challenging continuous domain. 1

5 0.13513424 134 nips-2009-Learning to Explore and Exploit in POMDPs

Author: Chenghui Cai, Xuejun Liao, Lawrence Carin

Abstract: A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration and exploitation, in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem. Theoretical guarantees are provided concerning the optimality of the balancing of exploration and exploitation. The effectiveness of the method is demonstrated by experimental results on benchmark problems.

6 0.13269897 113 nips-2009-Improving Existing Fault Recovery Policies

7 0.13147318 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

8 0.1185194 221 nips-2009-Solving Stochastic Games

9 0.11649932 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

10 0.11132964 14 nips-2009-A Parameter-free Hedging Algorithm

11 0.10118252 52 nips-2009-Code-specific policy gradient rules for spiking neurons

12 0.086311862 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

13 0.076582827 209 nips-2009-Robust Value Function Approximation Using Bilinear Programming

14 0.072569102 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling

15 0.070400678 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

16 0.069511689 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

17 0.064563312 12 nips-2009-A Generalized Natural Actor-Critic Algorithm

18 0.063132904 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

19 0.060405493 166 nips-2009-Noisy Generalized Binary Search

20 0.055515625 156 nips-2009-Monte Carlo Sampling for Regret Minimization in Extensive Games


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.168), (1, 0.013), (2, 0.257), (3, -0.302), (4, -0.37), (5, 0.024), (6, 0.046), (7, -0.039), (8, -0.021), (9, 0.034), (10, 0.144), (11, 0.112), (12, 0.076), (13, 0.031), (14, 0.033), (15, 0.041), (16, -0.013), (17, 0.054), (18, 0.009), (19, 0.068), (20, 0.007), (21, 0.082), (22, -0.249), (23, -0.001), (24, 0.191), (25, -0.02), (26, 0.112), (27, -0.074), (28, -0.038), (29, -0.055), (30, 0.027), (31, -0.009), (32, -0.005), (33, 0.03), (34, 0.072), (35, 0.011), (36, -0.009), (37, -0.046), (38, 0.036), (39, -0.027), (40, -0.052), (41, -0.028), (42, -0.071), (43, 0.021), (44, 0.006), (45, -0.021), (46, 0.061), (47, 0.016), (48, -0.051), (49, -0.001)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97011387 53 nips-2009-Complexity of Decentralized Control: Special Cases

Author: Martin Allen, Shlomo Zilberstein

Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1

2 0.91699356 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference

Author: Tomer Ullman, Chris Baker, Owen Macindoe, Owain Evans, Noah Goodman, Joshua B. Tenenbaum

Abstract: Everyday social interactions are heavily influenced by our snap judgments about others’ goals. Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e.g., that one agent is ‘helping’ or ‘hindering’ another’s attempt to get up a hill or open a box. We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. We also present behavioral evidence in support of this model over a simpler, perceptual cue-based alternative. 1

3 0.88782358 242 nips-2009-The Infinite Partially Observable Markov Decision Process

Author: Finale Doshi-velez

Abstract: The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems. 1

4 0.87504131 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining

Author: George Konidaris, Andre S. Barreto

Abstract: We introduce a skill discovery method for reinforcement learning in continuous domains that constructs chains of skills leading to an end-of-task reward. We demonstrate experimentally that it creates appropriate skills and achieves performance benefits in a challenging continuous domain. 1

5 0.72280878 134 nips-2009-Learning to Explore and Exploit in POMDPs

Author: Chenghui Cai, Xuejun Liao, Lawrence Carin

Abstract: A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration and exploitation, in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem. Theoretical guarantees are provided concerning the optimality of the balancing of exploration and exploitation. The effectiveness of the method is demonstrated by experimental results on benchmark problems.

6 0.47094858 113 nips-2009-Improving Existing Fault Recovery Policies

7 0.44933993 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

8 0.39939907 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

9 0.365753 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

10 0.35034955 69 nips-2009-Discrete MDL Predicts in Total Variation

11 0.34942389 221 nips-2009-Solving Stochastic Games

12 0.34536624 14 nips-2009-A Parameter-free Hedging Algorithm

13 0.3202211 12 nips-2009-A Generalized Natural Actor-Critic Algorithm

14 0.31825402 159 nips-2009-Multi-Step Dyna Planning for Policy Evaluation and Control

15 0.31564999 39 nips-2009-Bayesian Belief Polarization

16 0.30186215 52 nips-2009-Code-specific policy gradient rules for spiking neurons

17 0.25933772 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering

18 0.25718591 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

19 0.25115052 7 nips-2009-A Data-Driven Approach to Modeling Choice

20 0.24055216 152 nips-2009-Measuring model complexity with the prior predictive


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.041), (25, 0.05), (35, 0.038), (36, 0.062), (39, 0.026), (58, 0.055), (61, 0.062), (71, 0.518), (86, 0.049), (91, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.966048 4 nips-2009-A Bayesian Analysis of Dynamics in Free Recall

Author: Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei

Abstract: We develop a probabilistic model of human memory performance in free recall experiments. In these experiments, a subject first studies a list of words and then tries to recall them. To model these data, we draw on both previous psychological research and statistical topic models of text documents. We assume that memories are formed by assimilating the semantic meaning of studied words (represented as a distribution over topics) into a slowly changing latent context (represented in the same space). During recall, this context is reinstated and used as a cue for retrieving studied words. By conceptualizing memory retrieval as a dynamic latent variable model, we are able to use Bayesian inference to represent uncertainty and reason about the cognitive processes underlying memory. We present a particle filter algorithm for performing approximate posterior inference, and evaluate our model on the prediction of recalled words in experimental data. By specifying the model hierarchically, we are also able to capture inter-subject variability. 1

same-paper 2 0.96271282 53 nips-2009-Complexity of Decentralized Control: Special Cases

Author: Martin Allen, Shlomo Zilberstein

Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs), is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1

3 0.95858651 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models

Author: Laura Dietz, Valentin Dallmeier, Andreas Zeller, Tobias Scheffer

Abstract: We devise a graphical model that supports the process of debugging software by guiding developers to code that is likely to contain defects. The model is trained using execution traces of passing test runs; it reflects the distribution over transitional patterns of code positions. Given a failing test case, the model determines the least likely transitional pattern in the execution trace. The model is designed such that Bayesian inference has a closed-form solution. We evaluate the Bernoulli graph model on data of the software projects AspectJ and Rhino. 1

4 0.92374778 11 nips-2009-A General Projection Property for Distribution Families

Author: Yao-liang Yu, Yuxi Li, Dale Schuurmans, Csaba Szepesvári

Abstract: Surjectivity of linear projections between distribution families with fixed mean and covariance (regardless of dimension) is re-derived by a new proof. We further extend this property to distribution families that respect additional constraints, such as symmetry, unimodality and log-concavity. By combining our results with classic univariate inequalities, we provide new worst-case analyses for natural risk criteria arising in classification, optimization, portfolio selection and Markov decision processes. 1

5 0.83312309 56 nips-2009-Conditional Neural Fields

Author: Jian Peng, Liefeng Bo, Jinbo Xu

Abstract: Conditional random fields (CRF) are widely used for sequence labeling such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and output is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input and output we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layer between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Therefore, conceptually CNF is much more expressive than CRF. Experiments on two widely-used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and also among a few of the best methods for handwriting recognition.

6 0.8319971 130 nips-2009-Learning from Multiple Partially Observed Views - an Application to Multilingual Text Categorization

7 0.75915998 205 nips-2009-Rethinking LDA: Why Priors Matter

8 0.74497372 65 nips-2009-Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process

9 0.65927184 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains

10 0.65102386 204 nips-2009-Replicated Softmax: an Undirected Topic Model

11 0.63806391 206 nips-2009-Riffled Independence for Ranked Data

12 0.63506794 171 nips-2009-Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution

13 0.62677938 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

14 0.61331403 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference

15 0.60868222 96 nips-2009-Filtering Abstract Senses From Image Search Results

16 0.60711753 226 nips-2009-Spatial Normalized Gamma Processes

17 0.60318351 242 nips-2009-The Infinite Partially Observable Markov Decision Process

18 0.60012472 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

19 0.59983677 194 nips-2009-Predicting the Optimal Spacing of Study: A Multiscale Context Model of Memory

20 0.59300512 154 nips-2009-Modeling the spacing effect in sequential category learning