nips nips2009 nips2009-107 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tomer Ullman, Chris Baker, Owen Macindoe, Owain Evans, Noah Goodman, Joshua B. Tenenbaum
Abstract: Everyday social interactions are heavily influenced by our snap judgments about others’ goals. Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e.g., that one agent is ‘helping’ or ‘hindering’ another’s attempt to get up a hill or open a box. We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. We also present behavioral evidence in support of this model over a simpler, perceptual cue-based alternative. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e. [sent-7, score-1.339]
2 We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). [sent-10, score-1.095]
3 The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. [sent-11, score-1.0]
4 1 Introduction Humans make rapid, consistent intuitive inferences about the goals of agents from the most impoverished of visual stimuli. [sent-13, score-0.956]
5 On viewing a short video of geometric shapes moving in a 2D world, adults spontaneously attribute to them an array of goals and intentions [7]. [sent-14, score-0.495]
6 Yet people also attribute complex social goals, such as helping, hindering or protecting another agent. [sent-18, score-0.592]
7 Recent studies suggest that infants as young as six months make the same sort of complex social goal attributions on observing simple displays of moving shapes, or (at older ages) in displays of puppets interacting [6]. [sent-19, score-0.583]
8 How do humans make these rapid social goal inferences from such impoverished displays? [sent-20, score-0.492]
9 On one approach, social goals are inferred directly from perceptual cues in a bottom-up fashion. [sent-21, score-0.65]
10 For example, infants in [6] may judge that a triangle pushing a circle up a hill is helping the circle get to the top of the hill simply because the triangle is moving the circle in the direction the circle was last observed moving on its own. [sent-22, score-0.706]
11 On an alternative, theory-of-mind approach, the triangle is judged to be helping the circle because in some sense he knows what the circle’s goal is, desires for the circle to achieve the goal, and constructs a rational plan of action that he expects will increase the probability of the circle realizing the goal. [sent-27, score-0.603]
12 The virtue of this theory-of-mind approach is its generality, accounting for a much wider range of social goal inferences that cannot be reduced to simple perceptual cues. [sent-28, score-0.438]
13 Our question here is whether the rapid goal inferences we make in everyday social situations, and that both infants and adults have been shown to make from simple perceptual displays, require the sophistication of a theory-based approach or can be sufficiently explained in terms of perceptual cues. [sent-29, score-0.576]
14 This framework should enable the inference that agent A is helping or hindering agent B from a joint goal inference based on observing A and B interacting. [sent-35, score-1.404]
15 Inference should be possible even with minimal prior knowledge about the agents and without knowledge of B’s goal. [sent-36, score-0.456]
16 beliefs, goals, planning abilities) explain fast human judgments from impoverished and unfamiliar stimuli? [sent-40, score-0.482]
17 In addressing the challenge of formalization, we present a formal account of social goal attribution based on the abstract criterion of A helping (or hindering) B by acting to maximize (minimize) B’s probability of realizing his goals. [sent-41, score-0.63]
18 On this account, agent A rationally maximizes utility by maximizing (minimizing) the expected utility of B, where this expectation comes from A’s model of B’s goals and plans of action. [sent-42, score-0.8]
19 We incorporate this formalization of helping and hindering into an existing computational framework for theory-based goal inference, on which goals are inferred from actions by inverting a generative rational planning (MDP) model [1]. [sent-43, score-1.408]
20 The augmented model allows for the inference that A is helping or hindering B from stimuli in which B’s goal is not directly observable. [sent-44, score-0.741]
21 We test this Inverse Planning model of social goal attribution on a set of simple 2D displays, comparing its performance to that of an alternative model which makes inferences directly from visual cues, based on previous work such as that of Blythe et al. [sent-45, score-0.544]
22 2 Computational Framework Our framework assumes that people represent the causal role of agents’ goals in terms of an intuitive principle of rationality [4]: the assumption that agents will tend to take efficient actions to achieve their goals, given their beliefs about the world. [sent-47, score-0.929]
23 Inferences of simple relational goals between agents (such as chasing and fleeing) from maze-world interactions were considered by Baker, Goodman and Tenenbaum [1], using multiagent MDP-based inverse planning. [sent-49, score-1.034]
24 In this paper, we present a framework for modeling inferences of more complex social goals, such as helping and hindering, where an agent’s goals depend on the goals of other agents. [sent-50, score-1.275]
25 We will define two types of agents: simple agents, which have object-directed goals and do not represent other agents’ goals, and complex agents, which have either social or object-directed goals, and represent other agents’ goals and reason about their likely behavior. [sent-51, score-0.929]
26 For each type of agent and goal, we describe the multiagent MDPs they define. [sent-52, score-0.475]
27 We then describe joint inferences of object-directed and social goals based on the Bayesian inversion of MDP models of behavior. [sent-53, score-0.648]
28 S is an encoding of the world into a finite set of mutually exclusive states, which specifies the set of possible configurations of all agents and objects. [sent-56, score-0.456]
29 R : S × A → R is the reward function, which provides agents with real-valued rewards for each state-action pair, and γ is the discount factor. [sent-60, score-0.646]
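To keep these components concrete, the sketch below packages them as a plain Python structure; the field names, the explicit transition function, and the default discount value are illustrative assumptions rather than the paper's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Minimal container for the MDP variables listed above. The explicit
# transition function T and the field names are illustrative assumptions.
@dataclass
class MazeMDP:
    states: List[Tuple]                                     # finite state set S
    actions: List[str]                                      # action set A
    transition: Callable[[Tuple, str], Dict[Tuple, float]]  # T(s, a) -> {s': prob}
    reward: Callable[[Tuple, str], float]                   # R(s, a), real-valued
    gamma: float = 0.95                                     # discount factor
```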
30 We then describe how agents plan over multiagent MDPs. [sent-62, score-0.554]
31 The state reward functions range from a unit reward in the goal location (row 1) to a field of reward that extends to every location in the grid (row 3). [sent-85, score-0.578]
32 Specifically, ρg and δg determine the scale and shape of the state reward function, with ri (S) = max(ρg (1 − distance(S, i, G)/δg ), 0), where distance(S, i, G) is the geodesic distance between agent i and the goal. [sent-92, score-0.583]
33 With δg ≤ 1, the reward function has a unit value of r(S) = ρg when the agent and object goal occupy the same location, i. [sent-93, score-0.676]
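As a rough illustration (not the paper's implementation) of this object-directed reward, the function below computes ri(S) = max(ρg(1 − distance(S, i, G)/δg), 0); geodesic_distance is a hypothetical helper standing in for a shortest-path computation through the maze.

```python
def object_reward(state, agent_i, goal, rho_g, delta_g, geodesic_distance):
    """Object-directed state reward r_i(S) = max(rho_g * (1 - d / delta_g), 0).

    geodesic_distance(state, agent_i, goal) is assumed to return the maze
    (shortest-path) distance between agent i and the goal object. With
    delta_g <= 1 this reduces to a unit reward of rho_g when the agent and
    the goal occupy the same location; larger delta_g spreads a graded
    'field' of reward over the surrounding locations.
    """
    d = geodesic_distance(state, agent_i, goal)
    return max(rho_g * (1.0 - d / delta_g), 0.0)
```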
34 Social rewards for helping and hindering For complex agent j, the state reward function induced by a social goal Gj depends on the cost of j’s action Aj , as well as the reward function Ri of the agent that j wants to help or hinder. [sent-100, score-2.05]
35 ρo is the social agent’s scaling of the expected reward of state S for agent i, which determines how much j “cares” about i relative to its own costs. [sent-102, score-0.744]
36 For helping agents, ρo > 0, and for hindering agents, ρo < 0. [sent-103, score-0.552]
37 Computing the expectation EAi [Ri (S, Ai )] relies on the social agent’s model of i’s planning process, which we will describe below. [sent-104, score-0.495]
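The sketch below is one way to write the social reward just described, assuming a hypothetical action_cost function for agent j and a policy_of_i function giving j's model of P(Ai | S, Gi); it illustrates the ρo-weighted expectation, not the paper's code.

```python
def social_reward(state, action_j, rho_o, action_cost, policy_of_i,
                  reward_of_i, actions_i):
    """Reward for complex agent j holding a social goal toward agent i.

    rho_o > 0 -> helping   (j gains when i's expected reward is high);
    rho_o < 0 -> hindering (j gains when i's expected reward is low).
    policy_of_i(state, a_i) is j's model of P(A_i | S, G_i);
    reward_of_i(state, a_i) is agent i's reward function R_i(S, A_i).
    """
    expected_r_i = sum(policy_of_i(state, a_i) * reward_of_i(state, a_i)
                       for a_i in actions_i)
    return rho_o * expected_r_i - action_cost(state, action_j)
```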
38 Simple agents We assume that the simple agents model other agents as randomly selecting actions in proportion to the softmax of their expected cost, i. [sent-110, score-1.453]
39 Complex agents We assume that the social agent j uses its model of other agents’ planning process to compute P (Ai |S, Gi ), for i ≠ j, allowing for accurate prediction of other agents’ actions. [sent-113, score-1.328]
40 We assume agents have access to the true environment dynamics. [sent-114, score-0.475]
41 This is a simplification of a more realistic framework in which agents have only partial or false knowledge about the environment. [sent-115, score-0.456]
42 3 Multiagent planning Given the variables of MDP M , we can compute the optimal state-action value function Q∗ : S ×A → R, which determines the expected infinite-horizon reward of taking an action in each state. [sent-118, score-0.489]
43 We assume that agents have softmax-optimal policies, such that P (A|S, G) ∝ exp(βQ∗ (S, A)), allowing occasional deviations from the optimal action depending on the parameter β, which determines agents’ level of determinism (higher β implies higher determinism, or less randomness). [sent-119, score-0.545]
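To ground the planning step, here is a small tabular sketch of computing Q* by value iteration and turning it into the softmax policy P(A|S, G) ∝ exp(βQ*(S, A)); the dictionary-based representation is an assumption made for illustration, not the paper's implementation.

```python
import math

def value_iteration(states, actions, transition, reward, gamma, iters=200):
    """Tabular value iteration; returns Q*(s, a) as a dict of dicts."""
    V = {s: 0.0 for s in states}
    Q = {s: {a: 0.0 for a in actions} for s in states}
    for _ in range(iters):
        for s in states:
            for a in actions:
                Q[s][a] = reward(s, a) + gamma * sum(
                    p * V[s2] for s2, p in transition(s, a).items())
            V[s] = max(Q[s].values())
    return Q

def softmax_policy(q_values_at_s, beta):
    """P(A | S, G) proportional to exp(beta * Q*(S, A)); higher beta -> more deterministic."""
    m = max(q_values_at_s.values())  # subtract max for numerical stability
    w = {a: math.exp(beta * (q - m)) for a, q in q_values_at_s.items()}
    z = sum(w.values())
    return {a: v / z for a, v in w.items()}
```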
44 In a multiagent setting, joint value functions can be optimized recursively, with one agent representing the value function of the other, and the other representing the representation of the first, and so on to an arbitrarily high order [10]. [sent-120, score-0.475]
45 That is, an agent A can at most represent an agent B’s reasoning about A’s goals and actions, but not a deeper recursion in which B reasons about A reasoning about B. [sent-122, score-1.103]
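To make the bounded recursion concrete, the sketch below shows one way a complex agent j could plan in an effective single-agent MDP by marginalizing the joint dynamics over its model of the simple agent i's softmax policy; joint_step and policy_of_i are illustrative names, and deeper levels of recursion are simply not represented, matching the cap described above.

```python
def marginalized_transition(state, a_j, actions_i, policy_of_i, joint_step):
    """Effective dynamics for complex agent j at one level of recursion.

    joint_step(state, a_j, a_i) -> {next_state: prob} is the joint world
    dynamics; policy_of_i(state, a_i) is j's model of P(A_i | S, G_i).
    Marginalizing over a_i turns the multiagent problem into a single-agent
    MDP that j can plan over with ordinary value iteration.
    """
    out = {}
    for a_i in actions_i:
        p_ai = policy_of_i(state, a_i)
        for s2, p in joint_step(state, a_j, a_i).items():
            out[s2] = out.get(s2, 0.0) + p_ai * p
    return out
```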
46 2 Inverse planning in multiagent MDPs Once we have computed P (Ai |S, Gi ) for agents 1 through n using multiagent planning, we use Bayesian inverse planning to infer agents’ goals, given observations of their behavior. [sent-124, score-1.289]
47 1 over a range of θ values for each stimulus trial: P(Gi | S1:T, A^{1:n}_{1:T−1}, β) = Σ_θ P(Gi, θ | S1:T, A^{1:n}_{1:T−1}, β)   (2). This allows our models to infer the combination of goals and reward functions that best explains the agents’ behavior for each stimulus. [sent-129, score-0.509]
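A minimal sketch of this inverse-planning computation: score each candidate goal and reward-shape parameter θ by the likelihood of the observed state-action sequence under the corresponding softmax policy, then sum over θ as in Eq. 2. The uniform priors and the precomputed policies lookup are assumptions made for illustration.

```python
def goal_posterior(trajectory, goals, thetas, policies):
    """P(G_i | S_1:T, A_1:T-1, beta), marginalizing over theta (cf. Eq. 2).

    trajectory: observed (state, action) pairs for the agent of interest.
    policies:   policies[(goal, theta)][state][action] = P(A | S, G, theta),
                assumed precomputed by (multiagent) planning.
    Uniform priors over goals and theta are assumed, so they cancel in the
    normalization.
    """
    scores = {}
    for g in goals:
        total = 0.0
        for th in thetas:
            like = 1.0
            for state, action in trajectory:
                like *= policies[(g, th)][state][action]
            total += like
        scores[g] = total
    z = sum(scores.values())
    return {g: (s / z if z > 0 else 1.0 / len(goals)) for g, s in scores.items()}
```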
48 3 Experiment We designed an experiment to test the Inverse Planning model of social goal attributions in a simple 2D maze-world domain, inspired by the stimuli of many previous studies involving children and adults [7, 5, 8, 6, 9, 12]. [sent-130, score-0.487]
49 We created a set of videos which depicted agents interacting in a maze. [sent-131, score-0.496]
50 Subjects were asked to attribute goals to the agents after viewing brief snippets of these videos. [sent-133, score-0.876]
51 2 Stimuli We constructed 24 scenarios in which two agents moved around a 2D maze (shown in Fig. [sent-140, score-0.579]
52 The maze always contained two potential object goals (a flower and a tree), and on 12 of the 24 scenarios it also contained a movable obstacle (a boulder). [sent-142, score-0.53]
53 First, scenarios were to have agents acting in ways that were consistent with more than one hypothesis concerning their goals, with these ambiguities between goals sometimes being resolved as the scenario developed (see Fig. [sent-144, score-0.92]
54 Second, scenarios were to involve a variety of perceptually distinct plans of action that might be interpreted as issuing from helping or hindering goals. [sent-147, score-0.702]
55 For example, one agent pushing another toward an object goal, removing an obstacle from the other agent’s path, and moving aside for the other agent (all of which featured in our scenarios) could all be interpreted as helping. [sent-148, score-0.888]
56 This criterion was included to test our formalization of social goals as based on an abstract relation between reward functions. [sent-149, score-0.752]
57 In our model, social agents act to maximize or minimize the reward of the other agent, and the precise manner in which they do so will vary depending on the structure of the environment and their initial positions. [sent-150, score-0.842]
58 (a) The Large agent moves over each of the goal objects (Frames 1-7) and so the video is initially ambiguous between his having an object goal and a social goal. [sent-154, score-0.871]
59 Disambiguation occurs from Frame 8, when the Large agent moves down and blocks the Small agent from continuing his path up to the object goal. [sent-155, score-0.821]
60 Once the Small agent moves into the same room (6), the Large agent pushes him onto the flower and allows him to rest there (8-16). [sent-157, score-0.834]
61 Large agents were visually bigger and were able to shift both movable obstacles and Small agents by moving directly into them. [sent-159, score-1.015]
62 Small agents were visually smaller, and could not shift agents or boulders. [sent-163, score-0.912]
63 In our scenarios, the actions of Small agents failed with a probability of about 0. [sent-164, score-0.517]
64 Large agents correspond to the “complex agents” introduced in Section 2, in that they could have either object-directed goals or social goals (helping or hindering the Small agent). [sent-166, score-1.659]
65 Small agents correspond to “simple agents” and could have only object goals. [sent-167, score-0.497]
66 Asking subjects for goal attributions at multiple points in a sequence allowed us to track the change in their judgments as evidence for particular goals accumulated. [sent-173, score-0.797]
67 3 Procedure Subjects were initially shown a set of familiarization videos of agents interacting in the maze, illustrating the structural properties of the maze-world (e.g. the actions available to agents and the possibility of moving obstacles) and the differences between Small and Large agents. [sent-177, score-0.496] [sent-179, score-0.565]
69 The left-right orientation of agents and goals was counterbalanced across subjects. [sent-182, score-0.805]
70 Subjects were told that each snippet would contain two new agents (one Small and one Large) and this was highlighted in the stimuli by randomly varying the color of the agents for each snippet. [sent-183, score-1.0]
71 Subjects were told that agents had complete knowledge of the physical structure of the maze, including the position of all goals, agents and obstacles. [sent-184, score-0.912]
72 For the Large agent, they could select either of the two social goals and either of the two object goals. [sent-186, score-0.597]
73 In our experiments, the world was given by a 2D maze-world, and the state space included the set of positions that agents and objects can jointly occupy without overlapping. [sent-192, score-0.456]
74 For instance, some stimuli were suggestive of “field” goals rather than point goals, and marginalizing over δg allowed our model to capture this. [sent-198, score-0.44]
75 We compared the Inverse Planning model to a model that made inferences about goals based on simple visual cues, inspired by previous heuristic- or perceptually-based accounts of human action understanding of similar 2D animated displays [3, 11]. [sent-208, score-0.664]
76 5 Results Because our main interest is in judgments about the social goals of representationally complex agents, we analyzed only subjects’ judgments about the Large agents. [sent-214, score-0.874]
77 The Cue-based model correlates well with judgments for object goals (r = 0.90 for flower, tree) – indeed slightly better than the Inverse Planning model – but much less well for social goals (r = 0. [sent-228, score-0.588] [sent-230, score-0.58]
79 There are many stimuli for which people are very confident that the Large agent is either helping or hindering, and the Inverse Planning model is similarly confident (bar heights near 1). [sent-235, score-0.766]
80 The Cue-based model, in contrast, is unsure: it assigns roughly equal probabilities of helping or hindering to these cases (bar heights near 0.5). [sent-236, score-0.552]
81 In other words, the Cue-based model is effective at inferring simple object goals of maze-world agents, but is generally unable to distinguish between the more complex goals of helping and hindering. [sent-238, score-1.041]
82 When constrained to simply differentiating between social and object goals, both models succeed equally (r = 0.84), where in the Cue-based model this is probably because moving away from the object goals serves as a good cue to separate these categories. [sent-239, score-0.597] [sent-240, score-0.506]
84 However, the Inverse Planning model is more successful in differentiating the right goal within social goals (r = 0. [sent-241, score-0.678]
85 Note that in the one scenario for which humans and the Inverse Planning model disagreed after observing the full sequence, both humans and the model were close to being ambivalent about whether the Large agent was hindering or interested in the flower. [sent-250, score-0.862]
86 We divided scenarios into two groups depending on whether a boulder was moved around in the scenario, as movable boulders increase the range of variability in helping and hindering action sequences. [sent-252, score-0.801]
87 In contrast, the Inverse Planning model captures abstract relations between the agents and their possible goals and so lends itself to a variety of environments. [sent-258, score-0.578]
88 Figure 3: Correlations between human goal judgments and predictions of the Inverse Planning model (a) and the Cue-based model (b), broken down by goal type. [sent-355, score-0.758]
89 The inability of the heuristic model to distinguish between helping and hindering is illustrated by the plots in Fig. [sent-359, score-0.576]
90 In contrast, both the Inverse Planning model and the human subjects are often very confident that an agent is helping and not hindering (or vice versa). [sent-361, score-1.088]
91 2(a) but with goals switched), both the Inverse Planning model and human subjects recognize the movement of the Large agent one step off the flower (or the tree in Fig. [sent-371, score-0.939]
92 In scenario 5, both agents start off in the bottom-left room, but with the Small agent right at the entrance to the top-left room. [sent-376, score-0.892]
93 As the Small agent tries to move towards the flower (the top-left goal), the Large agent moves up from below and pushes Small one step towards the flower before moving off to the right to the tree. [sent-377, score-0.852]
94 The first was to provide a formalization of social goal attribution incorporated into a general theory-based model for goal attribution. [sent-404, score-0.534]
95 This model had to enable the inference that A is helping or hindering B from interactions between A and B but without prior knowledge of either agent’s goal, and to account for the range of behaviors that humans judge as evidence of helping or hindering. [sent-405, score-0.942]
96 The second challenge was for the model to perform well on a demanding inference task in which social goals must be inferred from very few observations without directly observable evidence of agents’ goals. [sent-406, score-0.608]
97 The Inverse Planning model classified a diverse range of agent interactions as helping or hindering in line with human judgments. [sent-408, score-1.015]
98 It produced a closer fit to humans for both social and nonsocial goal attributions, and was far superior to the visual cue model in discriminating between helping and hindering. [sent-410, score-0.695]
99 One task is to augment this formal model of helping and hindering to capture more of the complexity behind human judgments. [sent-412, score-0.616]
100 This aspect of helping could be explored by supposing that the utility of a helping agent depends not just on another agent’s reward function but also on his value function. [sent-415, score-1.045]
wordName wordTfidf (topN-words)
[('agents', 0.456), ('agent', 0.377), ('goals', 0.349), ('hindering', 0.298), ('planning', 0.264), ('helping', 0.254), ('social', 0.207), ('reward', 0.16), ('judgments', 0.147), ('inverse', 0.109), ('frame', 0.105), ('multiagent', 0.098), ('goal', 0.098), ('subjects', 0.095), ('inferences', 0.092), ('ower', 0.089), ('boulder', 0.073), ('attribution', 0.071), ('st', 0.068), ('stimuli', 0.067), ('action', 0.065), ('probe', 0.063), ('actions', 0.061), ('attributions', 0.06), ('scenario', 0.059), ('scenarios', 0.056), ('aj', 0.054), ('cues', 0.053), ('snippets', 0.052), ('moving', 0.048), ('maze', 0.048), ('circle', 0.046), ('people', 0.044), ('cue', 0.044), ('displays', 0.042), ('hinder', 0.042), ('infants', 0.042), ('object', 0.041), ('perceptual', 0.041), ('mdps', 0.041), ('human', 0.04), ('humans', 0.04), ('frames', 0.038), ('formalization', 0.036), ('gi', 0.036), ('blythe', 0.036), ('intentional', 0.036), ('movable', 0.036), ('predictions', 0.033), ('baker', 0.032), ('impoverished', 0.031), ('adults', 0.031), ('movement', 0.031), ('rewards', 0.03), ('room', 0.03), ('plans', 0.029), ('hill', 0.029), ('visual', 0.028), ('evidence', 0.028), ('correlates', 0.027), ('distance', 0.027), ('ai', 0.026), ('mdp', 0.026), ('moves', 0.026), ('rapid', 0.024), ('rational', 0.024), ('triangle', 0.024), ('model', 0.024), ('video', 0.024), ('caring', 0.024), ('determinism', 0.024), ('eai', 0.024), ('intentions', 0.024), ('lef', 0.024), ('owain', 0.024), ('pushes', 0.024), ('pushing', 0.024), ('wynn', 0.024), ('complex', 0.024), ('tree', 0.023), ('interactions', 0.022), ('stay', 0.022), ('judge', 0.022), ('goodman', 0.021), ('featured', 0.021), ('karen', 0.021), ('rationally', 0.021), ('snippet', 0.021), ('dent', 0.02), ('interacting', 0.02), ('videos', 0.02), ('points', 0.02), ('attribute', 0.019), ('geodesic', 0.019), ('flower', 0.019), ('rationality', 0.019), ('todd', 0.019), ('obstacles', 0.019), ('environment', 0.019), ('moved', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference
Author: Tomer Ullman, Chris Baker, Owen Macindoe, Owain Evans, Noah Goodman, Joshua B. Tenenbaum
Abstract: Everyday social interactions are heavily influenced by our snap judgments about others’ goals. Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e.g., that one agent is ‘helping’ or ‘hindering’ another’s attempt to get up a hill or open a box. We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. We also present behavioral evidence in support of this model over a simpler, perceptual cue-based alternative. 1
2 0.3727735 53 nips-2009-Complexity of Decentralized Control: Special Cases
Author: Martin Allen, Shlomo Zilberstein
Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs) is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1
3 0.31432995 242 nips-2009-The Infinite Partially Observable Markov Decision Process
Author: Finale Doshi-velez
Abstract: The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems. 1
4 0.17761028 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
Author: George Konidaris, Andre S. Barreto
Abstract: We introduce a skill discovery method for reinforcement learning in continuous domains that constructs chains of skills leading to an end-of-task reward. We demonstrate experimentally that it creates appropriate skills and achieves performance benefits in a challenging continuous domain. 1
5 0.095748879 52 nips-2009-Code-specific policy gradient rules for spiking neurons
Author: Henning Sprekeler, Guillaume Hennequin, Wulfram Gerstner
Abstract: Although it is widely believed that reinforcement learning is a suitable tool for describing behavioral learning, the mechanisms by which it can be implemented in networks of spiking neurons are not fully understood. Here, we show that different learning rules emerge from a policy gradient approach depending on which features of the spike trains are assumed to influence the reward signals, i.e., depending on which neural code is in effect. We use the framework of Williams (1992) to derive learning rules for arbitrary neural codes. For illustration, we present policy-gradient rules for three different example codes - a spike count code, a spike timing code and the most general “full spike train” code - and test them on simple model problems. In addition to classical synaptic learning, we derive learning rules for intrinsic parameters that control the excitability of the neuron. The spike count learning rule has structural similarities with established Bienenstock-Cooper-Munro rules. If the distribution of the relevant spike train features belongs to the natural exponential family, the learning rules have a characteristic shape that raises interesting prediction problems. 1
6 0.093707323 134 nips-2009-Learning to Explore and Exploit in POMDPs
7 0.091430642 221 nips-2009-Solving Stochastic Games
8 0.082856216 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
9 0.071142592 159 nips-2009-Multi-Step Dyna Planning for Policy Evaluation and Control
10 0.067536354 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering
11 0.062035002 109 nips-2009-Hierarchical Learning of Dimensional Biases in Human Categorization
12 0.060319442 115 nips-2009-Individuation, Identification and Object Discovery
13 0.056878578 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
14 0.054716442 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion
15 0.053295147 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model
16 0.053164158 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference
17 0.051840086 14 nips-2009-A Parameter-free Hedging Algorithm
18 0.04949566 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction
19 0.047818858 137 nips-2009-Learning transport operators for image manifolds
20 0.045285095 12 nips-2009-A Generalized Natural Actor-Critic Algorithm
topicId topicWeight
[(0, -0.128), (1, -0.063), (2, 0.195), (3, -0.242), (4, -0.293), (5, 0.054), (6, 0.037), (7, -0.009), (8, -0.023), (9, 0.041), (10, 0.16), (11, 0.066), (12, 0.092), (13, -0.025), (14, 0.037), (15, 0.051), (16, -0.027), (17, 0.095), (18, -0.021), (19, 0.133), (20, -0.063), (21, 0.069), (22, -0.216), (23, -0.009), (24, 0.226), (25, -0.063), (26, 0.137), (27, -0.039), (28, -0.053), (29, -0.097), (30, 0.039), (31, -0.039), (32, 0.034), (33, 0.045), (34, 0.099), (35, -0.001), (36, -0.029), (37, -0.068), (38, 0.105), (39, 0.004), (40, -0.069), (41, -0.016), (42, -0.076), (43, 0.028), (44, 0.024), (45, 0.038), (46, -0.016), (47, 0.026), (48, -0.032), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.98137265 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference
Author: Tomer Ullman, Chris Baker, Owen Macindoe, Owain Evans, Noah Goodman, Joshua B. Tenenbaum
Abstract: Everyday social interactions are heavily influenced by our snap judgments about others’ goals. Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e.g., that one agent is ‘helping’ or ‘hindering’ another’s attempt to get up a hill or open a box. We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. We also present behavioral evidence in support of this model over a simpler, perceptual cue-based alternative. 1
2 0.87395197 53 nips-2009-Complexity of Decentralized Control: Special Cases
Author: Martin Allen, Shlomo Zilberstein
Abstract: The worst-case complexity of general decentralized POMDPs, which are equivalent to partially observable stochastic games (POSGs) is very high, both for the cooperative and competitive cases. Some reductions in complexity have been achieved by exploiting independence relations in some models. We show that these results are somewhat limited: when these independence assumptions are relaxed in very small ways, complexity returns to that of the general case. 1
3 0.82294315 242 nips-2009-The Infinite Partially Observable Markov Decision Process
Author: Finale Doshi-velez
Abstract: The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems. 1
4 0.7968424 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining
Author: George Konidaris, Andre S. Barreto
Abstract: We introduce a skill discovery method for reinforcement learning in continuous domains that constructs chains of skills leading to an end-of-task reward. We demonstrate experimentally that it creates appropriate skills and achieves performance benefits in a challenging continuous domain. 1
5 0.64081883 134 nips-2009-Learning to Explore and Exploit in POMDPs
Author: Chenghui Cai, Xuejun Liao, Lawrence Carin
Abstract: A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration exploitation, in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem. Theoretical guarantees are provided concerning the optimality of the balancing of exploration and exploitation. The effectiveness of the method is demonstrated by experimental results on benchmark problems.
6 0.35105312 39 nips-2009-Bayesian Belief Polarization
7 0.3134785 113 nips-2009-Improving Existing Fault Recovery Policies
8 0.30278644 115 nips-2009-Individuation, Identification and Object Discovery
9 0.28877917 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
11 0.26642561 52 nips-2009-Code-specific policy gradient rules for spiking neurons
12 0.26610318 152 nips-2009-Measuring model complexity with the prior predictive
13 0.2631892 150 nips-2009-Maximum likelihood trajectories for continuous-time Markov chains
14 0.25448167 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering
15 0.25206053 69 nips-2009-Discrete MDL Predicts in Total Variation
16 0.23346271 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion
17 0.23216677 221 nips-2009-Solving Stochastic Games
18 0.22663242 159 nips-2009-Multi-Step Dyna Planning for Policy Evaluation and Control
19 0.21670513 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
20 0.20900777 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities
topicId topicWeight
[(24, 0.029), (25, 0.081), (32, 0.241), (35, 0.039), (36, 0.038), (39, 0.089), (55, 0.013), (58, 0.062), (61, 0.1), (71, 0.108), (81, 0.01), (86, 0.051), (91, 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.83828038 107 nips-2009-Help or Hinder: Bayesian Models of Social Goal Inference
Author: Tomer Ullman, Chris Baker, Owen Macindoe, Owain Evans, Noah Goodman, Joshua B. Tenenbaum
Abstract: Everyday social interactions are heavily influenced by our snap judgments about others’ goals. Even young infants can infer the goals of intentional agents from observing how they interact with objects and other agents in their environment: e.g., that one agent is ‘helping’ or ‘hindering’ another’s attempt to get up a hill or open a box. We propose a model for how people can infer these social goals from actions, based on inverse planning in multiagent Markov decision problems (MDPs). The model infers the goal most likely to be driving an agent’s behavior by assuming the agent acts approximately rationally given environmental constraints and its model of other agents present. We also present behavioral evidence in support of this model over a simpler, perceptual cue-based alternative. 1
Author: Ed Vul, George Alvarez, Joshua B. Tenenbaum, Michael J. Black
Abstract: Multiple object tracking is a task commonly used to investigate the architecture of human visual attention. Human participants show a distinctive pattern of successes and failures in tracking experiments that is often attributed to limits on an object system, a tracking module, or other specialized cognitive structures. Here we use a computational analysis of the task of object tracking to ask which human failures arise from cognitive limitations and which are consequences of inevitable perceptual uncertainty in the tracking task. We find that many human performance phenomena, measured through novel behavioral experiments, are naturally produced by the operation of our ideal observer model (a Rao-Blackwelized particle filter). The tradeoff between the speed and number of objects being tracked, however, can only arise from the allocation of a flexible cognitive resource, which can be formalized as either memory or attention. 1
3 0.62002313 242 nips-2009-The Infinite Partially Observable Markov Decision Process
Author: Finale Doshi-velez
Abstract: The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems. 1
Author: Anne Hsu, Thomas L. Griffiths
Abstract: A classic debate in cognitive science revolves around understanding how children learn complex linguistic rules, such as those governing restrictions on verb alternations, without negative evidence. Traditionally, formal learnability arguments have been used to claim that such learning is impossible without the aid of innate language-specific knowledge. However, recently, researchers have shown that statistical models are capable of learning complex rules from only positive evidence. These two kinds of learnability analyses differ in their assumptions about the distribution from which linguistic input is generated. The former analyses assume that learners seek to identify grammatical sentences in a way that is robust to the distribution from which the sentences are generated, analogous to discriminative approaches in machine learning. The latter assume that learners are trying to estimate a generative model, with sentences being sampled from that model. We show that these two learning approaches differ in their use of implicit negative evidence – the absence of a sentence – when learning verb alternations, and demonstrate that human learners can produce results consistent with the predictions of both approaches, depending on how the learning problem is presented. 1
5 0.59078044 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
Author: Jaakko Luttinen, Alexander T. Ihler
Abstract: We present a probabilistic factor analysis model which can be used for studying spatio-temporal datasets. The spatial and temporal structure is modeled by using Gaussian process priors both for the loading matrix and the factors. The posterior distributions are approximated using the variational Bayesian framework. High computational cost of Gaussian process modeling is reduced by using sparse approximations. The model is used to compute the reconstructions of the global sea surface temperatures from a historical dataset. The results suggest that the proposed model can outperform the state-of-the-art reconstruction systems.
6 0.57079607 133 nips-2009-Learning models of object structure
7 0.56102967 154 nips-2009-Modeling the spacing effect in sequential category learning
8 0.55940515 113 nips-2009-Improving Existing Fault Recovery Policies
9 0.54945511 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
10 0.5477947 205 nips-2009-Rethinking LDA: Why Priors Matter
11 0.54555875 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs
12 0.54541403 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
13 0.54420269 60 nips-2009-Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
14 0.54293531 56 nips-2009-Conditional Neural Fields
15 0.54262394 226 nips-2009-Spatial Normalized Gamma Processes
16 0.54201055 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties
17 0.54168391 19 nips-2009-A joint maximum-entropy model for binary neural population patterns and continuous signals
18 0.54100382 115 nips-2009-Individuation, Identification and Object Discovery
19 0.53979862 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
20 0.53973478 218 nips-2009-Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining