nips nips2008 nips2008-33 knowledge-graph by maker-knowledge-mining

33 nips-2008-Bayesian Model of Behaviour in Economic Games


Source: pdf

Author: Debajyoti Ray, Brooks King-Casas, P. R. Montague, Peter Dayan

Abstract: Classical game theoretic approaches that make strong rationality assumptions have difficulty modeling human behaviour in economic games. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates game theoretic notions of interactivity. Our generative model captures a broad class of characteristic behaviours in a multi-round Investor-Trustee game. We invert the generative process for a recognition model that is used to classify 200 subjects playing this game against randomly matched opponents. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Classical game theoretic approaches that make strong rationality assumptions have difficulty modeling human behaviour in economic games. [sent-19, score-0.324]

2 We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates game theoretic notions of interactivity. [sent-20, score-0.286]

3 We invert the generative process for a recognition model that is used to classify 200 subjects playing this game against randomly matched opponents. [sent-22, score-0.367]

4 1 Introduction Trust tasks such as the Dictator, Ultimatum and Investor-Trustee games provide an empirical basis for investigating social cooperation and reciprocity [11]. [sent-23, score-0.23]

5 Even in completely anonymous settings, human subjects show rich patterns of behavior that can be seen in terms of such personality concepts as charity, envy and guilt. [sent-24, score-0.121]

6 The burgeoning interaction between economic psychology and neuroscience requires formal treatments of these issues. [sent-27, score-0.084]

7 From the perspective of neuroscience, such treatments can provide a precise quantitative window into neural structures involved in assessing utilities of outcomes, capturing risk and probabilities associated with interpersonal interactions, and imputing intentions and beliefs to others. [sent-28, score-0.233]

8 In turn, evidence from brain responses associated with these factors should elucidate the neural algorithms of complex interpersonal choices, and thereby illuminate economic decision-making. [sent-29, score-0.108]

9 Here, we consider a sequence of paradigmatic trust tasks that have been used to motivate a variety of behaviorally-based economic models. [sent-30, score-0.212]

10 In brief, we provide a formalization in terms of partially observable Markov decision processes, approximating type-theoretic Bayes-Nash equilibria [8] using finite hierarchies of belief, where subjects’ private types are construed as parameters of their inequity averse utility functions [2]. [sent-31, score-0.201]

11 The game ends if the Investor chooses the safe option; alternatively, he can pass the decision to the Trustee. [sent-38, score-0.223]

12 The Trustee can now choose a fair option $[25,25] or choose to defect $[15,30]. [sent-39, score-0.102]

13 In turn, she has the option of repaying any (integer) amount of her resulting allocation to the Investor. [sent-42, score-0.075]

14 Figure 1b shows the more sophisticated game we consider, namely a multi-round, sequential version of the Trust game [15]. [sent-45, score-0.390]

15 The fact that even in a purely anonymized setting, Investors invest at all, and Trustees reciprocate at all in games such as that of figure 1a, is a challenge to standard, money-maximizing doctrines (which expect to find the Nash equilibrium where neither happens), and poses a problem for modeling. [sent-46, score-0.141]

16 One popular strategy is to retain the notion that subjects attempt to optimize their utilities, but to include in these utilities social factors that penalize cases in which opponents win either more (crudely, envy, parameterized by α) or less (guilt, parameterized by β) than themselves [2]. [sent-47, score-0.155]

17 One popular inequity-aversion utility function [2] characterizes player i by the type Ti = (αi, βi) of her utility function: U(αi, βi) = xi − αi max{xj − xi, 0} − βi max{xi − xj, 0} (1), where xi, xj are the amounts received by players i and j respectively. [sent-48, score-0.494]
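To make equation 1 concrete, here is a small Python sketch (not from the paper's code; the payoff values are illustrative):

```python
def inequity_aversion_utility(x_i, x_j, alpha_i, beta_i):
    """Inequity-aversion utility of equation 1 [2].

    x_i, x_j : amounts received by player i and opponent j
    alpha_i  : envy weight (disutility when the opponent gets more)
    beta_i   : guilt weight (disutility when the opponent gets less)
    """
    envy = alpha_i * max(x_j - x_i, 0)
    guilt = beta_i * max(x_i - x_j, 0)
    return x_i - envy - guilt

# Example: the Trustee keeps $30 and the Investor ends up with $15.
# A guilty Trustee (beta = 0.7) values this outcome less than a more
# selfish one (beta = 0.3).
print(inequity_aversion_utility(30, 15, alpha_i=0.0, beta_i=0.7))  # 19.5
print(inequity_aversion_utility(30, 15, alpha_i=0.0, beta_i=0.3))  # 25.5
```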

18 Social utility functions such as that of equation 1 mandate probing, belief manipulation and the like. [sent-51, score-0.139]

19 As in the standard formulation [8], players know their own types but not those of their opponents; dyads are thus playing games of incomplete information. [sent-53, score-0.343]

20 A player also has prior beliefs about their opponent that are updated in a Bayesian manner after observing the opponent’s actions. [sent-54, score-0.492]
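A minimal sketch of such an update over a discrete set of candidate opponent types; the likelihood values here are hypothetical stand-ins for what the generative model would assign to the observed action under each type:

```python
import numpy as np

def update_type_belief(belief, likelihoods):
    """Bayesian update of a belief over discrete opponent types.

    belief      : prior probability of each candidate type Tj
    likelihoods : P(observed action | Tj) for each candidate type
    """
    posterior = belief * likelihoods
    return posterior / posterior.sum()

# Two Trustee types, betaT in {0.3, 0.7}, starting from a uniform prior.
belief = np.array([0.5, 0.5])
# A generous repayment is much more likely under the guilty (high-beta) type.
belief = update_type_belief(belief, likelihoods=np.array([0.1, 0.6]))
print(belief)  # -> approx [0.14, 0.86]
```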

21 This leads to an infinite hierarchy of beliefs: what the Trustee thinks of the Investor; what the Trustee thinks the Investor thinks of her; what the Trustee thinks the Investor thinks the Trustee thinks of him; and so on. [sent-56, score-0.426]

22 If players have common prior beliefs over the possible types in the game, and this prior is common knowledge, then at least one subjective equilibrium, known as the Bayes-Nash Equilibrium (BNE), exists [8]. [sent-57, score-0.371]

23 Algorithms to compute BNE solutions have been developed but, in the general case, are NP-hard [6] and thus infeasible for complex multi-round games [9]. [sent-58, score-0.097]

24 First, a finite hierarchy of beliefs can provably approximate the equilibrium solution that arises in an infinite belief hierarchy arbitrarily closely [10], an idea that has indeed been employed in practice to compute equilibria in a multi-agent setting [5]. [sent-61, score-0.417]

25 Second, based on a wealth of games such as the p-Beauty game [11], it has been suggested that human subjects only employ a very restricted number of steps of strategic thinking. [sent-62, score-0.571]

26 Higher order beliefs are required as each player’s action influences the opponent’s beliefs which in turn influence their policy. [sent-69, score-0.405]

27 2 Partially Observable Markov Games As in the framework of Bayesian games, player i’s inequity aversion type Ti = (αi , βi ) is known to it, but not to the opponent. [sent-70, score-0.249]

28 Player i does have a prior distribution over the type of the other player j, bi^(0)(Tj); and, if suitably sophisticated, can also have higher-order priors over the whole hierarchy of recursive beliefs about types. [sent-71, score-0.591]

29 We denote the collection of priors as bi = {bi^(0), bi^(1), bi^(2), . . .}. [sent-72, score-0.414]

30 Play proceeds sequentially, with player i choosing action ai^(t) at time t according to the expected future value of this choice. [sent-76, score-0.313]

31 Given the interdependence of beliefs and actions, we expect to see probing (to find out the type and beliefs of one’s opponent) and belief manipulation (being nice now to take advantage of one’s opponent later). [sent-78, score-0.624]

32 In order to break this, we assume that each player has k levels of strategic thinking as in the Cognitive Hierarchy framework [13]. [sent-82, score-0.452]

33 Thus each k-level player assumes that his opponent is a (k − 1)-level player. [sent-83, score-0.307]

34 At the lowest level of the recursion, the 0-level player uses a simple likelihood to update beliefs about their opponent. [sent-84, score-0.184]
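The recursion can be sketched for a one-shot matrix game as below. This is a simplification: the paper's 0-level players keep a likelihood-based model of the opponent, whereas here the 0-level player simply best-responds to a uniformly random opponent, and the safe outcome of [20, 20] is an assumed value.

```python
import numpy as np

def best_response(payoff, opp_dist):
    """Index of the action maximizing expected payoff against opp_dist."""
    return int(np.argmax(payoff @ opp_dist))

def k_level_action(k, payoff_self, payoff_opp):
    """Action of a k-level player in a one-shot two-player matrix game.

    payoff_self[i, j]: payoff to this player for own action i, opponent action j.
    payoff_opp[i, j] : payoff to the opponent for the same action pair.
    A k-level player best-responds to a simulated (k-1)-level opponent.
    """
    n_opp = payoff_self.shape[1]
    if k == 0:
        # Simplified 0-level: best response to a uniformly random opponent.
        return best_response(payoff_self, np.full(n_opp, 1.0 / n_opp))
    # Simulate the opponent one level down, swapping perspectives.
    opp_action = k_level_action(k - 1, payoff_opp.T, payoff_self.T)
    return best_response(payoff_self, np.eye(n_opp)[opp_action])

# One-shot trust game of figure 1a with money-maximizing payoffs:
# Investor rows: safe, pass; Trustee columns: fair ($[25,25]), defect ($[15,30]).
investor = np.array([[20.0, 20.0],
                     [25.0, 15.0]])
trustee = np.array([[20.0, 20.0],
                    [25.0, 30.0]])
print(k_level_action(1, investor, trustee))  # -> 0: the Investor plays safe
```

With purely money-maximizing payoffs the k = 1 Investor recovers the Nash outcome (play safe); it is the inequity-averse utilities of equation 1, substituted for the raw payoffs, that move play away from it.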

35 The utility Ui^(t)(ai^(t)) is calculated at every round for each player i for action ai^(t) by marginalizing over the current beliefs bi^(t). [sent-85, score-0.758]

36 As an example, if there are only two types for a player, the belief state, which is a continuous probability distribution over the interval [0, 1], is discretized to take K values bi1 = 0, . . . , biK = 1. [sent-88, score-0.223]
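A small sketch of this discretization together with the marginalization of sentence 35; the utility values are hypothetical:

```python
import numpy as np

K = 11                           # resolution of the discretized belief state
grid = np.linspace(0.0, 1.0, K)  # bi1 = 0, ..., biK = 1

def expected_utility(utility_given_type, belief_over_types):
    """Marginalize a type-conditional utility over the current belief."""
    return float(utility_given_type @ belief_over_types)

# With two opponent types the belief state is a single number
# p = P(opponent has the second type), held on the K-point grid.
p = grid[5]                                   # p = 0.5, a uniform prior
u = expected_utility(np.array([15.0, 25.0]),  # U(a | type 1), U(a | type 2)
                     np.array([1.0 - p, p]))
print(u)                                      # -> 20.0
```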

37 One characteristic of this explicit process model, or algorithmic approach, is that it is possible to consider what happens when the priors of the players differ. [sent-93, score-0.142]

38 We also verified our algorithm against the QRE and BNE solutions provided by GAMBIT [14] on 1- and 2-round Trust games for k = 1, 2 respectively. [sent-95, score-0.248]

39 3 Generative Model for Investor-Trustee Game Reputation-formation plays a particularly critical role in the Investor-Trustee game, with even the most selfish players trying to benefit from cooperation, at least in the initial rounds. [sent-97, score-0.142]

40 The two Trustee types (βT = 0.3 and βT = 0.7) are chosen such that in the last round the Trustee with type βT = 0.3 will not return any amount to the Investor, while one with βT = 0.7 will choose the fair outcome. [sent-102, score-0.083] [sent-103, score-0.093]

42 We generate a rich tapestry of behavior by varying the prior expectations as to βT and the strategic level k (0, 1, 2) of the players. [sent-105, score-0.217]

43 Figure 3 shows the evolution of the Players' Q-values, the 1st-order beliefs of the Investor, and the 2nd-order beliefs of the Trustee (i.e., her beliefs as to the Investor's beliefs about her value of βT) over the course of a single game. [sent-108, score-0.370] [sent-110, score-0.370]

45 (i.e., they are strategic players), but the Trustee is actually less guilty (βT = 0.3). [sent-113, score-0.217]

46 This makes the Investor's beliefs about βT go from being uniform to being peaked around 0.7. [sent-116, score-0.185]

47 This causes the Q-value for the action corresponding to giving $20 to be highest, inspiring the Investor's generosity in round 2. [sent-121, score-0.123]

48 Equally, the Trustee's (2nd-order) beliefs become peaked after she receives $15 in the first round. [sent-122, score-0.238]

49 In response, in rounds 5 and 7, the Trustee tries to coax the Investor. [sent-125, score-0.098]

50 We find this “reciprocal give and take” to be a characteristic behaviour of strategic Investors and Trustees (with k = 1). [sent-126, score-0.262]

51 For naive Players with k = 0, a return of a very low amount for a high amount invested would lead to a complete breakdown of Trust formation. [sent-127, score-0.182]

52 In round 1, Investors with kI = 0 and 1 offer $20 first (the optimal probing action based on uniform prior beliefs), while those with kI = 2 offer $15. [sent-131, score-0.141]

53 A Trustee with kT = 0 and low βT will return nothing whereas an unconditionally cooperative Trustee (high βT ) returns roughly the same amount as received. [sent-133, score-0.219]

54 Irrespective of the Trustee's βT type, the amount returned by strategic Trustees with kT = 1, 2 is higher. [sent-134, score-0.282]

55 In round 2 we find that the low amount received causes trust to break down for Investors with kI = 0. [sent-136, score-0.245]

56 Strategic Trustees return more initially and are able to coax naive Investors to give higher amounts in the game. [sent-138, score-0.125]

57 Generally unconditionally cooperative Trustees return more, and form Trust throughout the game if they are strategic or if they are playing against strategic Investors. [sent-139, score-0.841]

58 Trustees with low βT defect towards the end of the game but coax more investment in the beginning of the game. [sent-140, score-0.375]

59 Figure 3: The generated game shows the amount given by an Investor with kI = 1 and a Trustee with βT = 0.3. [sent-141, score-0.238]

60 The red bar indicates the amount given by the Investor and the blue bar the amount returned by the Trustee (after receiving 3 times the amount given by the Investor). [sent-143, score-0.205]

61 The figures on the right reveal the inner workings of the algorithm: Q-values through the rounds of the game for 5 different actions of the Investor (0, 5, 10, 15, 20) and 5 actions of the Trustee between 0 and 3 times the amount given by the Investor. [sent-144, score-0.289]

62 Also shown are the Investor's 1st-order beliefs (left bar for βT = 0.3, right bar for βT = 0.7). [sent-145, score-0.212]

63 The top half shows the Investor playing against a Trustee with low βT (= 0.3). [sent-149, score-0.078]

64 The top dyad shows the amount given by the Investor and the bottom dyad the amount returned by the Trustee. [sent-152, score-0.202]

65 Within each dyad the rows represent the strategic level kI of the Investor (0, 1 or 2) and the columns represent the level kT of the Trustee (0, 1 or 2). [sent-153, score-0.286]

66 Two particular examples are highlighted within the dyads: an Investor with kI = 0 and an uncooperative (low βT) Trustee with kT = 2, and an Investor with kI = 1 and a cooperative (high βT) Trustee with kT = 2. [sent-155, score-0.186]

67 Lighter colours reveal higher amounts (the amount given by the Investor in the first round being $15). [sent-156, score-0.096]

68 The effect of strategic level is more dramatic for the Investor, since his ability to defect at any point places him in effective charge of the interaction. [sent-157, score-0.264]

69 Strategic Investors give more money in the game than naive Investors. [sent-158, score-0.269]

70 A further observation is that strategic Investors are more immune to the Trustee’s actions. [sent-160, score-0.217]

71 While this means that break-downs in the game due to mistakes of the Trustee (or unfortunate choices from her softmax) are more easily corrected by the strategic Investor, he is also more likely to continue investing even if the Trustee doesn't reciprocate. [sent-161, score-0.412]

72 The latter typically offer less in the game and are also less susceptible to the actions of their opponent. [sent-163, score-0.217]

73 Overall in this game, Investors with kI = 1 make the most money playing against a cooperative Trustee while kI = 0 Investors make the least. [sent-164, score-0.240]

74 The best dyad consists of a kI = 1 Investor playing with a cooperative Trustee with kT = 0 or 1. [sent-165, score-0.197]

75 Denote the sequence of plays in the 10-round Investor-Trustee game as D = {[a1^(1), a2^(1)], . . . , [a1^(10), a2^(10)]}. [sent-168, score-0.195]

76 Since the game is Markovian, we can calculate the probability of player i taking the action sequence {ai^(t), t = 1, . . . , 10}. [sent-171, score-0.414]

77 This is a likelihood function for (Ti, bi^(0)), and so can be used for posterior inference about type given D. [sent-175, score-0.168]

78 We classify the players for their utility function (βT value for the Trustee), strategic (ToM) levels and prior beliefs using the MAP value (Ti*, bi^(0)*) = arg max over (Ti, bi^(0)) of P(D | Ti, bi^(0)). [sent-176, score-0.911]
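A sketch of this classification step, assuming the generative model has already been run forward to produce per-round action probabilities for each candidate (type, prior) setting; the labels and numbers are hypothetical:

```python
import numpy as np

def log_likelihood(actions, action_probs):
    """log P(observed action sequence | Ti, bi^(0)), given the per-round
    action probabilities predicted by the generative model."""
    rounds = np.arange(len(actions))
    return float(np.sum(np.log(action_probs[rounds, actions])))

def classify_player(actions, candidate_models):
    """Return the candidate label with the highest likelihood; under a flat
    prior over candidates this is the MAP estimate used in the text."""
    return max(candidate_models,
               key=lambda label: log_likelihood(actions, candidate_models[label]))

# Hypothetical: 3 rounds, 2 possible actions per round, two candidate settings.
models = {
    "betaT=0.3, k=1": np.array([[0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]),
    "betaT=0.7, k=1": np.array([[0.3, 0.7], [0.2, 0.8], [0.4, 0.6]]),
}
print(classify_player(np.array([1, 1, 0]), models))  # -> "betaT=0.7, k=1"
```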

79 We used our recognition model to classify subject pairs playing the 10-round Investor-Trustee game [15]. [sent-177, score-0.282]

80 The data included 48 student pairs playing an Impersonal task for which the opponents’ identities were hidden and 54 student pairs playing a Personal task for which partners met. [sent-178, score-0.158]

81 Each Investor-Trustee pair was classified for their level of strategic thinking k and the Trustee’s βT type (cooperative/uncooperative; see the table in Figure 5). [sent-179, score-0.276]

82 The highlighted interactions reveal that many of the pairs in the Impersonal task consisted of strategic Investors and cooperative Trustees, who formed trust in the game with the levels of investment decreasing towards the end of the game. [sent-181, score-0.721]

83 We also highlight the difference between strategic and non-strategic Investors. [sent-182, score-0.217]

84 An Investor with kI = 0 will not form trust if the Trustee does not return a significant amount initially whilst an Investor with kI = 2 will continue offering money in the game even if the Trustee gives back less than fair amounts in return. [sent-183, score-0.463]

85 To test the robustness of the recognition model we generated behaviours (450 dyads) with a range of low βT values (starting from βT = 0). [sent-188, score-0.086]

86 Figure 5 shows how confidently players of the given type were classified to have that type. [sent-198, score-0.172]

87 This is because the Trustees with those characteristics will offer high amounts to coax the Investor. [sent-200, score-0.093]

88 The number of subject-pairs with each classification is shown in each entry, along with whether the Trustee was classified as uncooperative (low βT) or cooperative (high βT). [sent-204, score-0.161]

89 The subjects play an Impersonal game, where they do not know the identity of their opponent, and a Personal game, where identities are revealed. [sent-205, score-0.643]

90 We also show the classification confidence for the types given that the behaviour was generated from our model with other values of βT for the Trustee, as well as (in brackets) the type that the player is most likely to be classified as. [sent-207, score-0.259]

91 (A Trustee with low βT and kT = 1 is very likely to be misclassified as a player with kT = 2, while a player with kT = 2 will mostly be classified correctly as kT = 2.) 5 Discussion We built a generative model that captures classes of observed behavior in multi-round trust tasks. [sent-208, score-0.540]

92 We also do not suggest a normative rationale for pieces of the model such as the social utility function. [sent-211, score-0.115]

93 Nevertheless, the separation between the vagaries of utility and the exactness of inference is attractive, not least by providing clearly distinct signals as to the inner workings of the algorithm that can be extremely useful for capturing neural findings. [sent-212, score-0.093]

94 For instance, we postulate that higher activation will be observed in regions of the brain associated with theory of mind for Investors that give more in the game, and for Trustees that can coax more. [sent-216, score-0.071]

95 However, unlike [13] our Naive players still build models, albeit unsophisticated ones, of the other player (in contrast to level 0 players who assume the opponent to play a random strategy). [sent-217, score-0.613]

96 So this might lead to an investigation of how sophisticated and naive theory of mind models are built by subjects in the game. [sent-218, score-0.089]

97 We need to incorporate some of the other parameters of our model, such as the Investor’s envy and the temperature parameter of the softmax distribution in order to capture the nuances in the interactions. [sent-221, score-0.073]
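For concreteness, a minimal version of the softmax choice rule mentioned here, with an explicit temperature parameter (the Q-values are illustrative):

```python
import numpy as np

def softmax_policy(q_values, temperature=1.0):
    """Map Q-values to choice probabilities; lower temperatures give more
    deterministic choices, higher temperatures noisier ones."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

print(softmax_policy([20.0, 19.0, 15.0], temperature=2.0))
# -> approx [0.59, 0.36, 0.05]
```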

98 7 Finally, this computational model provides a guide for designing experiments to probe aspects of social utility, strategic thinking levels and prior beliefs, as well as inviting ready extensions to related tasks such as Public Goods games. [sent-223, score-0.314]

99 Formulation of Bayesian analysis for games with incomplete information (1985). [sent-275, score-0.097]

100 Getting to know you: Reputation and Trust in a two-person economic exchange. [sent-314, score-0.084]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('trustee', 0.519), ('investor', 0.496), ('strategic', 0.217), ('game', 0.195), ('beliefs', 0.185), ('player', 0.184), ('trustees', 0.177), ('investors', 0.165), ('players', 0.142), ('bi', 0.138), ('trust', 0.128), ('opponent', 0.123), ('kt', 0.111), ('games', 0.097), ('ki', 0.096), ('ai', 0.094), ('cooperative', 0.093), ('economic', 0.084), ('coax', 0.071), ('utility', 0.069), ('thinks', 0.062), ('subjects', 0.062), ('impersonal', 0.059), ('ti', 0.058), ('playing', 0.057), ('hierarchy', 0.054), ('round', 0.053), ('cooperation', 0.052), ('bne', 0.047), ('defect', 0.047), ('dyad', 0.047), ('dyads', 0.047), ('money', 0.047), ('opponents', 0.047), ('uncooperative', 0.047), ('tj', 0.047), ('social', 0.046), ('behaviour', 0.045), ('aj', 0.045), ('equilibrium', 0.044), ('amount', 0.043), ('equilibria', 0.041), ('investment', 0.041), ('belief', 0.039), ('softmax', 0.038), ('personal', 0.036), ('camerer', 0.035), ('dollars', 0.035), ('envy', 0.035), ('fehr', 0.035), ('inequity', 0.035), ('reciprocity', 0.035), ('reputation', 0.035), ('unconditionally', 0.035), ('behaviours', 0.035), ('action', 0.035), ('dyadic', 0.033), ('option', 0.032), ('observable', 0.032), ('nash', 0.031), ('probing', 0.031), ('manipulation', 0.031), ('type', 0.03), ('recognition', 0.03), ('montague', 0.029), ('thinking', 0.029), ('qi', 0.029), ('bik', 0.028), ('safe', 0.028), ('naive', 0.027), ('bar', 0.027), ('rounds', 0.027), ('return', 0.027), ('highlighted', 0.025), ('sh', 0.024), ('anen', 0.024), ('averse', 0.024), ('baylor', 0.024), ('fairness', 0.024), ('gambit', 0.024), ('houston', 0.024), ('intentions', 0.024), ('interpersonal', 0.024), ('mccabe', 0.024), ('mckelvey', 0.024), ('personality', 0.024), ('sel', 0.024), ('tomlin', 0.024), ('workings', 0.024), ('fair', 0.023), ('identities', 0.023), ('generative', 0.023), ('levels', 0.022), ('offer', 0.022), ('returned', 0.022), ('play', 0.022), ('low', 0.021), ('invested', 0.021), ('partners', 0.021), ('quantal', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 33 nips-2008-Bayesian Model of Behaviour in Economic Games

Author: Debajyoti Ray, Brooks King-Casas, P. R. Montague, Peter Dayan

Abstract: Classical game theoretic approaches that make strong rationality assumptions have difficulty modeling human behaviour in economic games. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates game theoretic notions of interactivity. Our generative model captures a broad class of characteristic behaviours in a multi-round Investor-Trustee game. We invert the generative process for a recognition model that is used to classify 200 subjects playing this game against randomly matched opponents. 1

2 0.097933196 141 nips-2008-Multi-Agent Filtering with Infinitely Nested Beliefs

Author: Luke Zettlemoyer, Brian Milch, Leslie P. Kaelbling

Abstract: In partially observable worlds with many agents, nested beliefs are formed when agents simultaneously reason about the unknown state of the world and the beliefs of the other agents. The multi-agent filtering problem is to efficiently represent and update these beliefs through time as the agents act in the world. In this paper, we formally define an infinite sequence of nested beliefs about the state of the world at the current time t, and present a filtering algorithm that maintains a finite representation which can be used to generate these beliefs. In some cases, this representation can be updated exactly in constant time; we also present a simple approximation scheme to compact beliefs if they become too complex. In experiments, we demonstrate efficient filtering in a range of multi-agent domains. 1

3 0.05615025 180 nips-2008-Playing Pinball with non-invasive BCI

Author: Matthias Krauledat, Konrad Grzeska, Max Sagebaum, Benjamin Blankertz, Carmen Vidaurre, Klaus-Robert Müller, Michael Schröder

Abstract: Compared to invasive Brain-Computer Interfaces (BCI), non-invasive BCI systems based on Electroencephalogram (EEG) signals have not been applied successfully for precisely timed control tasks. In the present study, however, we demonstrate and report on the interaction of subjects with a real device: a pinball machine. Results of this study clearly show that fast and well-timed control well beyond chance level is possible, even though the environment is extremely rich and requires precisely timed and complex predictive behavior. Using machine learning methods for mental state decoding, BCI-based pinball control is possible within the first session without the necessity to employ lengthy subject training. The current study shows clearly that very compelling control with excellent timing and dynamics is possible for a non-invasive BCI. 1

4 0.053281054 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

Author: Xuming He, Richard S. Zemel

Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1

5 0.049430896 223 nips-2008-Structure Learning in Human Sequential Decision-Making

Author: Daniel Acuna, Paul R. Schrater

Abstract: We use graphical models and structure learning to explore how people learn policies in sequential decision making tasks. Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that knows the graph model that generates reward in the environment. We argue that the learning problem humans face also involves learning the graph structure for reward generation in the environment. We formulate the structure learning problem using mixtures of reward models, and solve the optimal action selection problem using Bayesian Reinforcement Learning. We show that structure learning in one and two armed bandit problems produces many of the qualitative behaviors deemed suboptimal in previous studies. Our argument is supported by the results of experiments that demonstrate humans rapidly learn and exploit new reward structure. 1

6 0.048789192 118 nips-2008-Learning Transformational Invariants from Natural Movies

7 0.046353702 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions

8 0.04552938 202 nips-2008-Robust Regression and Lasso

9 0.043397114 187 nips-2008-Psychiatry: Insights into depression through normative decision-making models

10 0.040486004 206 nips-2008-Sequential effects: Superstition or rational behavior?

11 0.040259168 92 nips-2008-Generative versus discriminative training of RBMs for classification of fMRI images

12 0.039790384 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm

13 0.039205894 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms

14 0.037662908 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

15 0.037355501 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

16 0.035118509 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels

17 0.03404082 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs

18 0.034026798 121 nips-2008-Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement

19 0.033167388 40 nips-2008-Bounds on marginal probability distributions

20 0.032051425 88 nips-2008-From Online to Batch Learning with Cutoff-Averaging


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.091), (1, 0.04), (2, 0.024), (3, -0.034), (4, 0.021), (5, -0.023), (6, -0.021), (7, -0.002), (8, 0.03), (9, 0.013), (10, 0.036), (11, 0.039), (12, 0.006), (13, -0.008), (14, 0.001), (15, 0.02), (16, 0.023), (17, -0.077), (18, -0.025), (19, 0.005), (20, 0.047), (21, -0.078), (22, -0.071), (23, 0.028), (24, 0.039), (25, -0.096), (26, 0.009), (27, -0.0), (28, 0.074), (29, -0.029), (30, 0.028), (31, 0.066), (32, 0.102), (33, 0.025), (34, -0.014), (35, -0.151), (36, 0.087), (37, 0.013), (38, 0.039), (39, 0.089), (40, -0.064), (41, 0.041), (42, 0.042), (43, -0.111), (44, -0.003), (45, 0.09), (46, -0.048), (47, 0.002), (48, -0.017), (49, -0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93864608 33 nips-2008-Bayesian Model of Behaviour in Economic Games

Author: Debajyoti Ray, Brooks King-Casas, P. R. Montague, Peter Dayan

Abstract: Classical game theoretic approaches that make strong rationality assumptions have difficulty modeling human behaviour in economic games. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates game theoretic notions of interactivity. Our generative model captures a broad class of characteristic behaviours in a multi-round Investor-Trustee game. We invert the generative process for a recognition model that is used to classify 200 subjects playing this game against randomly matched opponents. 1

2 0.74462658 141 nips-2008-Multi-Agent Filtering with Infinitely Nested Beliefs

Author: Luke Zettlemoyer, Brian Milch, Leslie P. Kaelbling

Abstract: In partially observable worlds with many agents, nested beliefs are formed when agents simultaneously reason about the unknown state of the world and the beliefs of the other agents. The multi-agent filtering problem is to efficiently represent and update these beliefs through time as the agents act in the world. In this paper, we formally define an infinite sequence of nested beliefs about the state of the world at the current time t, and present a filtering algorithm that maintains a finite representation which can be used to generate these beliefs. In some cases, this representation can be updated exactly in constant time; we also present a simple approximation scheme to compact beliefs if they become too complex. In experiments, we demonstrate efficient filtering in a range of multi-agent domains. 1

3 0.53642321 187 nips-2008-Psychiatry: Insights into depression through normative decision-making models

Author: Quentin J. Huys, Joshua Vogelstein, Peter Dayan

Abstract: Decision making lies at the very heart of many psychiatric diseases. It is also a central theoretical concern in a wide variety of fields and has undergone detailed, in-depth, analyses. We take as an example Major Depressive Disorder (MDD), applying insights from a Bayesian reinforcement learning framework. We focus on anhedonia and helplessness. Helplessness, a core element in the conceptualizations of MDD that has led to major advances in its treatment and pharmacological and neurobiological understanding, is formalized as a simple prior over the outcome entropy of actions in uncertain environments. Anhedonia, which is an equally fundamental aspect of the disease, is related to the effective reward size. These formulations allow for the design of specific tasks to measure anhedonia and helplessness behaviorally. We show that these behavioral measures capture explicit, questionnaire-based cognitions. We also provide evidence that these tasks may allow classification of subjects into healthy and MDD groups based purely on a behavioural measure and avoiding any verbal reports. 1

4 0.51618618 211 nips-2008-Simple Local Models for Complex Dynamical Systems

Author: Erik Talvitie, Satinder P. Singh

Abstract: We present a novel mathematical formalism for the idea of a “local model” of an uncontrolled dynamical system, a model that makes only certain predictions in only certain situations. As a result of its restricted responsibilities, a local model may be far simpler than a complete model of the system. We then show how one might combine several local models to produce a more detailed model. We demonstrate our ability to learn a collection of local models on a large-scale example and do a preliminary empirical comparison of learning a collection of local models and some other model learning methods. 1

5 0.40836608 212 nips-2008-Skill Characterization Based on Betweenness

Author: Ozgur Simsek, Andre S. Barreto

Abstract: We present a characterization of a useful class of skills based on a graphical representation of an agent’s interaction with its environment. Our characterization uses betweenness, a measure of centrality on graphs. It captures and generalizes (at least intuitively) the bottleneck concept, which has inspired many of the existing skill-discovery algorithms. Our characterization may be used directly to form a set of skills suitable for a given task. More importantly, it serves as a useful guide for developing incremental skill-discovery algorithms that do not rely on knowing or representing the interaction graph in its entirety. 1

6 0.39990363 119 nips-2008-Learning a discriminative hidden part model for human action recognition

7 0.37456673 180 nips-2008-Playing Pinball with non-invasive BCI

8 0.3477914 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms

9 0.33807641 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

10 0.32790491 223 nips-2008-Structure Learning in Human Sequential Decision-Making

11 0.32624847 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

12 0.32286486 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees

13 0.31219104 118 nips-2008-Learning Transformational Invariants from Natural Movies

14 0.31086376 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions

15 0.30911338 13 nips-2008-Adapting to a Market Shock: Optimal Sequential Market-Making

16 0.29551527 222 nips-2008-Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning

17 0.29543456 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations

18 0.28125486 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

19 0.27810284 122 nips-2008-Learning with Consistency between Inductive Functions and Kernels

20 0.27717486 87 nips-2008-Fitted Q-iteration by Advantage Weighted Regression


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.014), (6, 0.038), (7, 0.05), (12, 0.03), (15, 0.026), (18, 0.448), (28, 0.11), (57, 0.058), (59, 0.017), (63, 0.016), (71, 0.023), (77, 0.041), (78, 0.017), (83, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81151652 33 nips-2008-Bayesian Model of Behaviour in Economic Games

Author: Debajyoti Ray, Brooks King-Casas, P. R. Montague, Peter Dayan

Abstract: Classical game theoretic approaches that make strong rationality assumptions have difficulty modeling human behaviour in economic games. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates game theoretic notions of interactivity. Our generative model captures a broad class of characteristic behaviours in a multi-round Investor-Trustee game. We invert the generative process for a recognition model that is used to classify 200 subjects playing this game against randomly matched opponents. 1

2 0.61247218 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring

Author: Norm Aleks, Stuart Russell, Michael G. Madden, Diane Morabito, Kristan Staudenmayer, Mitchell Cohen, Geoffrey T. Manley

Abstract: We describe an application of probabilistic modeling and inference technology to the problem of analyzing sensor data in the setting of an intensive care unit (ICU). In particular, we consider the arterial-line blood pressure sensor, which is subject to frequent data artifacts that cause false alarms in the ICU and make the raw data almost useless for automated decision making. The problem is complicated by the fact that the sensor data are averaged over fixed intervals whereas the events causing data artifacts may occur at any time and often have durations significantly shorter than the data collection interval. We show that careful modeling of the sensor, combined with a general technique for detecting sub-interval events and estimating their duration, enables detection of artifacts and accurate estimation of the underlying blood pressure values. Our model’s performance identifying artifacts is superior to two other classifiers’ and about as good as a physician’s. 1

3 0.56908309 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

Author: Zhengdong Lu, Jeffrey Kaye, Todd K. Leen

Abstract: We develop new techniques for time series classification based on hierarchical Bayesian generative models (called mixed-effect models) and the Fisher kernel derived from them. A key advantage of the new formulation is that one can compute the Fisher information matrix despite varying sequence lengths and varying sampling intervals. This avoids the commonly-used ad hoc replacement of the Fisher information matrix with the identity which destroys the geometric invariance of the kernel. Our construction retains the geometric invariance, resulting in a kernel that is properly invariant under change of coordinates in the model parameter space. Experiments on detecting cognitive decline show that classifiers based on the proposed kernel out-perform those based on generative models and other feature extraction routines, and on Fisher kernels that use the identity in place of the Fisher information.

4 0.42141515 101 nips-2008-Human Active Learning

Author: Rui M. Castro, Charles Kalish, Robert Nowak, Ruichen Qian, Tim Rogers, Xiaojin Zhu

Abstract: We investigate a topic at the interface of machine learning and cognitive science. Human active learning, where learners can actively query the world for information, is contrasted with passive learning from random examples. Furthermore, we compare human active learning performance with predictions from statistical learning theory. We conduct a series of human category learning experiments inspired by a machine learning task for which active and passive learning error bounds are well understood, and dramatically distinct. Our results indicate that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory. However, the improvement over passive learning is not as dramatic as that achieved by machine active learning algorithms. To the best of our knowledge, this is the first quantitative study comparing human category learning in active versus passive settings. 1

5 0.31398222 223 nips-2008-Structure Learning in Human Sequential Decision-Making

Author: Daniel Acuna, Paul R. Schrater

Abstract: We use graphical models and structure learning to explore how people learn policies in sequential decision making tasks. Studies of sequential decision-making in humans frequently find suboptimal performance relative to an ideal actor that knows the graph model that generates reward in the environment. We argue that the learning problem humans face also involves learning the graph structure for reward generation in the environment. We formulate the structure learning problem using mixtures of reward models, and solve the optimal action selection problem using Bayesian Reinforcement Learning. We show that structure learning in one and two armed bandit problems produces many of the qualitative behaviors deemed suboptimal in previous studies. Our argument is supported by the results of experiments that demonstrate humans rapidly learn and exploit new reward structure. 1

6 0.3130284 66 nips-2008-Dynamic visual attention: searching for coding length increments

7 0.31211644 118 nips-2008-Learning Transformational Invariants from Natural Movies

8 0.3112182 216 nips-2008-Sparse probabilistic projections

9 0.30989897 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity

10 0.30964482 231 nips-2008-Temporal Dynamics of Cognitive Control

11 0.30954978 200 nips-2008-Robust Kernel Principal Component Analysis

12 0.30802178 62 nips-2008-Differentiable Sparse Coding

13 0.30729875 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

14 0.30712399 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization

15 0.30693948 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

16 0.30689698 138 nips-2008-Modeling human function learning with Gaussian processes

17 0.30669591 4 nips-2008-A Scalable Hierarchical Distributed Language Model

18 0.30655986 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

19 0.30643913 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences

20 0.30533296 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE