nips nips2011 nips2011-122 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Faisal Khan, Bilge Mutlu, Xiaojin Zhu
Abstract: We study the empirical strategies that humans follow as they teach a target concept with a simple 1D threshold to a robot. Previous studies of computational teaching, particularly the teaching dimension model and the curriculum learning principle, offer contradictory predictions on what optimal strategy the teacher should follow in this teaching task. We show through behavioral studies that humans employ three distinct teaching strategies, one of which is consistent with the curriculum learning principle, and propose a novel theoretical framework as a potential explanation for this strategy. This framework, which assumes a teaching goal of minimizing the learner’s expected generalization error at each iteration, extends the standard teaching dimension model and offers a theoretical justification for curriculum learning.
Reference: text
sentIndex sentText sentNum sentScore
1 We study the empirical strategies that humans follow as they teach a target concept with a simple 1D threshold to a robot. [sent-4, score-0.248]
2 Previous studies of computational teaching, particularly the teaching dimension model and the curriculum learning principle, offer contradictory predictions on what optimal strategy the teacher should follow in this teaching task. [sent-5, score-2.179]
3 We show through behavioral studies that humans employ three distinct teaching strategies, one of which is consistent with the curriculum learning principle, and propose a novel theoretical framework as a potential explanation for this strategy. [sent-6, score-1.114]
4 This framework, which assumes a teaching goal of minimizing the learner’s expected generalization error at each iteration, extends the standard teaching dimension model and offers a theoretical justification for curriculum learning. [sent-7, score-1.964]
5 Computational teaching has been well studied in the machine learning community [9, 12, 10, 1, 2, 11, 13, 18, 4, 14, 15]. [sent-9, score-0.869]
6 A better understanding of the teaching strategies that humans follow might inspire the development of new machine learning models and the design of learning agents that more naturally accommodate these strategies. [sent-12, score-0.967]
7 Studies of computational teaching have followed two prominent threads. [sent-13, score-0.869]
8 However, they make conflicting predictions on what optimal strategy a teacher should follow in a simple teaching task. [sent-17, score-1.104]
9 This conflict serves as an opportunity to compare these predictions to human teaching strategies in the same task. [sent-18, score-0.963]
10 This paper makes two main contributions: (i) it enriches our empirical understanding of human teaching and (ii) it offers a theoretical explanation for a particular teaching strategy humans follow. [sent-19, score-1.933]
11 We first conduct a behavioral study with human participants in which participants teach a robot, following teaching strategies of their choice. [sent-21, score-1.378]
12 This approach differs from most previous studies of computational teaching in machine learning and psychology that involve a predetermined teaching strategy and that focus on the behavior of the learner rather than the teacher. [sent-22, score-1.939]
13 We then compare the observed human teaching strategies to those predicted by the teaching dimension model and the curriculum learning principle. [sent-23, score-2.038]
14 Empirical results indicate that human teachers follow the curriculum learning principle, while no evidence of the teaching dimension model is observed. [sent-30, score-1.174]
15 Finally, we provide a novel theoretical analysis that extends recent ideas in the teaching dimension model [13, 3] and offers curriculum learning a rigorous underpinning. [sent-31, score-1.095]
16 2 Competing Models of Teaching We first review the classic teaching dimension model [9, 1]. [sent-32, score-0.954]
17 A set of instances is called a teaching set of a concept h with respect to H, if h is the only concept in H that is consistent with the set. [sent-44, score-1.045]
18 The teaching dimension of h with respect to H is the minimum size of its teaching set. [sent-45, score-1.8]
19 The teaching dimension of H is the maximum teaching dimension of its concepts. [sent-46, score-1.862]
20 Then, the teaching dimension of most hj is 2, as one needs the minimum teaching set {(xj , 0), (xj+1 , 1)}; for the special cases h0 = {x1 , . [sent-60, score-1.834]
21 , xn } and hn = ∅ the teaching dimension is 1 with the teaching set {(x1 , 1)} and {(xn , 0)}, respectively. [sent-63, score-1.829]
22 For our purpose, the most important argument is the following: The teaching strategy for most hj ’s suggested by teaching dimension is to show two instances {(xj , 0), (xj+1 , 1)} closest to the decision boundary. [sent-65, score-2.035]
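The teaching-set definitions above can be checked by brute force on a small pool. The following is a minimal sketch, assuming n ordered instances x_1 < ... < x_n and threshold concepts h_j that label x_1..x_j as 0 and x_{j+1}..x_n as 1; the function names `threshold_concepts` and `minimal_teaching_set` are hypothetical and do not come from the paper.

```python
from itertools import combinations

def threshold_concepts(n):
    """Return h_0..h_n as label tuples; h_j labels x_1..x_j as 0, the rest as 1."""
    return [tuple(0 if i < j else 1 for i in range(n)) for j in range(n + 1)]

def minimal_teaching_set(target, concepts):
    """Brute-force the smallest index set whose labels single out `target`.

    Returns (size, indices), with 0-based indices into the instance pool.
    """
    n = len(target)
    for size in range(n + 1):
        for idx in combinations(range(n), size):
            # Concepts that agree with the target on every shown instance.
            consistent = [h for h in concepts
                          if all(h[i] == target[i] for i in idx)]
            if consistent == [target]:
                return size, idx
    return None

concepts = threshold_concepts(5)
for j, h in enumerate(concepts):
    print(j, minimal_teaching_set(h, concepts))
```

For an interior concept the search returns the two instances adjacent to the boundary, and for the two special cases it returns a single extreme instance, matching the sizes stated above.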
23 Alternatively, curriculum learning suggests an easy-to-hard (or clear-to-ambiguous) teaching strategy [4]. [sent-67, score-1.095]
24 For the target concept in Figure 1, “easy” instances are those farthest from the decision boundary in each class, while “hard” ones are the closest to the boundary. [sent-68, score-0.275]
25 One such teaching strategy is to present instances from alternating classes, e. [sent-69, score-1.025]
26 Such a strategy has been used for second-language teaching in humans. [sent-75, score-0.951]
27 Hence, for the task in Figure 1, we have two sharply contrasting teaching strategies at hand: the boundary strategy starts near the decision boundary, while the extreme strategy starts with extreme instances and gradually approaches the decision boundary from both sides. [sent-80, score-1.693]
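The contrast between the two strategies can be made concrete with a short sketch. It assumes a finite pool of 1D instances, a known threshold, positive labels for instances at or above the threshold, and a teacher who starts with the positive class; `boundary_sequence` and `extreme_sequence` are illustrative names, not the paper's.

```python
def boundary_sequence(xs, threshold):
    """Boundary strategy: show only the two instances closest to the boundary."""
    closest_neg = max(x for x in xs if x < threshold)
    closest_pos = min(x for x in xs if x >= threshold)
    return [(closest_neg, 0), (closest_pos, 1)]

def extreme_sequence(xs, threshold):
    """Extreme strategy: alternate classes, moving from the extremes inward."""
    neg = sorted(x for x in xs if x < threshold)                   # farthest negative first
    pos = sorted((x for x in xs if x >= threshold), reverse=True)  # farthest positive first
    seq = []
    for i in range(max(len(neg), len(pos))):
        if i < len(pos):
            seq.append((pos[i], 1))
        if i < len(neg):
            seq.append((neg[i], 0))
    return seq

pool = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(boundary_sequence(pool, 0.5))
print(extreme_sequence(pool, 0.5))
```

Starting with the positive class is an arbitrary choice in this sketch; starting with the negative class gives the mirror-image sequence.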
28 Our goal in this paper is to compare human teaching strategies with these two predictions to shed more light on models of teaching. [sent-81, score-0.963]
29 While the teaching task used in our exploration is simple, as most real-world teaching situations do not involve a threshold in a 1D space, we believe that it is important to lay the foundation in a tractable task before studying more complex tasks. [sent-82, score-1.757]
30 3 A Human Teaching Behavioral Study Under IRB approval, we conducted a behavioral study with human participants to explore human teaching behaviors in a task similar to that illustrated in Figure 1. [sent-83, score-1.181]
31 In our study, participants teach the target concept of “graspability”—whether an object can be grasped and picked up with one hand—to a robot. [sent-84, score-0.292]
32 [Plot axis: iteration t] Figure 2: (a) A participant performing the card sorting/labeling and teaching tasks. [sent-94, score-0.951]
33 (b) Human teaching sequences that follow the extreme strategy gradually shrink the version space V1 . [sent-95, score-1.143]
34 The robot also affords embodied behavioral cues that facilitate natural interaction and teaching strategies that computers do not afford. [sent-100, score-1.055]
35 It also corresponds to the no-feedback assumption in most teaching models [3]. [sent-116, score-0.869]
36 In the first subtask, participants sorted the objects based on their subjective ratings of their graspability following the steps below. [sent-121, score-0.376]
37 To provide baselines on the two ends of the graspability spectrum, we fixed a highly graspable object (a toothbrush) and a highly non-graspable object (a building) on the two ends of the ruler. [sent-123, score-0.325]
38 The sorted cards and the decision boundary from one of the participants are illustrated in Figure 3. [sent-134, score-0.393]
39 In step 3, we asked participants to leave the room for a short duration so that “the robot could examine the sorted cards on the table without looking at the labels provided at the back,” creating the impression that the learner will associate the cards with the corresponding values x1 , . [sent-135, score-0.488]
40 In the second subtask, participants taught the robot the (binary) concept of graspability using the cards. [sent-139, score-0.391]
41 In this task, participants picked up a card from the table, turned toward the robot, and held the card up while providing a verbal description of the object’s graspability (i. [sent-140, score-0.336]
42 In the “natural” condition, participants were allowed to use natural language to describe the graspability of the objects, while those in the “constrained” condition were only allowed to say either “graspable” or “not graspable.” [sent-145, score-0.311]
43 The teaching sequences from all participants are presented in Figure 4. [sent-149, score-1.05]
44 In theory, following the boundary strategy, the teacher should start with teaching instances on these two lines as suggested by the teaching dimension model. [sent-155, score-2.109]
45 The black line and dots represent the participant’s teaching sequence. [sent-160, score-0.869]
46 For example, participant P01 started teaching at t = 1 with an object she rated as x = 1 and labeled as “graspable;” at t = 2, she chose an example with rating x = 0 and label “not graspable;” and so on. [sent-161, score-1.018]
47 The average teaching sequence had approximately 8 examples, while the longest teaching sequence had a length of 15 examples. [sent-162, score-1.78]
48 We categorized most teaching sequences into these three strategies following a simple heuristic. [sent-164, score-0.94]
49 Then, we assigned the sequences whose first two teaching examples had different labels to the extreme strategy and the others to the linear strategy. [sent-166, score-1.099]
50 In fact, among all 31 participants, 20 started teaching with the most graspable object (according to their own rating), 6 with the least graspable, none in or around the ambiguous region (as boundary strategy would predict), and 5 with some other objects. [sent-173, score-1.25]
51 In brief, people showed a tendency to start teaching with extreme objects, especially the most graspable ones. [sent-174, score-1.109]
52 During post-interview, when asked why they did not start with objects around their decision boundary, most participants mentioned that they wanted to start with clear examples of graspability. [sent-175, score-0.244]
53 For participants who followed the extreme strategy, we are interested in whether their teaching sequences approach the decision boundary as curriculum learning predicts. [sent-176, score-1.424]
54 Specifically, at any time t, let the partial teaching sequence be (x1 , y1 ), . [sent-177, score-0.89]
55 The aforementioned ambiguous region with respect to this partial sequence is the interval between the inner-most pair of teaching examples with different labels. [sent-181, score-0.91]
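The ambiguous region of a partial teaching sequence can be computed directly from this definition. A minimal sketch, assuming ratings in [0, 1], positive ("graspable") examples at larger x than negative ones, and a default of the full interval when one class has not been shown yet; the function name and the example sequence are hypothetical.

```python
def ambiguous_region(partial_sequence):
    """Return (lo, hi): the interval between the inner-most pair of
    oppositely labeled examples in a list of (x, y) pairs, y in {0, 1}."""
    negatives = [x for x, y in partial_sequence if y == 0]
    positives = [x for x, y in partial_sequence if y == 1]
    lo = max(negatives) if negatives else 0.0   # largest negative so far
    hi = min(positives) if positives else 1.0   # smallest positive so far
    return lo, hi

# Illustrative sequence that moves inward from both ends.
sequence = [(1.0, 1), (0.0, 0), (0.8, 1), (0.3, 0)]
for t in range(1, len(sequence) + 1):
    lo, hi = ambiguous_region(sequence[:t])
    print(t, lo, hi, hi - lo)
```

Under the extreme strategy the printed width never grows, which is the shrinking behavior that curriculum learning predicts.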
56 [Figure panels, the extreme strategy: P03 (natural), P01 (natural), P13 (natural), P15 (natural), P25 (natural), P06 (constrained), P31 (natural); horizontal axis x] [sent-191, score-0.365]
57 [Figure panels, the linear strategy: P05 (natural), P07 (natural), P09 (natural), P11 (natural), P17 (natural), P23 (natural), P19 (natural); horizontal axis x] [sent-205, score-0.257]
58 We observed that these participants elaborated in English to the robot why they thought that their objects were graspable. [sent-226, score-0.28]
59 We posit that although our teaching task was designed to mimic the one-dimensional task illustrated in Figure 1 (e. [sent-233, score-0.869]
60 , the linear layout of the cards in Figure 3), our teachers might still have believed (perhaps subconsciously) that the robot learner, like humans, associates each teaching object with multiple feature dimensions. [sent-235, score-1.065]
61 Note that the classic teaching dimension model [9] fails to predict the extreme strategy even under this assumption. [sent-237, score-1.122]
62 Our analysis is inspired by recent advances in teaching dimension, which assume that teaching progresses in iterations and learning is to be maximized after each iteration [13, 3]. [sent-238, score-1.754]
63 Departing from classic teaching models, we consider a “pool-based sequential” teaching setting. [sent-249, score-1.761]
64 The teacher can only sequentially teach 2 instances selected from the pool (e. [sent-258, score-0.311]
65 2 As in classic teaching models, our learner is consistent (i. [sent-269, score-0.98]
66 , it never contradicts the teaching instances it receives). [sent-271, score-0.943]
67 , the set of hypotheses that is consistent with the teaching sequence (x1 , y1 ), . [sent-274, score-0.932]
68 The version space can be thought of as the union of inner intervals surviving the teaching examples. [sent-278, score-0.885]
69 Second, similar to the randomized learners in [2], our learner selects a hypothesis h uniformly from the version space V(t), follows it until h is no longer in V(t), and then randomly selects a replacement hypothesis, a strategy known as “win stay, lose shift” in cognitive psychology [5]. [sent-279, score-0.257]
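The learner described above can be simulated in a simplified setting. This is a sketch under stated assumptions, not the paper's multi-dimensional learner: it uses a single dimension, hypotheses h_θ(x) = 1 if x ≥ θ with θ in [0, 1], and labels that are consistent with some threshold. The name `wsls_learner` is hypothetical.

```python
import random

def wsls_learner(sequence, rng):
    """Simulate a "win stay, lose shift" learner on (x, y) teaching examples.

    Returns one (theta, lo, hi) tuple per example, where [lo, hi] is the
    version space of thresholds after that example.
    """
    lo, hi = 0.0, 1.0
    theta = rng.uniform(lo, hi)          # initial hypothesis, drawn uniformly
    history = []
    for x, y in sequence:
        if y == 1:
            hi = min(hi, x)              # a positive example requires theta <= x
        else:
            lo = max(lo, x)              # a negative example requires theta > x
        if not (lo <= theta <= hi):      # "lose": hypothesis was eliminated
            theta = rng.uniform(lo, hi)  # "shift": resample from the version space
        history.append((theta, lo, hi))  # otherwise "stay" with the same theta
    return history

rng = random.Random(0)
teaching = [(1.0, 1), (0.0, 0), (0.8, 1), (0.2, 0), (0.6, 1), (0.4, 0)]
for step in wsls_learner(teaching, rng):
    print(step)
```

The boundary case θ = lo is treated as consistent for simplicity; it has probability zero under uniform sampling.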
70 2 Starting with Extreme Teaching Instances is Asymptotically Optimal We now show why starting with extreme teaching instances as in curriculum learning, as opposed to the boundary strategy, is optimal under our setting. [sent-284, score-1.272]
71 Specifically, we consider the problem of selecting an optimal teaching sequence of length t = 2, one positive and one negative, (x1 , 1), (x2 , 0). [sent-285, score-0.89]
72 Introducing the shorthand $a \equiv x_{11}$, $b \equiv x_{21}$, the teacher seeks a, b to minimize the risk: $\min_{a,b \in [0,1]} R(2)$ (1). Note that we allow a, b to take any value within their domains, which is equivalent to having an infinite pool for the teacher to choose from. [sent-286, score-0.315]
73 For any teaching sequence of length 2, the individual intervals of the version space are of size $|V_1(2)| = a - b$, $|V_k(2)| = |x_{1k} - x_{2k}|$ for k = 2 . [sent-289, score-0.906]
74 Figure 5(a) shows that for all $h_{1\theta_1 1} \in V_1(2)$, the decision boundary is parallel to the true decision boundary and the test error is $E_{(x,y)\sim p(x,y)} \mathbf{1}\{h_{1\theta_1 1}(x) \neq y\} = |\theta_1 - 1/2|$. [sent-294, score-0.303]
75 Figure 5(b) shows that for all $h_{k\theta_k s} \in \cup_{k=2}^{d} V_k(2)$, the decision boundary is orthogonal to the true decision boundary and the test error is 1/2. [sent-295, score-0.288]
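The two test-error values above are enough to sketch why extreme instances can beat boundary instances. The block below is an illustration under explicit assumptions rather than the paper's exact risk expression: the learner is assumed to draw a hypothesis uniformly by interval length, c is taken to be the total length of the irrelevant-dimension intervals (the sum of |V_k(2)| for k ≥ 2), and b ≤ 1/2 ≤ a. The function name `expected_risk` and the numeric values are hypothetical.

```python
def expected_risk(a, b, c):
    """Average test error of a uniformly drawn consistent hypothesis.

    Hypotheses in V_1 (length a - b) have error |theta - 1/2|;
    hypotheses in the irrelevant dimensions (total length c) have error 1/2.
    Assumes b <= 1/2 <= a.
    """
    v1 = a - b
    # Closed form of the integral of |theta - 1/2| over [b, a].
    integral = ((a - 0.5) ** 2 + (0.5 - b) ** 2) / 2.0
    total = v1 + c
    if total == 0:
        return 0.0
    return (integral + 0.5 * c) / total

c = 10.0                                   # many irrelevant dimensions (illustrative)
print(expected_risk(1.0, 0.0, c))          # extreme instances
print(expected_risk(0.5, 0.5, c))          # boundary instances
print(expected_risk(0.51, 0.49, 0.0))      # near-boundary instances, no irrelevant dimensions
```

Under these assumptions, a large V_1 dilutes the weight of the error-1/2 hypotheses when c is large, whereas with c = 0 shrinking V_1 toward the boundary lowers the risk.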
76 Consequently, our human teachers might not have expected the robot learner to perform information integration either. [sent-301, score-0.229]
77 (c) Theoretical teaching sequences gradually shrink |V1 |, similar to human behaviors. [sent-313, score-0.995]
78 Here, the learner can form so many bad hypotheses in the many wrong dimensions that the best strategy for the teacher is to make V1 (2) as large as possible, even though many hypotheses in V1 (2) have nonzero error. [sent-333, score-0.346]
79 Corollary 2 has an interesting cognitive interpretation; the teacher only needs to pay attention to the relevant (first) dimension $x_{11}$, $x_{21}$ when selecting the two teaching instances. [sent-352, score-1.1]
80 She does not need to consider the irrelevant dimensions, as those will add up to a large c, which simplifies the teacher’s task in choosing a teaching sequence; she simply picks two extreme instances in the first dimension. [sent-353, score-1.064]
81 So far, we have assumed an infinite pool, such that the teacher can select the extreme teaching instances with $x_{11} = 1$, $x_{21} = 0$. [sent-359, score-1.165]
82 Note the similarity to curriculum learning which starts with extreme (easy) instances. [sent-367, score-0.251]
83 3 The Teaching Sequence should Gradually Approach the Boundary Thus far, we have focused on choosing the first two teaching instances. [sent-369, score-0.869]
84 We now show that, as teaching continues, the teacher should choose instances with a and b gradually approaching 1/2. [sent-370, score-1.122]
85 Consider the moment when the teacher has already presented a teaching sequence (x1 , y1 ), . [sent-373, score-1.026]
86 , (xt−2 , yt−2 ) and is about to select the next pair of teaching instances (xt−1 , 1), (xt , 0). [sent-376, score-0.943]
87 Following the discussion after Corollary 2, we assume that the teacher only pays attention to the first dimension when selecting teaching instances. [sent-378, score-1.067]
88 What is the value of c for a teaching sequence of length t? [sent-381, score-0.89]
89 Let the teaching sequence contain t0 negative labels and t − t0 positive ones. [sent-383, score-0.922]
90 d, after t teaching instances, |Vk (t)| = αk βk . [sent-393, score-0.869]
91 As mentioned above, these t teaching instances can be viewed as unif[0, 1] random variables in the kth dimension. [sent-394, score-0.943]
92 To illustrate how fast the optimal teaching sequence approaches 1/2 in the first dimension, Figure 5(c) shows a plot of |V1 | = a − b as a function of t by using E(c) in Theorem 1 (note in general that this is not E(|V1 |), but only a typical value). [sent-418, score-0.89]
93 5 Conclusion and Future Work We conducted a human teaching experiment and observed three distinct human teaching strategies. [sent-424, score-1.844]
94 Empirical results yielded no evidence for the boundary strategy but showed that the extreme strategy is consistent with the curriculum learning principle. [sent-425, score-0.515]
95 We presented a theoretical framework that extends teaching dimension and explains two defining properties of the extreme strategy: (1) teaching starts with extreme instances and (2) teaching gradually approaches the decision boundary. [sent-426, score-3.024]
96 Our framework predicts that, in the absence of irrelevant dimensions (d = 1), teaching should start at the decision boundary. [sent-427, score-0.971]
97 To verify this prediction, in our future work, we plan to conduct additional human teaching studies where the objects have no irrelevant attributes. [sent-428, score-1.005]
98 Generalized teaching dimensions and the query complexity of learning. [sent-501, score-0.891]
99 Complexity of teaching by a restricted number of examples. [sent-506, score-0.869]
100 Success and failure in teaching the [r]-[l] contrast to Japanese adults: Tests of a Hebbian model of plasticity and stabilization in spoken language perception. [sent-530, score-0.885]
wordName wordTfidf (topN-words)
[('teaching', 0.869), ('graspable', 0.154), ('participants', 0.151), ('curriculum', 0.144), ('teacher', 0.136), ('graspability', 0.119), ('boundary', 0.099), ('extreme', 0.086), ('strategy', 0.082), ('robot', 0.081), ('instances', 0.074), ('learner', 0.066), ('dimension', 0.062), ('cards', 0.06), ('teach', 0.058), ('human', 0.053), ('vk', 0.052), ('participant', 0.049), ('objects', 0.048), ('constrained', 0.047), ('decision', 0.045), ('pool', 0.043), ('gradually', 0.043), ('xjk', 0.042), ('rating', 0.041), ('strategies', 0.041), ('concept', 0.04), ('humans', 0.04), ('behavioral', 0.039), ('psychology', 0.038), ('sorted', 0.038), ('limc', 0.036), ('irrelevant', 0.035), ('hj', 0.034), ('cognitive', 0.033), ('card', 0.033), ('labels', 0.032), ('ck', 0.031), ('xj', 0.03), ('sequences', 0.03), ('xn', 0.029), ('teachers', 0.029), ('yj', 0.028), ('beta', 0.028), ('hk', 0.027), ('object', 0.026), ('wisconsin', 0.026), ('natural', 0.025), ('ascending', 0.024), ('balbach', 0.024), ('bilge', 0.024), ('faisal', 0.024), ('mccandliss', 0.024), ('mitsubishi', 0.024), ('subtask', 0.024), ('toothbrush', 0.024), ('minj', 0.023), ('classic', 0.023), ('madison', 0.023), ('hypothesis', 0.022), ('dimensions', 0.022), ('consistent', 0.022), ('english', 0.022), ('risk', 0.022), ('sequence', 0.021), ('industries', 0.021), ('deyne', 0.021), ('felt', 0.021), ('japanese', 0.021), ('photos', 0.021), ('salmon', 0.021), ('starts', 0.021), ('offers', 0.02), ('ambiguous', 0.02), ('hypotheses', 0.02), ('ratings', 0.02), ('maxj', 0.02), ('xt', 0.019), ('goldman', 0.019), ('exempli', 0.019), ('threshold', 0.019), ('thread', 0.018), ('label', 0.017), ('separators', 0.017), ('unif', 0.017), ('follow', 0.017), ('target', 0.017), ('labeled', 0.016), ('instructed', 0.016), ('corollary', 0.016), ('colt', 0.016), ('study', 0.016), ('yt', 0.016), ('version', 0.016), ('language', 0.016), ('associations', 0.016), ('progresses', 0.016), ('behavior', 0.015), ('parallel', 0.015), ('instance', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 122 nips-2011-How Do Humans Teach: On Curriculum Learning and Teaching Dimension
Author: Faisal Khan, Bilge Mutlu, Xiaojin Zhu
Abstract: We study the empirical strategies that humans follow as they teach a target concept with a simple 1D threshold to a robot. Previous studies of computational teaching, particularly the teaching dimension model and the curriculum learning principle, offer contradictory predictions on what optimal strategy the teacher should follow in this teaching task. We show through behavioral studies that humans employ three distinct teaching strategies, one of which is consistent with the curriculum learning principle, and propose a novel theoretical framework as a potential explanation for this strategy. This framework, which assumes a teaching goal of minimizing the learner’s expected generalization error at each iteration, extends the standard teaching dimension model and offers a theoretical justification for curriculum learning.
2 0.10924203 48 nips-2011-Blending Autonomous Exploration and Apprenticeship Learning
Author: Thomas J. Walsh, Daniel K. Hewlett, Clayton T. Morrison
Abstract: We present theoretical and empirical results for a framework that combines the benefits of apprenticeship and autonomous reinforcement learning. Our approach modifies an existing apprenticeship learning framework that relies on teacher demonstrations and does not necessarily explore the environment. The first change is replacing previously used Mistake Bound model learners with a recently proposed framework that melds the KWIK and Mistake Bound supervised learning protocols. The second change is introducing a communication of expected utility from the student to the teacher. The resulting system only uses teacher traces when the agent needs to learn concepts it cannot efficiently learn on its own.
3 0.07080882 90 nips-2011-Evaluating the inverse decision-making approach to preference learning
Author: Alan Jern, Christopher G. Lucas, Charles Kemp
Abstract: Psychologists have recently begun to develop computational accounts of how people infer others’ preferences from their behavior. The inverse decision-making approach proposes that people infer preferences by inverting a generative model of decision-making. Existing data sets, however, do not provide sufficient resolution to thoroughly evaluate this approach. We introduce a new preference learning task that provides a benchmark for evaluating computational accounts and use it to compare the inverse decision-making approach to a feature-based approach, which relies on a discriminative combination of decision features. Our data support the inverse decision-making approach to preference learning. A basic principle of decision-making is that knowing people’s preferences allows us to predict how they will behave: if you know your friend likes comedies and hates horror films, you can probably guess which of these options she will choose when she goes to the theater. Often, however, we do not know what other people like and we can only infer their preferences from their behavior. If you know that a different friend saw a comedy today, does that mean that he likes comedies in general? The conclusion you draw will likely depend on what else was playing and what movie choices he has made in the past. A goal for social cognition research is to develop a computational account of people’s ability to infer others’ preferences. One computational approach is based on inverse decision-making. This approach begins with a model of how someone’s preferences lead to a decision. Then, this model is inverted to determine the most likely preferences that motivated an observed decision. An alternative approach might simply learn a functional mapping between features of an observed decision and the preferences that motivated it. 
For instance, in your friend’s decision to see a comedy, perhaps the more movie options he turned down, the more likely it is that he has a true preference for comedies. The difference between the inverse decision-making approach and the feature-based approach maps onto the standard dichotomy between generative and discriminative models. Economists have developed an instance of the inverse decision-making approach known as the multinomial logit model [1] that has been widely used to infer consumers’ preferences from their choices. This model has recently been explored as a psychological model [2, 3, 4], but there are few behavioral data sets for evaluating it as a model of how people learn others’ preferences. Additionally, the data sets that do exist tend to be drawn from the developmental literature, which focuses on simple tasks that collect only one or two judgments from children [5, 6, 7]. The limitations of these data sets make it difficult to evaluate the multinomial logit model with respect to alternative accounts of preference learning like the feature-based approach. In this paper, we use data from a new experimental task that elicits a detailed set of preference judgments from a single participant in order to evaluate the predictions of several preference learning models from both the inverse decision-making and feature-based classes. Our task requires each participant to sort a large number of observed decisions on the basis of how strongly they indicate a preference for a chosen item. Because the number of decisions is large and these decisions vary on multiple dimensions, predicting how people will order them offers a challenging benchmark on which to compare computational models of preference learning. Data sets from these sorts of detailed tasks have proved fruitful in other domains. For example, data reported by Shepard, Hovland, and Jenkins [8]; Osherson, Smith, Wilkie, López, and Shafir [9]; and Wasserman, Elek, Chatlosh, and Baker [10] have motivated much subsequent research on category learning, inductive reasoning, and causal reasoning, respectively. We first describe our preference learning task in detail. We then present several inverse decision-making and feature-based models of preference learning and compare these models’ predictions to people’s judgments in two experiments. The data are well predicted by models that follow the inverse decision-making approach, suggesting that this computational approach may help explain how people learn others’ preferences.
Figure 1: (a)–(c) Examples of the decisions used in the experiments. Each column represents one option and the boxes represent different effects. The chosen option is indicated by the black rectangle. (d) Features used by the weighted feature and ranked feature models. Features 5 and 10 involved maxima in Experiment 1, which focused on all positive effects, and minima in Experiment 2, which focused on all negative effects. The signs in parentheses indicate the direction of the feature that suggests a stronger preference in Experiment 1 / Experiment 2.
Features listed in panel (d):
1. Number of chosen effects (−/+)
2. Number of forgone effects (+/+)
3. Number of forgone options (+/+)
4. Number of forgone options containing x (−/−)
5. Max/min number of effects in a forgone option (+/−)
6. Is x in every option? (−/−)
7. Chose only option with x? (+/+)
8. Is x the only difference between options? (+/+)
9. Do all options have same number of effects? (+/+)
10. Chose option with max/min number of effects? (−/−)
1 Multi-attribute decisions and revealed preferences
We designed a task that can be used to elicit a large number of preference judgments from a single participant. The task involves a set of observed multi-attribute decisions, some examples of which are represented visually in Figure 1. Each decision is among a set of options and each option produces a set of effects. 
Figure 1 shows several decisions involving a total of five effects distributed among up to five options. The differently colored boxes represent different effects and the chosen option is marked by a black rectangle. For example, 1a shows a choice between an option with four effects and an option with a single effect; here, the decision maker chose the second option. In our task, people are asked to rank a large number of these decisions by how strongly they suggest that the decision maker had a preference for a particular effect (e.g., effect x in Figure 1). By imposing some minimal constraints, the space of unique multi-attribute decisions is finite and we can obtain rankings for every decision in the space. For example, Figure 2c shows a complete list of 47 unique decisions involving up to five effects, subject to several constraints described later. Three of these decisions are shown in Figure 1. If all the effects are positive—pieces of candy, for example—the first decision (1a) suggests a strong preference for candy x, because the decision maker turned down four pieces in favor of one. The second decision (1b), however, offers much weaker evidence because nearly everyone would choose four pieces of candy over one, even without a specific preference for x. The third decision (1c) provides evidence that is strong but perhaps not quite as strong as the first decision. When all effects are negative—like electric shocks at different body locations—decision makers may still find some effects more tolerable than others, but different inferences are sometimes supported. For example, for negative effects, 1a provides weak evidence that x is relatively tolerable because nearly everyone would choose one shock over four. 2 A computational account of preference learning We now describe a simple computational model for learning a person’s preferences after observing that person make a decision like the ones in Figure 1. 
We assume that there are n available options $\{o_1, \ldots, o_n\}$, each of which produces one or more effects from the set $\{f_1, f_2, \ldots, f_m\}$. For simplicity, we assume that effects are binary. Let $u_i$ denote the utility the decision maker assigns to effect $f_i$. We begin by specifying a model of decision-making that makes the standard assumptions that decision makers tend to choose things with greater utility and that utilities are additive. That is, if $\mathbf{f}_j$ is a binary vector indicating the effects produced by option $o_j$ and $\mathbf{u}$ is a vector of utilities assigned to each of the m effects, then the total utility associated with option $o_j$ can be expressed as $U_j = \mathbf{f}_j^T \mathbf{u}$. We complete the specification of the model by applying the Luce choice rule [11], a common psychological model of choice behavior, as the function that chooses among the options:
$$p(c = o_j \mid \mathbf{u}, \mathbf{f}) = \frac{\exp(U_j)}{\sum_{k=1}^{n} \exp(U_k)} = \frac{\exp(\mathbf{f}_j^T \mathbf{u})}{\sum_{k=1}^{n} \exp(\mathbf{f}_k^T \mathbf{u})} \qquad (1)$$
where c denotes the choice made. This model can predict the choice someone will make among a specified set of options, given the utilities that person assigns to the effects in each option. To obtain estimates of someone’s utilities, we invert this model by applying Bayes’ rule:
$$p(\mathbf{u} \mid c, \mathbf{F}) = \frac{p(c \mid \mathbf{u}, \mathbf{F})\, p(\mathbf{u})}{p(c \mid \mathbf{F})} \qquad (2)$$
where $\mathbf{F} = \{\mathbf{f}_1, \ldots, \mathbf{f}_n\}$ specifies the available options and their corresponding effects. This is the multinomial logit model [1], a standard econometric model. In order to apply Equation 2 we must specify a prior $p(\mathbf{u})$ on the utilities. We adopt a standard approach that places independent Gaussian priors on the utilities: $u_i \sim N(\mu, \sigma^2)$. For decisions where effects are positive, like candies, we set $\mu = 2\sigma$, which corresponds to a prior distribution that places approximately 2% of the probability mass below zero. Similarly, for negative effects, like electric shocks, we set $\mu = -2\sigma$. 
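The Luce choice rule in Equation 1 is a softmax over additive option utilities, which can be sketched in a few lines. The function name `choice_probabilities` and the example utilities are assumptions made for illustration; only the formula itself comes from the text.

```python
import math

def choice_probabilities(options, utilities):
    """Luce choice rule: p(c = o_j) is proportional to exp(f_j . u).

    `options` is a list of binary effect vectors f_j, and `utilities` is the
    vector u of per-effect utilities.
    """
    totals = [sum(f * u for f, u in zip(option, utilities)) for option in options]
    shift = max(totals)                      # subtract the max for numerical stability
    weights = [math.exp(t - shift) for t in totals]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]

# Two options over five effects: four effects versus a single effect.
options = [[1, 1, 1, 1, 0], [0, 0, 0, 0, 1]]
utilities = [1.0, 1.0, 1.0, 1.0, 1.0]        # illustrative equal utilities
print(choice_probabilities(options, utilities))
```

With equal positive utilities the four-effect option is chosen most of the time, which is why choosing the single-effect option is informative about a preference for that effect.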
2.1 Ordering a set of observed decisions Equation 2 specifies a posterior probability distribution over utilities for a single observed decision but does not provide a way to compare the inferences drawn from multiple decisions for the purposes of ordering them. Suppose we are interested in a decision maker’s preference for effect x and we wish to order a set of decisions by how strongly they support this preference. Two criteria for ordering the decisions are as follows: Absolute utility Relative utility p(c|ux , F)p(ux ) p(c|F) p(c|∀j ux ≥ uj , F)p(∀j ux ≥ uj ) p(∀j ux ≥ uj |c, F) = p(c|F) E(ux |c, F) = Eux The absolute utility model orders decisions by the mean posterior utility for effect x. This criterion is perhaps the most natural way to assess how much a decision indicates a preference for x, but it requires an inference about the utility of x in isolation, and research suggests that people often think about the utility of an effect only in relation to other salient possibilities [12]. The relative utility model applies this idea to preference learning by ordering decisions based on how strongly they suggest that x has a greater utility than all other effects. The decisions in Figures 1b and 1c are cases where the two models lead to different predictions. If the effects are all negative (e.g., electric shocks), the absolute utility model predicts that 1b provides stronger evidence for a tolerance for x because the decision maker chose to receive four shocks instead of just one. The relative utility model predicts that 1c provides stronger evidence because 1b offers no way to determine the relative tolerance of the four chosen effects with respect to one another. Like all generative models, the absolute and relative models incorporate three qualitatively different components: the likelihood term p(c|u, F), the prior p(u), and the reciprocal of the marginal likelihood 1/p(c|F). 
We assume that the total number of effects is fixed in advance and, as a result, the prior term will be the same for all decisions that we consider. The two other components, however, will vary across decisions. The inverse decision-making approach predicts that both components should influence preference judgments, and we will test this prediction by comparing our two inverse decision-making models to two alternatives that rely on only one of these components as an ordering criterion:

Representativeness: $p(c \mid \forall j\; u_x \ge u_j, F)$

Surprise: $1 / p(c \mid F)$

The representativeness model captures how likely the observed decision would be if the utility for x were high, and previous research has shown that people sometimes rely on a representativeness computation of this kind [13]. The surprise model captures how unexpected the observed decision is overall; surprising decisions may be best explained in terms of a strong preference for x, but unsurprising decisions provide little information about x in particular.

2.2 Feature-based models

We also consider a class of feature-based models that use surface features to order decisions. The ten features that we consider are shown in Figure 1d, where x is the effect of interest. As an example, the first feature specifies the number of effects chosen; because x is always among the chosen effects, decisions where few or no other effects belong to the chosen option suggest the strongest preference for x (when all effects are positive). This and the second feature were previously identified by Newtson [14]; we included the eight additional features shown in Figure 1d in an attempt to include all possible features that seemed both simple and relevant. We consider two methods for combining this set of features to order a set of decisions by how strongly they suggest a preference for x. The first model is a standard linear regression model, which we refer to as the weighted feature model.
The model learns a weight for each feature, and the rank of a given decision is determined by a weighted sum of its features. The second model is a ranked feature model that sorts the observed decisions with respect to a strict ranking of the features. The top-ranked feature corresponds to the primary sort key, the second-ranked feature to the secondary sort key, and so on. For example, suppose that the top-ranked feature is the number of chosen effects and the second-ranked feature is the number of forgone options. Sorting the three decisions in Figure 1 according to this criterion produces the following ordering: 1a,1c,1b. This notion of sorting items on the basis of ranked features has been applied before to decision-making [15, 16] and other domains of psychology [17], but we are not aware of any previous applications to preference learning. Although our inverse decision-making and feature-based models represent two very different approaches, both may turn out to be valuable. An inverse decision-making approach may be the appropriate account of preference learning at Marr’s [18] computational level, and a feature-based approach may capture the psychological processes by which the computational-level account is implemented. Our goal, therefore, is not necessarily to accept one of these approaches and dismiss the other. Instead, we entertain three distinct possibilities. First, both approaches may account well for the data, which would support the idea that they are valid accounts operating at different levels of analysis. Second, the inverse decision-making approach may offer a better account, suggesting that process-level accounts other than the feature-based approach should be explored. Finally, the feature-based approach may offer a better account, suggesting that inverse decision-making does not constitute an appropriate computational-level account of preference learning. 
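The ranked feature model described above amounts to a lexicographic sort, which can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: decisions are written in the Figure 2c notation (options separated by vertical bars, chosen option last), the three example decisions are taken from that list, the helper names are hypothetical, and sorting in ascending order of raw feature values is an assumption about the sort direction.

```python
def parse_decision(text):
    """Split 'dcba | x' into its options; the chosen option is the last one."""
    return [option.strip() for option in text.split("|")]

def n_chosen_effects(options):
    """Feature: number of effects in the chosen (last) option."""
    return len(options[-1])

def n_forgone_options(options):
    """Feature: number of options that were not chosen."""
    return len(options) - 1

def ranked_feature_order(decisions, features):
    """Sort decisions lexicographically: the first feature is the primary
    sort key, the second feature breaks ties, and so on."""
    def sort_key(decision):
        options = parse_decision(decision)
        return tuple(feature(options) for feature in features)
    return sorted(decisions, key=sort_key)

decisions = ["d | cbax", "d | c | b | a | x", "dcba | x"]
order = ranked_feature_order(decisions, [n_chosen_effects, n_forgone_options])
print(order)
```

Python compares tuples element by element, so building one tuple of feature values per decision gives the strict feature ranking without any explicit tie-breaking logic. A weighted feature model would instead replace the tuple with a single weighted sum of the same feature values.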
3 Experiment 1: Positive effects

Our first experiment focuses on decisions involving only positive effects. The full set of 47 decisions we used is shown in Figure 2c. This set includes every possible unique decision with up to five different effects, subject to the following constraints: (1) one of the effects (effect x) must always appear in the chosen option, (2) there are no repeated options, (3) each effect may appear in an option at most once, (4) only effects in the chosen option may be repeated in other options, and (5) when effects appear in multiple options, the number of effects is held constant across options. The first constraint is necessary for the sorting task, the second and third constraints create a finite space of decisions, and the final two constraints limit attention to what we deemed the most interesting cases.

Method

43 Carnegie Mellon undergraduates participated for course credit. Each participant was given a set of cards, with one decision printed on each card. The decisions were represented visually as in Figure 1 but without the letter labels. Participants were told that the effects were different types of candy and each option was a bag containing one or more pieces of candy. They were asked to sort the cards by how strongly each decision suggested that the decision maker liked a particular target candy, labeled x in Figure 2c. They sorted the cards freely on a table but reported their final rankings by writing them on a sheet of paper, from weakest to strongest evidence. They were instructed to order the cards as completely as possible, but were told that they could assign the same ranking to a set of cards if they believed those cards provided equal evidence.

[Figure 2, panels (a) and (b): scatter plots. Panel (a) axes: absolute utility model rankings and mean human rankings. Panel (b) axes: mean human rankings (Experiment 1) and mean human rankings (Experiment 2).]

Figure 2, panel (c), decisions:
1. dcbax; 2. cbax; 3. bax; 4. ax; 5. x; 6. dcax | bcax; 7. dx | cx | bx | ax; 8. cax | bax;
9. bdx | bcx | bax; 10. dcx | bax; 11. bx | ax; 12. bdx | cax | bax; 13. cx | bx | ax; 14. d | cbax;
15. c | bax; 16. b | ax; 17. d | c | bax; 18. dc | bax; 19. c | b | ax; 20. dc | bx | ax;
21. bdc | bax; 22. ad | cx | bx | ax; 23. d | c | b | ax; 24. bad | bcx | bax; 25. ac | bx | ax;
26. cb | ax; 27. cbad | cbax; 28. dc | b | ax; 29. ad | ac | bx | ax; 30. ab | ax; 31. bad | bax;
32. dc | ab | ax; 33. dcb | ax; 34. a | x; 35. bad | bac | bax; 36. ac | ab | ax;
37. ad | ac | ab | ax; 38. b | a | x; 39. ba | x; 40. c | b | a | x; 41. cb | a | x;
42. d | c | b | a | x; 43. cba | x; 44. dc | ba | x; 45. dc | b | a | x; 46. dcb | a | x; 47. dcba | x

Figure 2: (a) Comparison between the absolute utility model rankings and the mean human rankings for Experiment 1. Each point represents one decision, numbered with respect to the list in panel c. (b) Comparison between the mean human rankings in Experiments 1 and 2. In both scatter plots, the solid diagonal lines indicate a perfect correspondence between the two sets of rankings. (c) The complete set of decisions, ordered by the mean human rankings from Experiment 1. Options are separated by vertical bars and the chosen option is always at the far right. Participants were always asked about a preference for effect x.

3.1 Results

Two participants were excluded as outliers based on the criterion that their rankings for at least five decisions were at least three standard deviations from the mean rankings. We performed a hierarchical clustering analysis of the remaining 41 participants’ rankings using rank correlation as a similarity metric.
Participants’ rankings were highly correlated: cutting the resulting dendrogram at 0.2 resulted in one cluster that included 33 participants, and the second largest cluster included only 3 participants. Thus, we grouped all participants together and analyzed their mean rankings. The 0.2 threshold was chosen because it produced the most informative clustering in Experiment 2.

[Figure 3: grid of scatter plots. Columns: Absolute utility, Relative utility, Representativeness, Surprise. Rows: Experiment 1 (positive effects), Experiment 2 (negative effects). Axes: model rankings and human rankings. Experiment 1 panels report MAE = 2.3, 2.3, 17.8, and 7.0, respectively; Experiment 2 panels report MAE values of 4.3, 6.7, 9.5, and 17.3.]

Figure 3: Comparison between human rankings in both experiments and predicted rankings from four models. The solid diagonal lines indicate a perfect correspondence between human and model rankings.

Inverse decision-making models

We implemented the inverse decision-making models using importance sampling with 5 million samples drawn from the prior distribution p(u). Because all the effects were positive, we used a prior on utilities that placed nearly all probability mass above zero (µ = 4, σ = 2). The mean human rankings are compared with the absolute utility model rankings in Figure 2a, and the mean human rankings are listed in order in Figure 2c. Fractional rankings were used for both the human data and the model predictions. The human rankings in the figure are the means of participants’ fractional rankings. The first row of Figure 3 contains similar plots that allow comparison of the four models we considered. In these plots, the solid diagonal lines indicate a perfect correspondence between model and human rankings. Thus, the largest deviations from this line represent the largest deviations in the data from the model’s predictions.
Figure 3 shows that the absolute and relative utility models make virtually identical predictions and both models provide a strong account of the human rankings as measured by mean absolute error (MAE = 2.3 in both cases). Moreover, both models correctly predict the highest ranked decision and the set of lowest ranked decisions. The only clear discrepancy between the model predictions and the data is the cluster of points at the lower left, labeled as Decisions 6–13 in Figure 2a. These are all cases in which effect x appears in all options and therefore these decisions provide no information about a decision maker’s preference for x. Consequently, the models assign the same ranking to this group as to the group of decisions in which there is only a single option (Decisions 1–5). Although people appeared to treat these groups somewhat differently, the models still correctly predict that the entire group of decisions 1–13 is ranked lower than all other decisions. The surprise and representativeness models do not perform nearly as well (MAE = 7.0 and 17.8, respectively). Although the surprise model captures some of the general trends in the human rankings, it makes several major errors. For example, consider Decision 7: dx|cx|bx|ax. This decision provides no information about a preference for x because it appears in every option. The decision is surprising, however, because a decision maker choosing at random from these options would make the observed choice only 1/4 of the time. The representativeness model performs even worse, primarily because it does not take into account alternative explanations for why an option was chosen, such as the fact that no other options were available (e.g., Decision 1 in Figure 2c). The failure of these models to adequately account for the data suggests that both the likelihood p(c|u, F) and marginal likelihood p(c|F) are important components of the absolute and relative utility models. 
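To make the four ordering criteria concrete, the following is a rough sketch of how they could be estimated by drawing utilities from the prior and weighting each draw by the likelihood of the observed choice. It is an illustration, not the authors' code: it assumes NumPy, uses 200,000 samples rather than 5 million to keep the run short, fixes the five effects as a, b, c, d, and x following the Figure 2c notation, and all function names are hypothetical.

```python
import numpy as np

EFFECTS = "abcdx"  # assumed fixed set of five effects; x is the effect of interest

def parse_decision(text):
    """Turn 'dcba | x' into a binary option-by-effect matrix (chosen option last)."""
    options = [option.strip() for option in text.split("|")]
    return np.array([[1.0 if e in option else 0.0 for e in EFFECTS]
                     for option in options])

def choice_likelihood(F, u):
    """Luce choice probability of the last option, for each sampled utility vector."""
    U = u @ F.T                          # shape: (n_samples, n_options)
    U = U - U.max(axis=1, keepdims=True)  # stabilize the exponentials
    expU = np.exp(U)
    return expU[:, -1] / expU.sum(axis=1)

def ordering_criteria(text, mu=4.0, sigma=2.0, n_samples=200_000, seed=0):
    """Estimate the four criteria for one decision using samples from the prior."""
    rng = np.random.default_rng(seed)
    u = rng.normal(mu, sigma, size=(n_samples, len(EFFECTS)))
    w = choice_likelihood(parse_decision(text), u)  # importance weights p(c | u, F)
    x = EFFECTS.index("x")
    x_is_max = u[:, x] >= u.max(axis=1)             # event: u_x >= u_j for all j
    return {
        # posterior mean of u_x (self-normalized importance sampling)
        "absolute": float((w * u[:, x]).sum() / w.sum()),
        # posterior probability that x has the greatest utility
        "relative": float(w[x_is_max].sum() / w.sum()),
        # likelihood of the choice given that x has the greatest utility
        "representativeness": float(w[x_is_max].mean()),
        # reciprocal of the marginal likelihood p(c | F)
        "surprise": float(1.0 / w.mean()),
    }

strong = ordering_criteria("dcba | x")  # chose x alone over four other effects
weak = ordering_criteria("d | cbax")    # chose four effects, including x, over one
print(strong)
print(weak)
```

With the positive-effects prior (µ = 4, σ = 2), the first decision should score higher than the second on the absolute utility, relative utility, and surprise criteria, which matches the qualitative argument that turning down four candies for one is stronger evidence of a preference for x. Sampling from the prior is simple but wasteful for surprising decisions, where most samples receive very small weights.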
Feature-based models

We compared the performance of the absolute and relative utility models to our two feature-based models: the weighted feature and ranked feature models. For each participant, we considered every subset of features (see footnote 1) in Figure 1d in order to determine the minimum number of features needed by the two models to achieve the same level of accuracy as the absolute utility model, as measured by mean absolute error. The results of these analyses are shown in Figure 4. For the majority of participants, at least four features were needed by both models to match the accuracy of the absolute utility model. For the weighted feature model, 14 participants could not be fit as well as the absolute utility model even when all ten features were considered. These results indicate that a feature-based account of people’s inferences in our task must be supplied with a relatively large number of features. By contrast, the inverse decision-making approach provides a relatively parsimonious account of the data.

[Figure 4: two histograms, (a) Weighted feature and (b) Ranked feature. Axes: number of features needed and number of participants.]

Figure 4: Results of the feature-based model analysis from Experiment 1 for (a) the weighted feature models and (b) the ranked feature models. The histograms show the minimum number of features needed to match the accuracy (measured by MAE) of the absolute utility model for each participant.

4 Experiment 2: Negative effects

Experiment 2 focused on a setting in which all effects are negative, motivated by the fact that the inverse decision-making models predict several major differences in orderings when effects are negative rather than positive. For instance, the absolute utility model’s relative rankings of the decisions in Figures 1a and 1b are reversed when all effects are negative rather than positive.

Method

42 Carnegie Mellon undergraduates participated for course credit.
The experimental design was identical to Experiment 1 except that participants were told that the effects were electric shocks at different body locations. They were asked to sort the cards on the basis of how strongly each decision suggested that the decision maker finds shocks at the target location relatively tolerable. The model predictions were derived in the same way as for Experiment 1, but with a prior distribution on utilities that placed nearly all probability mass below zero (µ = −4, σ = 2) to reflect the fact that effects were all negative. 4.1 Results Three participants were excluded as outliers by the same criterion applied in Experiment 1. The resulting mean rankings are compared with the corresponding rankings from Experiment 1 in Figure 2b. The figure shows that responses based on positive and negative effects were substantially different in a number of cases. Figure 3 shows how the mean rankings compare to the predictions of the four models we considered. Although the relative utility model is fairly accurate, no model achieves the same level of accuracy as the absolute and relative utility models in Experiment 1. In addition, the relative utility model provides a poor account of the responses of many individual participants. To better understand responses at the individual level, we repeated the hierarchical clustering analysis described in Experiment 1, which revealed that 29 participants could be grouped into one of four clusters, with the remaining participants each in their own clusters. We analyzed these four clusters independently, excluding the 10 participants that could not be naturally grouped. We compared the mean rankings of each cluster to the absolute and relative utility models, as well as all one- and two-feature weighted feature and ranked feature models. 
Figure 5 shows that the mean rankings of participants in Cluster 1 (N = 8) were best fit by the absolute utility model, the mean rankings of participants in Cluster 2 (N = 12) were best fit by the relative utility model, and the mean rankings of participants in Clusters 3 (N = 3) and 4 (N = 6) were better fit by feature-based models than by either the absolute or relative utility models.

Footnote 1: A maximum of six features was considered for the ranked feature model because considering more features was computationally intractable.

[Figure 5: grid of scatter plots. Columns: Cluster 1 (N = 8), Cluster 2 (N = 12), Cluster 3 (N = 3), Cluster 4 (N = 6). Rows: Absolute utility, Relative utility, Best-fitting weighted feature. Axes: model rankings and human rankings. Each panel reports its MAE, and the third-row panels list the two factors used.]

Figure 5: Comparison between human rankings for four clusters of participants identified in Experiment 2 and predicted rankings from three models. Each point in the plots corresponds to one decision and the solid diagonal lines indicate a perfect correspondence between human and model rankings. The third row shows the predictions of the best-fitting two-factor weighted feature model for each cluster. The two factors listed refer to Figure 1d.

To examine how well the models accounted for individuals’ rankings within each cluster, we compared the predictions of the inverse decision-making models to the best-fitting two-factor feature-based model for each participant. In Cluster 1, 7 out of 8 participants were best fit by the absolute utility model; in Cluster 2, 8 out of 12 participants were best fit by the relative utility model; in Clusters 3 and 4, all participants were better fit by feature-based models.
No single feature-based model provided the best fit for more than two participants, suggesting that participants not fit well by the inverse decision-making models were not using a single alternative strategy. Applying the feature-based model analysis from Experiment 1 to the current results revealed that the weighted feature model required an average of 6.0 features to match the performance of the absolute utility model for participants in Cluster 1, and an average of 3.9 features to match the performance of the relative utility model for participants in Cluster 2. Thus, although a single model did not fit all participants well in the current experiment, many participants were fit well by one of the two inverse decision-making models, suggesting that this general approach is useful for explaining how people reason about negative effects as well as positive effects. 5 Conclusion In two experiments, we found that an inverse decision-making approach offered a good computational account of how people make judgments about others’ preferences. Although this approach is conceptually simple, our analyses indicated that it captures the influence of a fairly large number of relevant decision features. Indeed, the feature-based models that we considered as potential process models of preference learning could only match the performance of the inverse decision-making approach when supplied with a relatively large number of features. We feel that this result rules out the feature-based approach as psychologically implausible, meaning that alternative process-level accounts will need to be explored. One possibility is sampling, which has been proposed as a psychological mechanism for approximating probabilistic inferences [19, 20]. However, even if process models that use large numbers of features are considered plausible, the inverse decision-making approach provides a valuable computational-level account that helps to explain which decision features are informative. 
Acknowledgments

This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797.

References

[1] D. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka, editor, Frontiers in Econometrics. Academic Press, New York, 1973.
[2] C. G. Lucas, T. L. Griffiths, F. Xu, and C. Fawcett. A rational model of preference learning and choice prediction by children. In Proceedings of Neural Information Processing Systems 21, 2009.
[3] L. Bergen, O. R. Evans, and J. B. Tenenbaum. Learning structured preferences. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010.
[4] A. Jern and C. Kemp. Decision factors that support preference learning. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011.
[5] T. Kushnir, F. Xu, and H. M. Wellman. Young children use statistical sampling to infer the preferences of other people. Psychological Science, 21(8):1134–1140, 2010.
[6] L. Ma and F. Xu. Young children’s use of statistical sampling evidence to infer the subjectivity of preferences. Cognition, in press.
[7] M. J. Doherty. Theory of Mind: How Children Understand Others’ Thoughts and Feelings. Psychology Press, New York, 2009.
[8] R. N. Shepard, C. I. Hovland, and H. M. Jenkins. Learning and memorization of classifications. Psychological Monographs, 75, Whole No. 517, 1961.
[9] D. N. Osherson, E. E. Smith, O. Wilkie, A. López, and E. Shafir. Category-based induction. Psychological Review, 97(2):185–200, 1990.
[10] E. A. Wasserman, S. M. Elek, D. L. Chatlosh, and A. G. Baker. Rating causal relations: Role of probability in judgments of response-outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(1):174–188, 1993.
[11] R. D. Luce. Individual choice behavior. John Wiley, 1959.
[12] D. Ariely, G. Loewenstein, and D. Prelec. Tom Sawyer and the construction of value. Journal of Economic Behavior & Organization, 60:1–10, 2006.
[13] D. Kahneman and A. Tversky. Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3):430–454, 1972.
[14] D. Newtson. Dispositional inference from effects of actions: Effects chosen and effects forgone. Journal of Experimental Social Psychology, 10:489–496, 1974.
[15] P. C. Fishburn. Lexicographic orders, utilities and decision rules: A survey. Management Science, 20(11):1442–1471, 1974.
[16] G. Gigerenzer and P. M. Todd. Fast and frugal heuristics: The adaptive toolbox. Oxford University Press, New York, 1999.
[17] A. Prince and P. Smolensky. Optimality Theory: Constraint Interaction in Generative Grammar. Wiley-Blackwell, 2004.
[18] D. Marr. Vision. W. H. Freeman, San Francisco, 1982.
[19] A. N. Sanborn, T. L. Griffiths, and D. J. Navarro. Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, 117:1144–1167, 2010.
[20] L. Shi and T. L. Griffiths. Neural implementation of Bayesian inference by importance sampling. In Proceedings of Neural Information Processing Systems 22, 2009.
4 0.06584359 34 nips-2011-An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments
Author: Michael C. Mozer, Benjamin Link, Harold Pashler
Abstract: Psychologists have long been struck by individuals’ limitations in expressing their internal sensations, impressions, and evaluations via rating scales. Instead of using an absolute scale, individuals rely on reference points from recent experience. This relativity of judgment limits the informativeness of responses on surveys, questionnaires, and evaluation forms. Fortunately, the cognitive processes that map stimuli to responses are not simply noisy, but rather are influenced by recent experience in a lawful manner. We explore techniques to remove sequential dependencies, and thereby decontaminate a series of ratings to obtain more meaningful human judgments. In our formulation, the problem is to infer latent (subjective) impressions from a sequence of stimulus labels (e.g., movie names) and responses. We describe an unsupervised approach that simultaneously recovers the impressions and parameters of a contamination model that predicts how recent judgments affect the current response. We test our iterated impression inference, or I^3, algorithm in three domains: rating the gap between dots, the desirability of a movie based on an advertisement, and the morality of an action. We demonstrate significant objective improvements in the quality of the recovered impressions.
5 0.059576776 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using $n \log_2 n$ pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in $\mathbb{R}^d$. We show that under this assumption the number of possible rankings grows like $n^{2d}$ and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis.
6 0.059561748 41 nips-2011-Autonomous Learning of Action Models for Planning
7 0.056437504 35 nips-2011-An ideal observer model for identifying the reference frame of objects
8 0.055047102 3 nips-2011-A Collaborative Mechanism for Crowdsourcing Prediction Problems
9 0.045425858 15 nips-2011-A rational model of causal inference with continuous causes
10 0.044111889 130 nips-2011-Inductive reasoning about chimeric creatures
11 0.037938599 28 nips-2011-Agnostic Selective Classification
12 0.037299976 162 nips-2011-Lower Bounds for Passive and Active Learning
13 0.036676764 88 nips-2011-Environmental statistics and the trade-off between model-based and TD learning in humans
14 0.035475679 29 nips-2011-Algorithms and hardness results for parallel large margin learning
15 0.032651264 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
16 0.032578643 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games
17 0.032360479 260 nips-2011-Sparse Features for PCA-Like Linear Regression
18 0.031941414 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
19 0.031818308 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
20 0.031082628 218 nips-2011-Predicting Dynamic Difficulty
topicId topicWeight
[(0, 0.095), (1, -0.01), (2, -0.019), (3, 0.032), (4, 0.024), (5, 0.022), (6, -0.029), (7, -0.065), (8, -0.021), (9, 0.015), (10, 0.009), (11, -0.025), (12, 0.058), (13, -0.003), (14, 0.133), (15, -0.052), (16, 0.088), (17, 0.019), (18, 0.068), (19, -0.08), (20, 0.049), (21, -0.005), (22, -0.053), (23, -0.004), (24, -0.034), (25, 0.061), (26, -0.005), (27, -0.041), (28, -0.043), (29, 0.024), (30, -0.019), (31, -0.051), (32, -0.03), (33, -0.021), (34, 0.05), (35, -0.03), (36, -0.029), (37, 0.026), (38, -0.052), (39, 0.005), (40, 0.033), (41, 0.059), (42, 0.091), (43, 0.099), (44, 0.038), (45, -0.097), (46, 0.089), (47, -0.009), (48, -0.119), (49, 0.08)]
simIndex simValue paperId paperTitle
same-paper 1 0.93820161 122 nips-2011-How Do Humans Teach: On Curriculum Learning and Teaching Dimension
Author: Faisal Khan, Bilge Mutlu, Xiaojin Zhu
Abstract: We study the empirical strategies that humans follow as they teach a target concept with a simple 1D threshold to a robot.1 Previous studies of computational teaching, particularly the teaching dimension model and the curriculum learning principle, offer contradictory predictions on what optimal strategy the teacher should follow in this teaching task. We show through behavioral studies that humans employ three distinct teaching strategies, one of which is consistent with the curriculum learning principle, and propose a novel theoretical framework as a potential explanation for this strategy. This framework, which assumes a teaching goal of minimizing the learner’s expected generalization error at each iteration, extends the standard teaching dimension model and offers a theoretical justification for curriculum learning. 1
2 0.66581333 90 nips-2011-Evaluating the inverse decision-making approach to preference learning
Author: Alan Jern, Christopher G. Lucas, Charles Kemp
Abstract: Psychologists have recently begun to develop computational accounts of how people infer others’ preferences from their behavior. The inverse decision-making approach proposes that people infer preferences by inverting a generative model of decision-making. Existing data sets, however, do not provide sufficient resolution to thoroughly evaluate this approach. We introduce a new preference learning task that provides a benchmark for evaluating computational accounts and use it to compare the inverse decision-making approach to a feature-based approach, which relies on a discriminative combination of decision features. Our data support the inverse decision-making approach to preference learning. A basic principle of decision-making is that knowing people’s preferences allows us to predict how they will behave: if you know your friend likes comedies and hates horror films, you can probably guess which of these options she will choose when she goes to the theater. Often, however, we do not know what other people like and we can only infer their preferences from their behavior. If you know that a different friend saw a comedy today, does that mean that he likes comedies in general? The conclusion you draw will likely depend on what else was playing and what movie choices he has made in the past. A goal for social cognition research is to develop a computational account of people’s ability to infer others’ preferences. One computational approach is based on inverse decision-making. This approach begins with a model of how someone’s preferences lead to a decision. Then, this model is inverted to determine the most likely preferences that motivated an observed decision. An alternative approach might simply learn a functional mapping between features of an observed decision and the preferences that motivated it. 
For instance, in your friend’s decision to see a comedy, perhaps the more movie options he turned down, the more likely it is that he has a true preference for comedies. The difference between the inverse decision-making approach and the feature-based approach maps onto the standard dichotomy between generative and discriminative models.

Economists have developed an instance of the inverse decision-making approach known as the multinomial logit model [1] that has been widely used to infer consumers’ preferences from their choices. This model has recently been explored as a psychological model [2, 3, 4], but there are few behavioral data sets for evaluating it as a model of how people learn others’ preferences. Additionally, the data sets that do exist tend to be drawn from the developmental literature, which focuses on simple tasks that collect only one or two judgments from children [5, 6, 7]. The limitations of these data sets make it difficult to evaluate the multinomial logit model with respect to alternative accounts of preference learning like the feature-based approach.

In this paper, we use data from a new experimental task that elicits a detailed set of preference judgments from a single participant in order to evaluate the predictions of several preference learning models from both the inverse decision-making and feature-based classes. Our task requires each participant to sort a large number of observed decisions on the basis of how strongly they indicate a preference for a chosen item. Because the number of decisions is large and these decisions vary on multiple dimensions, predicting how people will order them offers a challenging benchmark on which to compare computational models of preference learning. Data sets from these sorts of detailed tasks have proved fruitful in other domains. For example, data reported by Shepard, Hovland, and Jenkins [8]; Osherson, Smith, Wilkie, López, and Shafir [9]; and Wasserman, Elek, Chatlosh, and Baker [10] have motivated much subsequent research on category learning, inductive reasoning, and causal reasoning, respectively.

We first describe our preference learning task in detail. We then present several inverse decision-making and feature-based models of preference learning and compare these models’ predictions to people’s judgments in two experiments. The data are well predicted by models that follow the inverse decision-making approach, suggesting that this computational approach may help explain how people learn others’ preferences.

[Figure 1, panels (a)–(c): three example decisions drawn as columns of effect boxes labeled a, b, c, d, and x.]

Figure 1, panel (d), features:
1. Number of chosen effects (−/+)
2. Number of forgone effects (+/+)
3. Number of forgone options (+/+)
4. Number of forgone options containing x (−/−)
5. Max/min number of effects in a forgone option (+/−)
6. Is x in every option? (−/−)
7. Chose only option with x? (+/+)
8. Is x the only difference between options? (+/+)
9. Do all options have same number of effects? (+/+)
10. Chose option with max/min number of effects? (−/−)

Figure 1: (a)–(c) Examples of the decisions used in the experiments. Each column represents one option and the boxes represent different effects. The chosen option is indicated by the black rectangle. (d) Features used by the weighted feature and ranked feature models. Features 5 and 10 involved maxima in Experiment 1, which focused on all positive effects, and minima in Experiment 2, which focused on all negative effects. The signs in parentheses indicate the direction of the feature that suggests a stronger preference in Experiment 1 / Experiment 2.

1 Multi-attribute decisions and revealed preferences

We designed a task that can be used to elicit a large number of preference judgments from a single participant. The task involves a set of observed multi-attribute decisions, some examples of which are represented visually in Figure 1. Each decision is among a set of options and each option produces a set of effects.
Figure 1 shows several decisions involving a total of five effects distributed among up to five options. The differently colored boxes represent different effects and the chosen option is marked by a black rectangle. For example, 1a shows a choice between an option with four effects and an option with a single effect; here, the decision maker chose the second option. In our task, people are asked to rank a large number of these decisions by how strongly they suggest that the decision maker had a preference for a particular effect (e.g., effect x in Figure 1). By imposing some minimal constraints, the space of unique multi-attribute decisions is finite and we can obtain rankings for every decision in the space. For example, Figure 2c shows a complete list of 47 unique decisions involving up to five effects, subject to several constraints described later. Three of these decisions are shown in Figure 1. If all the effects are positive—pieces of candy, for example—the first decision (1a) suggests a strong preference for candy x, because the decision maker turned down four pieces in favor of one. The second decision (1b), however, offers much weaker evidence because nearly everyone would choose four pieces of candy over one, even without a specific preference for x. The third decision (1c) provides evidence that is strong but perhaps not quite as strong as the first decision. When all effects are negative—like electric shocks at different body locations—decision makers may still find some effects more tolerable than others, but different inferences are sometimes supported. For example, for negative effects, 1a provides weak evidence that x is relatively tolerable because nearly everyone would choose one shock over four. 2 A computational account of preference learning We now describe a simple computational model for learning a person’s preferences after observing that person make a decision like the ones in Figure 1. 
We assume that there are $n$ available options $\{o_1, \ldots, o_n\}$, each of which produces one or more effects from the set $\{f_1, f_2, \ldots, f_m\}$. For simplicity, we assume that effects are binary. Let $u_i$ denote the utility the decision maker assigns to effect $f_i$. We begin by specifying a model of decision-making that makes the standard assumptions that decision makers tend to choose things with greater utility and that utilities are additive. That is, if $\mathbf{f}_j$ is a binary vector indicating the effects produced by option $o_j$ and $\mathbf{u}$ is a vector of utilities assigned to each of the $m$ effects, then the total utility associated with option $o_j$ can be expressed as $U_j = \mathbf{f}_j^T \mathbf{u}$. We complete the specification of the model by applying the Luce choice rule [11], a common psychological model of choice behavior, as the function that chooses among the options:

$$p(c = o_j \mid \mathbf{u}, \mathbf{f}) = \frac{\exp(U_j)}{\sum_{k=1}^{n} \exp(U_k)} = \frac{\exp(\mathbf{f}_j^T \mathbf{u})}{\sum_{k=1}^{n} \exp(\mathbf{f}_k^T \mathbf{u})} \qquad (1)$$

where $c$ denotes the choice made. This model can predict the choice someone will make among a specified set of options, given the utilities that person assigns to the effects in each option. To obtain estimates of someone’s utilities, we invert this model by applying Bayes’ rule:

$$p(\mathbf{u} \mid c, \mathbf{F}) = \frac{p(c \mid \mathbf{u}, \mathbf{F})\, p(\mathbf{u})}{p(c \mid \mathbf{F})} \qquad (2)$$

where $\mathbf{F} = \{\mathbf{f}_1, \ldots, \mathbf{f}_n\}$ specifies the available options and their corresponding effects. This is the multinomial logit model [1], a standard econometric model. In order to apply Equation 2 we must specify a prior $p(\mathbf{u})$ on the utilities. We adopt a standard approach that places independent Gaussian priors on the utilities: $u_i \sim \mathcal{N}(\mu, \sigma^2)$. For decisions where effects are positive (like candies), we set $\mu = 2\sigma$, which corresponds to a prior distribution that places approximately 2% of the probability mass below zero. Similarly, for negative effects (like electric shocks), we set $\mu = -2\sigma$.
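As a rough illustration of Equation 1 and of the prior described above, the following Python sketch computes choice probabilities under the Luce choice rule for a decision like 1a (an option with four effects versus effect x alone) and checks how much prior mass falls below zero when µ = 2σ. The function names, the string encoding of options, and the utility values are hypothetical choices made for this sketch; they are not part of the original model specification.

```python
import math


def choice_probabilities(options, utilities):
    """Luce choice rule (Equation 1): a softmax over additive option utilities.

    `options` is a list of strings, one per option, where each character
    names an effect (an encoding assumed for this sketch).
    """
    totals = [sum(utilities[effect] for effect in option) for option in options]
    shift = max(totals)  # subtract the maximum for numerical stability
    exps = [math.exp(total - shift) for total in totals]
    normalizer = sum(exps)
    return [value / normalizer for value in exps]


def prior_mass_below_zero(mu, sigma):
    """P(u < 0) for u ~ N(mu, sigma^2), using the standard normal CDF."""
    return 0.5 * (1.0 + math.erf((0.0 - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical utilities: every effect is worth the same amount.
utilities = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "x": 1.0}

# Decision 1a as described in the text: four effects versus x alone.
decision_1a = ["dcba", "x"]
probs = choice_probabilities(decision_1a, utilities)

# Prior mass below zero when mu = 2 * sigma.
mass = prior_mass_below_zero(mu=2.0, sigma=1.0)
```

With equal utilities, the option containing only x is chosen with probability of roughly 0.05, which is why actually observing that choice is informative about x. The prior mass below zero comes out slightly above 2%, in line with the "approximately 2%" stated above.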
2.1 Ordering a set of observed decisions

Equation 2 specifies a posterior probability distribution over utilities for a single observed decision but does not provide a way to compare the inferences drawn from multiple decisions for the purposes of ordering them. Suppose we are interested in a decision maker’s preference for effect x and we wish to order a set of decisions by how strongly they support this preference. Two criteria for ordering the decisions are as follows:

Absolute utility:
$$E(u_x \mid c, \mathbf{F}) = E_{u_x}\!\left[\frac{p(c \mid u_x, \mathbf{F})\, p(u_x)}{p(c \mid \mathbf{F})}\right]$$

Relative utility:
$$p(\forall j\; u_x \ge u_j \mid c, \mathbf{F}) = \frac{p(c \mid \forall j\; u_x \ge u_j, \mathbf{F})\, p(\forall j\; u_x \ge u_j)}{p(c \mid \mathbf{F})}$$

The absolute utility model orders decisions by the mean posterior utility for effect x. This criterion is perhaps the most natural way to assess how much a decision indicates a preference for x, but it requires an inference about the utility of x in isolation, and research suggests that people often think about the utility of an effect only in relation to other salient possibilities [12]. The relative utility model applies this idea to preference learning by ordering decisions based on how strongly they suggest that x has a greater utility than all other effects. The decisions in Figures 1b and 1c are cases where the two models lead to different predictions. If the effects are all negative (e.g., electric shocks), the absolute utility model predicts that 1b provides stronger evidence for a tolerance for x because the decision maker chose to receive four shocks instead of just one. The relative utility model predicts that 1c provides stronger evidence because 1b offers no way to determine the relative tolerance of the four chosen effects with respect to one another. Like all generative models, the absolute and relative models incorporate three qualitatively different components: the likelihood term $p(c \mid \mathbf{u}, \mathbf{F})$, the prior $p(\mathbf{u})$, and the reciprocal of the marginal likelihood $1/p(c \mid \mathbf{F})$.
We assume that the total number of effects is fixed in advance and, as a result, the prior term will be the same for all decisions that we consider. The two other components, however, will vary across decisions. The inverse decision-making approach predicts that both components should influence preference judgments, and we will test this prediction by comparing our two inverse decision-making models to two alternatives that each rely on only one of these components as an ordering criterion:

Representativeness: $p(c \mid \forall j\; u_x \ge u_j, \mathbf{F})$
Surprise: $1/p(c \mid \mathbf{F})$

The representativeness model captures how likely the observed decision would be if the utility for x were high, and previous research has shown that people sometimes rely on a representativeness computation of this kind [13]. The surprise model captures how unexpected the observed decision is overall; surprising decisions may be best explained in terms of a strong preference for x, but unsurprising decisions provide little information about x in particular.

2.2 Feature-based models

We also consider a class of feature-based models that use surface features to order decisions. The ten features that we consider are shown in Figure 1d, where x is the effect of interest. As an example, the first feature specifies the number of effects chosen; because x is always among the chosen effects, decisions where few or no other effects belong to the chosen option suggest the strongest preference for x (when all effects are positive). This and the second feature were previously identified by Newtson [14]; we included the eight additional features shown in Figure 1d in an attempt to include all possible features that seemed both simple and relevant. We consider two methods for combining this set of features to order a set of decisions by how strongly they suggest a preference for x. The first model is a standard linear regression model, which we refer to as the weighted feature model.
The model learns a weight for each feature, and the rank of a given decision is determined by a weighted sum of its features. The second model is a ranked feature model that sorts the observed decisions with respect to a strict ranking of the features. The top-ranked feature corresponds to the primary sort key, the second-ranked feature to the secondary sort key, and so on. For example, suppose that the top-ranked feature is the number of chosen effects and the second-ranked feature is the number of forgone options. Sorting the three decisions in Figure 1 according to this criterion produces the following ordering: 1a,1c,1b. This notion of sorting items on the basis of ranked features has been applied before to decision-making [15, 16] and other domains of psychology [17], but we are not aware of any previous applications to preference learning. Although our inverse decision-making and feature-based models represent two very different approaches, both may turn out to be valuable. An inverse decision-making approach may be the appropriate account of preference learning at Marr’s [18] computational level, and a feature-based approach may capture the psychological processes by which the computational-level account is implemented. Our goal, therefore, is not necessarily to accept one of these approaches and dismiss the other. Instead, we entertain three distinct possibilities. First, both approaches may account well for the data, which would support the idea that they are valid accounts operating at different levels of analysis. Second, the inverse decision-making approach may offer a better account, suggesting that process-level accounts other than the feature-based approach should be explored. Finally, the feature-based approach may offer a better account, suggesting that inverse decision-making does not constitute an appropriate computational-level account of preference learning. 
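The ranked feature model's lexicographic sort can be sketched in a few lines of Python. The sketch below reproduces the worked example from the text: the number of chosen effects is the primary key and the number of forgone options is the secondary key. Several things here are assumptions made for illustration. The string encoding (options separated by vertical bars, chosen option last) follows the convention of Figure 2c. The encodings of decisions 1b and 1c are guesses, since the figure itself is not reproduced in this text. Sorting raw feature values in ascending order is also an assumption about the sort direction.

```python
def parse_decision(text):
    """Split a decision such as 'dcba | x' into (forgone options, chosen option).

    The chosen option is assumed to be the last one listed.
    """
    options = [option.strip() for option in text.split("|")]
    return options[:-1], options[-1]


def num_chosen_effects(decision):
    """Feature 1 in Figure 1d: number of effects in the chosen option."""
    _, chosen = parse_decision(decision)
    return len(chosen)


def num_forgone_options(decision):
    """Feature 3 in Figure 1d: number of options that were not chosen."""
    forgone, _ = parse_decision(decision)
    return len(forgone)


def ranked_feature_order(decisions, ranked_features):
    """Sort decision names lexicographically by a strict ranking of features."""
    def sort_key(name):
        return tuple(feature(decisions[name]) for feature in ranked_features)
    return sorted(decisions, key=sort_key)


decisions = {
    "1a": "dcba | x",            # four effects forgone in favor of x alone
    "1b": "d | cbax",            # assumed encoding: four effects chosen over one
    "1c": "d | c | b | a | x",   # assumed encoding: not confirmed by the text
}

order = ranked_feature_order(decisions, [num_chosen_effects, num_forgone_options])
```

Under these assumptions the resulting order is 1a, 1c, 1b, matching the ordering given in the text. Adding further features to the ranked list only matters for decisions that tie on every higher-ranked feature.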
3 Experiment 1: Positive effects

Our first experiment focuses on decisions involving only positive effects. The full set of 47 decisions we used is shown in Figure 2c. This set includes every possible unique decision with up to five different effects, subject to the following constraints: (1) one of the effects (effect x) must always appear in the chosen option, (2) there are no repeated options, (3) each effect may appear in an option at most once, (4) only effects in the chosen option may be repeated in other options, and (5) when effects appear in multiple options, the number of effects is held constant across options. The first constraint is necessary for the sorting task, the second two constraints create a finite space of decisions, and the final two constraints limit attention to what we deemed the most interesting cases.

Method

43 Carnegie Mellon undergraduates participated for course credit. Each participant was given a set of cards, with one decision printed on each card. The decisions were represented visually as in Figure 1 but without the letter labels. Participants were told that the effects were different types of candy and each option was a bag containing one or more pieces of candy. They were asked to sort the cards by how strongly each decision suggested that the decision maker liked a particular target candy, labeled x in Figure 2c. They sorted the cards freely on a table but reported their final rankings by writing them on a sheet of paper, from weakest to strongest evidence. They were instructed to order the cards as completely as possible, but were told that they could assign the same ranking to a set of cards if they believed those cards provided equal evidence.

Figure 2: (a) Comparison between the absolute utility model rankings and the mean human rankings for Experiment 1. Each point represents one decision, numbered with respect to the list in panel c. (b) Comparison between the mean human rankings in Experiments 1 and 2. In both scatter plots, the solid diagonal lines indicate a perfect correspondence between the two sets of rankings. (c) The complete set of decisions, ordered by the mean human rankings from Experiment 1. Options are separated by vertical bars and the chosen option is always at the far right. Participants were always asked about a preference for effect x.
Decisions listed in panel (c):
1. dcbax   2. cbax   3. bax   4. ax   5. x
6. dcax | bcax   7. dx | cx | bx | ax   8. cax | bax   9. bdx | bcx | bax   10. dcx | bax
11. bx | ax   12. bdx | cax | bax   13. cx | bx | ax   14. d | cbax   15. c | bax
16. b | ax   17. d | c | bax   18. dc | bax   19. c | b | ax   20. dc | bx | ax
21. bdc | bax   22. ad | cx | bx | ax   23. d | c | b | ax   24. bad | bcx | bax   25. ac | bx | ax
26. cb | ax   27. cbad | cbax   28. dc | b | ax   29. ad | ac | bx | ax   30. ab | ax
31. bad | bax   32. dc | ab | ax   33. dcb | ax   34. a | x   35. bad | bac | bax
36. ac | ab | ax   37. ad | ac | ab | ax   38. b | a | x   39. ba | x   40. c | b | a | x
41. cb | a | x   42. d | c | b | a | x   43. cba | x   44. dc | ba | x   45. dc | b | a | x
46. dcb | a | x   47. dcba | x

3.1 Results

Two participants were excluded as outliers based on the criterion that their rankings for at least five decisions were at least three standard deviations from the mean rankings. We performed a hierarchical clustering analysis of the remaining 41 participants’ rankings using rank correlation as a similarity metric.
Participants’ rankings were highly correlated: cutting the resulting dendrogram at 0.2 resulted in one cluster that included 33 participants, and the second-largest cluster included only 3 participants. Thus, we grouped all participants together and analyzed their mean rankings. The 0.2 threshold was chosen because it produced the most informative clustering in Experiment 2.

Figure 3: Comparison between human rankings in both experiments and predicted rankings from four models. The solid diagonal lines indicate a perfect correspondence between human and model rankings. [Scatter plots not reproduced. Columns: Absolute utility, Relative utility, Representativeness, Surprise. Rows: Experiment 1 (positive effects), Experiment 2 (negative effects). Axes: model rankings versus human rankings. Each panel reports an MAE value.]

Inverse decision-making models

We implemented the inverse decision-making models using importance sampling with 5 million samples drawn from the prior distribution p(u). Because all the effects were positive, we used a prior on utilities that placed nearly all probability mass above zero (µ = 4, σ = 2). The mean human rankings are compared with the absolute utility model rankings in Figure 2a, and the mean human rankings are listed in order in 2c. Fractional rankings were used for both the human data and the model predictions. The human rankings in the figure are the means of participants’ fractional rankings. The first row of Figure 3 contains similar plots that allow comparison of the four models we considered. In these plots, the solid diagonal lines indicate a perfect correspondence between model and human rankings. Thus, the largest deviations from this line represent the largest deviations in the data from the model’s predictions.
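To make the comparison procedure concrete, the following Python sketch shows one common way to compute fractional rankings (tied items share the mean of the rank positions they occupy) and the mean absolute error (MAE) between two sets of rankings. The scores and rankings in the example are invented for illustration and are not data from the experiments; the function names are likewise hypothetical.

```python
def fractional_ranks(scores):
    """Rank scores in ascending order from 1 to n.

    Tied scores all receive the mean of the rank positions they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        # Extend `end` over the run of items tied with the item at `start`.
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def mean_absolute_error(ranks_a, ranks_b):
    """Average absolute difference between two equally long rank lists."""
    return sum(abs(a - b) for a, b in zip(ranks_a, ranks_b)) / len(ranks_a)


# Invented model scores for four decisions; the first two are tied.
model_scores = [0.2, 0.2, 0.9, 0.5]
model_ranks = fractional_ranks(model_scores)

# Invented human rankings for the same four decisions.
human_ranks = [1.0, 2.0, 4.0, 3.0]
mae = mean_absolute_error(model_ranks, human_ranks)
```

In this toy example the two tied decisions each receive rank 1.5, so the model's rankings differ from the human rankings by 0.5 on two of the four decisions.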
Figure 3 shows that the absolute and relative utility models make virtually identical predictions and both models provide a strong account of the human rankings as measured by mean absolute error (MAE = 2.3 in both cases). Moreover, both models correctly predict the highest ranked decision and the set of lowest ranked decisions. The only clear discrepancy between the model predictions and the data is the cluster of points at the lower left, labeled as Decisions 6–13 in Figure 2a. These are all cases in which effect x appears in all options and therefore these decisions provide no information about a decision maker’s preference for x. Consequently, the models assign the same ranking to this group as to the group of decisions in which there is only a single option (Decisions 1–5). Although people appeared to treat these groups somewhat differently, the models still correctly predict that the entire group of decisions 1–13 is ranked lower than all other decisions. The surprise and representativeness models do not perform nearly as well (MAE = 7.0 and 17.8, respectively). Although the surprise model captures some of the general trends in the human rankings, it makes several major errors. For example, consider Decision 7: dx|cx|bx|ax. This decision provides no information about a preference for x because it appears in every option. The decision is surprising, however, because a decision maker choosing at random from these options would make the observed choice only 1/4 of the time. The representativeness model performs even worse, primarily because it does not take into account alternative explanations for why an option was chosen, such as the fact that no other options were available (e.g., Decision 1 in Figure 2c). The failure of these models to adequately account for the data suggests that both the likelihood p(c|u, F) and marginal likelihood p(c|F) are important components of the absolute and relative utility models. 
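The importance-sampling computation and the observation about Decision 7 can be illustrated with a small NumPy sketch. It draws utilities from the prior (µ = 4, σ = 2, as in Experiment 1), weights each sample by the likelihood of the observed choice under the Luce choice rule, and uses the normalized weights to estimate the absolute and relative utility criteria. The sketch uses 200,000 samples rather than the 5 million reported above, and the self-normalized weighting, function names, and string encoding of decisions are assumptions made here rather than details taken from the paper.

```python
import numpy as np

EFFECTS = "abcdx"  # five effects in total; x is the effect of interest


def option_matrix(decision):
    """Binary option-by-effect matrix for a decision such as 'dcba | x'.

    Options are separated by vertical bars and the chosen option is last.
    """
    options = [option.strip() for option in decision.split("|")]
    return np.array([[1.0 if e in option else 0.0 for e in EFFECTS]
                     for option in options])


def posterior_summaries(decision, mu=4.0, sigma=2.0, n_samples=200_000, seed=0):
    """Estimate E(u_x | c, F) and p(u_x >= u_j for all j | c, F)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(mu, sigma, size=(n_samples, len(EFFECTS)))  # prior samples
    totals = u @ option_matrix(decision).T                     # option utilities
    totals -= totals.max(axis=1, keepdims=True)                # stabilize softmax
    likelihood = np.exp(totals[:, -1]) / np.exp(totals).sum(axis=1)
    weights = likelihood / likelihood.sum()                    # normalized weights
    x = EFFECTS.index("x")
    absolute = float(np.sum(weights * u[:, x]))
    x_is_largest = u[:, x] >= u.max(axis=1)
    relative = float(np.sum(weights[x_is_largest]))
    return absolute, relative


abs7, rel7 = posterior_summaries("dx | cx | bx | ax")   # Decision 7
abs14, rel14 = posterior_summaries("d | cbax")          # Decision 14
abs47, rel47 = posterior_summaries("dcba | x")          # Decision 47
```

Because x appears in every option of Decision 7, its utility cancels out of the choice rule and the estimated posterior mean stays close to the prior mean of 4. For Decision 47, where four effects were forgone in favor of x alone, both criteria rise well above their prior values.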
Feature-based models

We compared the performance of the absolute and relative utility models to our two feature-based models: the weighted feature and ranked feature models. For each participant, we considered every subset of features¹ in Figure 1d in order to determine the minimum number of features needed by the two models to achieve the same level of accuracy as the absolute utility model, as measured by mean absolute error. The results of these analyses are shown in Figure 4. For the majority of participants, at least four features were needed by both models to match the accuracy of the absolute utility model. For the weighted feature model, 14 participants could not be fit as well as the absolute utility model even when all ten features were considered. These results indicate that a feature-based account of people’s inferences in our task must be supplied with a relatively large number of features. By contrast, the inverse decision-making approach provides a relatively parsimonious account of the data.

Figure 4: Results of the feature-based model analysis from Experiment 1 for (a) the weighted feature models and (b) the ranked feature models. The histograms show the minimum number of features needed to match the accuracy (measured by MAE) of the absolute utility model for each participant. [Histograms not reproduced. Horizontal axis: number of features needed (1 to 10 and >10 for the weighted feature model; 1 to 6 and >6 for the ranked feature model). Vertical axis: number of participants.]

4 Experiment 2: Negative effects

Experiment 2 focused on a setting in which all effects are negative, motivated by the fact that the inverse decision-making models predict several major differences in orderings when effects are negative rather than positive. For instance, the absolute utility model’s relative rankings of the decisions in Figures 1a and 1b are reversed when all effects are negative rather than positive.

Method

42 Carnegie Mellon undergraduates participated for course credit.
The experimental design was identical to Experiment 1 except that participants were told that the effects were electric shocks at different body locations. They were asked to sort the cards on the basis of how strongly each decision suggested that the decision maker finds shocks at the target location relatively tolerable. The model predictions were derived in the same way as for Experiment 1, but with a prior distribution on utilities that placed nearly all probability mass below zero (µ = −4, σ = 2) to reflect the fact that effects were all negative. 4.1 Results Three participants were excluded as outliers by the same criterion applied in Experiment 1. The resulting mean rankings are compared with the corresponding rankings from Experiment 1 in Figure 2b. The figure shows that responses based on positive and negative effects were substantially different in a number of cases. Figure 3 shows how the mean rankings compare to the predictions of the four models we considered. Although the relative utility model is fairly accurate, no model achieves the same level of accuracy as the absolute and relative utility models in Experiment 1. In addition, the relative utility model provides a poor account of the responses of many individual participants. To better understand responses at the individual level, we repeated the hierarchical clustering analysis described in Experiment 1, which revealed that 29 participants could be grouped into one of four clusters, with the remaining participants each in their own clusters. We analyzed these four clusters independently, excluding the 10 participants that could not be naturally grouped. We compared the mean rankings of each cluster to the absolute and relative utility models, as well as all one- and two-feature weighted feature and ranked feature models. 
Figure 5 shows that the mean rankings of participants in Cluster 1 (N = 8) were best fit by the absolute utility model, the mean rankings of participants in Cluster 2 (N = 12) were best fit by the relative utility model, and the mean rankings of participants in Clusters 3 (N = 3) and 4 (N = 6) were better fit by feature-based models than by either the absolute or relative utility models.

Footnote 1: A maximum of six features was considered for the ranked feature model because considering more features was computationally intractable.

Figure 5: Comparison between human rankings for four clusters of participants identified in Experiment 2 and predicted rankings from three models. Each point in the plots corresponds to one decision and the solid diagonal lines indicate a perfect correspondence between human and model rankings. The third row shows the predictions of the best-fitting two-factor weighted feature model for each cluster. The two factors listed refer to Figure 1d. [Scatter plots not reproduced. Columns: Cluster 1 (N = 8), Cluster 2 (N = 12), Cluster 3 (N = 3), Cluster 4 (N = 6). Rows: Absolute utility, Relative utility, Best-fitting weighted feature. Axes: model rankings versus human rankings. Each panel reports an MAE value.]

To examine how well the models accounted for individuals’ rankings within each cluster, we compared the predictions of the inverse decision-making models to the best-fitting two-factor feature-based model for each participant. In Cluster 1, 7 out of 8 participants were best fit by the absolute utility model; in Cluster 2, 8 out of 12 participants were best fit by the relative utility model; in Clusters 3 and 4, all participants were better fit by feature-based models.
No single feature-based model provided the best fit for more than two participants, suggesting that participants not fit well by the inverse decision-making models were not using a single alternative strategy. Applying the feature-based model analysis from Experiment 1 to the current results revealed that the weighted feature model required an average of 6.0 features to match the performance of the absolute utility model for participants in Cluster 1, and an average of 3.9 features to match the performance of the relative utility model for participants in Cluster 2. Thus, although a single model did not fit all participants well in the current experiment, many participants were fit well by one of the two inverse decision-making models, suggesting that this general approach is useful for explaining how people reason about negative effects as well as positive effects. 5 Conclusion In two experiments, we found that an inverse decision-making approach offered a good computational account of how people make judgments about others’ preferences. Although this approach is conceptually simple, our analyses indicated that it captures the influence of a fairly large number of relevant decision features. Indeed, the feature-based models that we considered as potential process models of preference learning could only match the performance of the inverse decision-making approach when supplied with a relatively large number of features. We feel that this result rules out the feature-based approach as psychologically implausible, meaning that alternative process-level accounts will need to be explored. One possibility is sampling, which has been proposed as a psychological mechanism for approximating probabilistic inferences [19, 20]. However, even if process models that use large numbers of features are considered plausible, the inverse decision-making approach provides a valuable computational-level account that helps to explain which decision features are informative. 
Acknowledgments

This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797.

References

[1] D. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka, editor, Frontiers in Econometrics. Academic Press, New York, 1973.
[2] C. G. Lucas, T. L. Griffiths, F. Xu, and C. Fawcett. A rational model of preference learning and choice prediction by children. In Proceedings of Neural Information Processing Systems 21, 2009.
[3] L. Bergen, O. R. Evans, and J. B. Tenenbaum. Learning structured preferences. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010.
[4] A. Jern and C. Kemp. Decision factors that support preference learning. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011.
[5] T. Kushnir, F. Xu, and H. M. Wellman. Young children use statistical sampling to infer the preferences of other people. Psychological Science, 21(8):1134–1140, 2010.
[6] L. Ma and F. Xu. Young children’s use of statistical sampling evidence to infer the subjectivity of preferences. Cognition, in press.
[7] M. J. Doherty. Theory of Mind: How Children Understand Others’ Thoughts and Feelings. Psychology Press, New York, 2009.
[8] R. N. Shepard, C. I. Hovland, and H. M. Jenkins. Learning and memorization of classifications. Psychological Monographs, 75, Whole No. 517, 1961.
[9] D. N. Osherson, E. E. Smith, O. Wilkie, A. López, and E. Shafir. Category-based induction. Psychological Review, 97(2):185–200, 1990.
[10] E. A. Wasserman, S. M. Elek, D. L. Chatlosh, and A. G. Baker. Rating causal relations: Role of probability in judgments of response-outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(1):174–188, 1993.
[11] R. D. Luce. Individual choice behavior. John Wiley, 1959.
[12] D. Ariely, G. Loewenstein, and D. Prelec. Tom Sawyer and the construction of value. Journal of Economic Behavior & Organization, 60:1–10, 2006.
[13] D. Kahneman and A. Tversky. Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3):430–454, 1972.
[14] D. Newtson. Dispositional inference from effects of actions: Effects chosen and effects forgone. Journal of Experimental Social Psychology, 10:489–496, 1974.
[15] P. C. Fishburn. Lexicographic orders, utilities and decision rules: A survey. Management Science, 20(11):1442–1471, 1974.
[16] G. Gigerenzer and P. M. Todd. Fast and frugal heuristics: The adaptive toolbox. Oxford University Press, New York, 1999.
[17] A. Prince and P. Smolensky. Optimality Theory: Constraint Interaction in Generative Grammar. Wiley-Blackwell, 2004.
[18] D. Marr. Vision. W. H. Freeman, San Francisco, 1982.
[19] A. N. Sanborn, T. L. Griffiths, and D. J. Navarro. Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, 117:1144–1167, 2010.
[20] L. Shi and T. L. Griffiths. Neural implementation of Bayesian inference by importance sampling. In Proceedings of Neural Information Processing Systems 22, 2009.
3 0.60248101 34 nips-2011-An Unsupervised Decontamination Procedure For Improving The Reliability Of Human Judgments
Author: Michael C. Mozer, Benjamin Link, Harold Pashler
Abstract: Psychologists have long been struck by individuals’ limitations in expressing their internal sensations, impressions, and evaluations via rating scales. Instead of using an absolute scale, individuals rely on reference points from recent experience. This relativity of judgment limits the informativeness of responses on surveys, questionnaires, and evaluation forms. Fortunately, the cognitive processes that map stimuli to responses are not simply noisy, but rather are influenced by recent experience in a lawful manner. We explore techniques to remove sequential dependencies, and thereby decontaminate a series of ratings to obtain more meaningful human judgments. In our formulation, the problem is to infer latent (subjective) impressions from a sequence of stimulus labels (e.g., movie names) and responses. We describe an unsupervised approach that simultaneously recovers the impressions and parameters of a contamination model that predicts how recent judgments affect the current response. We test our iterated impression inference, or I^3, algorithm in three domains: rating the gap between dots, the desirability of a movie based on an advertisement, and the morality of an action. We demonstrate significant objective improvements in the quality of the recovered impressions.
4 0.60045141 48 nips-2011-Blending Autonomous Exploration and Apprenticeship Learning
Author: Thomas J. Walsh, Daniel K. Hewlett, Clayton T. Morrison
Abstract: We present theoretical and empirical results for a framework that combines the benefits of apprenticeship and autonomous reinforcement learning. Our approach modifies an existing apprenticeship learning framework that relies on teacher demonstrations and does not necessarily explore the environment. The first change is replacing previously used Mistake Bound model learners with a recently proposed framework that melds the KWIK and Mistake Bound supervised learning protocols. The second change is introducing a communication of expected utility from the student to the teacher. The resulting system only uses teacher traces when the agent needs to learn concepts it cannot efficiently learn on its own.
5 0.58915818 3 nips-2011-A Collaborative Mechanism for Crowdsourcing Prediction Problems
Author: Jacob D. Abernethy, Rafael M. Frongillo
Abstract: Machine Learning competitions such as the Netflix Prize have proven reasonably successful as a method of “crowdsourcing” prediction tasks. But these competitions have a number of weaknesses, particularly in the incentive structure they create for the participants. We propose a new approach, called a Crowdsourced Learning Mechanism, in which participants collaboratively “learn” a hypothesis for a given prediction task. The approach draws heavily from the concept of a prediction market, where traders bet on the likelihood of a future event. In our framework, the mechanism continues to publish the current hypothesis, and participants can modify this hypothesis by wagering on an update. The critical incentive property is that a participant will profit an amount that scales according to how much her update improves performance on a released test set.
6 0.58695012 41 nips-2011-Autonomous Learning of Action Models for Planning
7 0.47977015 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database
8 0.43925905 21 nips-2011-Active Learning with a Drifting Distribution
9 0.39619106 130 nips-2011-Inductive reasoning about chimeric creatures
10 0.36289799 52 nips-2011-Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery
11 0.35815296 22 nips-2011-Active Ranking using Pairwise Comparisons
12 0.35755646 256 nips-2011-Solving Decision Problems with Limited Information
13 0.34618717 35 nips-2011-An ideal observer model for identifying the reference frame of objects
14 0.33909565 219 nips-2011-Predicting response time and error rates in visual search
15 0.33882976 38 nips-2011-Anatomically Constrained Decoding of Finger Flexion from Electrocorticographic Signals
16 0.33560184 300 nips-2011-Variance Reduction in Monte-Carlo Tree Search
17 0.31013665 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs
18 0.30802038 62 nips-2011-Continuous-Time Regression Models for Longitudinal Networks
19 0.29680154 15 nips-2011-A rational model of causal inference with continuous causes
20 0.29160306 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity
topicId topicWeight
[(0, 0.02), (4, 0.042), (20, 0.036), (26, 0.037), (31, 0.09), (35, 0.267), (43, 0.066), (45, 0.105), (56, 0.031), (57, 0.027), (65, 0.019), (74, 0.044), (83, 0.044), (84, 0.011), (99, 0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.77216917 122 nips-2011-How Do Humans Teach: On Curriculum Learning and Teaching Dimension
Author: Faisal Khan, Bilge Mutlu, Xiaojin Zhu
Abstract: We study the empirical strategies that humans follow as they teach a target concept with a simple 1D threshold to a robot. Previous studies of computational teaching, particularly the teaching dimension model and the curriculum learning principle, offer contradictory predictions on what optimal strategy the teacher should follow in this teaching task. We show through behavioral studies that humans employ three distinct teaching strategies, one of which is consistent with the curriculum learning principle, and propose a novel theoretical framework as a potential explanation for this strategy. This framework, which assumes a teaching goal of minimizing the learner’s expected generalization error at each iteration, extends the standard teaching dimension model and offers a theoretical justification for curriculum learning.
2 0.6454981 245 nips-2011-Selecting the State-Representation in Reinforcement Learning
Author: Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos
Abstract: The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Knowing neither which of the models is the correct one nor what the probabilistic characteristics of the resulting MDP are, it is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves that, with a regret of order $T^{2/3}$, where $T$ is the horizon time.
3 0.55757356 258 nips-2011-Sparse Bayesian Multi-Task Learning
Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau
Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods.
4 0.55392039 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
Author: Alex K. Susemihl, Ron Meir, Manfred Opper
Abstract: Bayesian filtering of stochastic stimuli has received a great deal of attention recently. It has been applied to describe the way in which biological systems dynamically represent and make decisions about the environment. There have, however, been no exact results for the error in the biologically plausible setting of inference on point processes. We present an exact analysis of the evolution of the mean-squared error in a state estimation task using Gaussian-tuned point processes as sensors. This allows us to study the dynamics of the error of an optimal Bayesian decoder, providing insights into the limits obtainable in this task. This is done for Markovian and a class of non-Markovian Gaussian processes. We find that there is an optimal tuning width for which the error is minimized. This leads to a characterization of the optimal encoding for the setting as a function of the statistics of the stimulus, providing a mathematically sound primer for an ecological theory of sensory processing.
5 0.55370831 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted trace-norm might fail when the sampling distribution is not a product distribution (i.e., when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial.
6 0.55277622 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
7 0.55273736 67 nips-2011-Data Skeletonization via Reeb Graphs
8 0.55237401 180 nips-2011-Multiple Instance Filtering
9 0.55196065 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
10 0.55075687 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
11 0.54926711 186 nips-2011-Noise Thresholds for Spectral Clustering
12 0.54889947 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
13 0.54697347 29 nips-2011-Algorithms and hardness results for parallel large margin learning
14 0.54661757 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
15 0.54643947 231 nips-2011-Randomized Algorithms for Comparison-based Search
16 0.54577965 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
17 0.54575771 276 nips-2011-Structured sparse coding via lateral inhibition
18 0.5456633 271 nips-2011-Statistical Tests for Optimization Efficiency
19 0.54566228 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
20 0.54467106 283 nips-2011-The Fixed Points of Off-Policy TD