nips nips2011 nips2011-22 knowledge-graph by maker-knowledge-mining

22 nips-2011-Active Ranking using Pairwise Comparisons

Source: pdf

Author: Kevin G. Jamieson, Robert Nowak

Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identiﬁed by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Speciﬁcally, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reﬂect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1

Reference: text

Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). [sent-6, score-1.29]

2 In general, the ranking of n objects can be identiﬁed by standard sorting methods using n log2 n pairwise comparisons. [sent-7, score-1.122]

3 We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. [sent-8, score-1.147]

4 Speciﬁcally, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reﬂect their relative distances from a common reference point in Rd . [sent-9, score-0.793]

5 We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. [sent-10, score-1.163]

6 If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. [sent-11, score-0.713]

7 In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. [sent-12, score-0.538]

8 1 Introduction This paper addresses the problem of ranking a set of objects based on a limited number of pairwise comparisons (rankings between pairs of the objects). [sent-14, score-1.29]

9 A ranking over a set of n objects Θ = (θ1 , θ2 , . [sent-15, score-0.791]

10 A ranking uniquely determines the collection of pairwise comparisons between all pairs of objects. [sent-25, score-0.984]

11 The primary objective here is to bound the number of pairwise comparisons needed to correctly determine the ranking when the objects (and hence rankings) satisfy certain known structural constraints. [sent-26, score-1.348]

12 Speciﬁcally, we suppose that the objects may be embedded into a low-dimensional Euclidean space such that the ranking is consistent with distances in the space. [sent-27, score-0.936]

13 We wish to exploit such structure in order to discover the ranking using a very small number of pairwise comparisons. [sent-28, score-0.797]

14 There are practical and theoretical motivations for restricting our attention to pairwise rankings that are discussed in Section 2. [sent-30, score-0.616]

15 , in general, specifying a ranking requires Θ(n log n) bits of information. [sent-35, score-0.524]

16 This implies that at least this many pairwise comparisons are required without additional assumptions about the ranking. [sent-36, score-0.519]

17 In large-scale problems or when humans are queried for pairwise comparisons, obtaining this many pairwise comparisons may be impractical and therefore we consider situations in which the space of rankings is structured and thereby less complex. [sent-38, score-1.134]

18 1 A natural way to induce a structure on the space of rankings is to suppose that the objects can be embedded into a d-dimensional Euclidean space so that the distances between objects are consistent with the ranking. [sent-39, score-1.036]

19 It is not difﬁcult to show (see Section 3) that the number of full rankings that could arise from n objects embedded in Rd grows like n2d , and so specifying a ranking from this class requires only O(d log n) bits. [sent-42, score-1.144]

20 The main results of the paper show that under this assumption a randomly selected ranking can be determined using O(d log n) pairwise comparisons selected in an adaptive and sequential fashion, but almost all n pairwise 2 rankings are needed if they are picked randomly rather than selectively. [sent-43, score-1.748]

21 The objective is to learn the ranking by querying the reference for pairwise comparisons of the form qi,j := {θi ≺ θj }. [sent-47, score-1.099]

22 Every ranking σ can be speciﬁed by a reference point rσ ∈ Rd , as follows. [sent-54, score-0.6]

23 The Euclidean distances between the reference and objects are consistent with the ranking in the following sense: if the σ ranks θi ≺ θj , then θi − rσ < θj − rσ . [sent-55, score-1.011]

24 Let Σn,d denote the set of all possible rankings of the n objects that satisfy this embedding condition. [sent-56, score-0.7]

25 The ranking to be learned, speciﬁed by the reference (e. [sent-58, score-0.6]

26 We assume the embedding is given and our interest is minimizing the number of queries needed to learn the ranking, and for this we require a second assumption. [sent-65, score-0.513]

27 A2 Consistency: Every pairwise comparison is consistent with the ranking to be learned. [sent-66, score-0.866]

28 2, these two assumptions alone are not enough to rule out pathological arrangements of objects in the embedding for which at least Ω(n) queries must be made to recover the ranking. [sent-69, score-0.837]

29 If Mn (σ) denotes the number of pairwise comparisons requested by an algorithm to identify the ranking σ, then the average query complexity with respect to π is denoted by Eπ [Mn ]. [sent-76, score-1.323]

30 If the queries are chosen deterministically or randomly in advance of collecting the corresponding pairwise comparisons, then we show that almost all n pairwise comparisons queries are needed to identify a ranking under the assumptions above. [sent-81, score-2.106]

31 2 However, if the queries are selected in an adaptive and sequential fashion according to the algorithm 2 Query Selection Algorithm input: n objects in Rd initialize: objects θ1 , . [sent-82, score-1.091]

32 output: ranking of n objects q1 q2,3 ,2 θ2 q 1 ,3 θ1 θ3 Figure 2: Objects θ1 , θ2 , θ3 and queries. [sent-92, score-0.791]

33 The dotted (dashed) lines represent new queries whose labels are (are not) ambiguous given those labels. [sent-94, score-0.584]

34 in Figure 1, then we show that the number of pairwise rankings required to identify a ranking is no more than a constant multiple of d log n, on average. [sent-98, score-1.123]

35 The algorithm requests a query if and only if the corresponding pairwise ranking is ambiguous (see Section 4. [sent-99, score-1.251]

36 2), meaning that it cannot be determined from previously collected pairwise comparisons and the locations of the objects in Rd . [sent-100, score-0.805]

37 In the case of persistent errors (see Section 5) we show that at least O(n/ log n) objects can be correctly ranked in a partial ranking with high probability by requesting just O(d log2 n) pairwise comparisons. [sent-105, score-1.432]

38 Geometrical interpretations of our problem derive from the seminal works of [8] in ranking and [9] in learning. [sent-108, score-0.518]

39 In the ranking problem, the underlying halfspaces are not in general position and have strong dependencies with each other. [sent-110, score-0.562]

40 2 Motivation and related work The problem of learning a ranking from few pairwise comparisons is motivated by what we perceive as a signiﬁcant gap in the theory of ranking and permutation learning. [sent-115, score-1.469]

41 Most work in ranking assumes a passive approach to learning; pairwise comparisons or partial rankings are collected in a random or non-adaptive fashion and then aggregated to obtain a full ranking (cf. [sent-116, score-1.872]

42 However, this may be quite inefﬁcient in terms of the number of pairwise comparisons or partial rankings needed to learn the (full) ranking. [sent-118, score-0.844]

43 Furthermore, empirical evidence suggests that, even under complex ranking models, adaptively selecting pairwise comparisons can reduce the number needed to learn the ranking [18]. [sent-120, score-1.501]

44 For example, psychologists and market researchers collect pairwise comparisons to gauge human preferences over a set of objects, for scientiﬁc understanding or product placement. [sent-122, score-0.518]

45 We are interested in taking advantage of underlying structure in the set of objects in order to choose more informative pairwise comparison queries. [sent-125, score-0.682]

46 We focus on pairwise comparison queries for two reasons. [sent-127, score-0.722]

47 First, pairwise comparisons admit a halfspace representation in embedding spaces which allows for a geometrical approach to learning in such structured ranking spaces. [sent-128, score-1.221]

48 Second, pairwise comparisons are the most common form of queries in many applications, especially those involving human subjects. [sent-129, score-0.865]

49 We assume that the objects can be embedded in Rd and that the distances between objects and the reference are consistent with the ranking (Assumption A1). [sent-135, score-1.33]

50 The problem of learning a general function f : Rd → R using just pairwise comparisons that correctly ranks the objects embedded in Rd has previously been studied in the passive setting [13, 14, 15, 16]. [sent-136, score-0.965]

51 For example, the objects might have an embedding (in a feature space) and the ranking is generated by distances in this space. [sent-141, score-0.945]

52 3 Geometry of rankings from pairwise comparisons The embedding assumption A1 gives rise to geometrical interpretations of the ranking problem, which are developed in this section. [sent-144, score-1.503]

53 The pairwise comparison qi,j can be viewed as the membership query: is θi ranked before θj in the (full) ranking σ? [sent-145, score-0.962]

54 The set of all possible pairwise comparison queries can be represented as n distinct halfspaces in Rd . [sent-150, score-0.779]

55 The intersections of these halfspaces 2 partition Rd into a number of cells, and each one corresponds to a unique ranking of Θ. [sent-151, score-0.587]

56 Arbitrary rankings are not possible due to the embedding assumption A1, and recall that the set of rankings possible under A1 is denoted by Σn,d . [sent-152, score-0.713]

57 Let Q(n, d) denote the number of d-cells deﬁned by the hyperplane arrangement of pairwise comparisons between these objects (i. [sent-161, score-0.967]

58 (3) In the hyperplane arrangement induced by the n objects in d dimensions, each hyperplane is intersected by every other and is partitioned into Q(n − 1, d − 1) subsets or (d − 1)-cells. [sent-165, score-0.695]

59 2 Lower bounds on query complexity Since the cardinality of the set of possible rankings is |Σn,d | = Q(n, d), we have a simple lower bound on the number of queries needed to determine the ranking. [sent-177, score-0.93]

60 To reconstruct an arbitrary ranking σ ∈ Σn,d any algorithm will require at least log2 |Σn,d | = Θ(2d log2 n) pairwise comparisons. [sent-180, score-0.817]

61 For example, in the one-dimensional case (d = 1) the objects can be ordered and binary search can be used to select pairwise comparison queries, achieving the lower bound. [sent-185, score-0.683]

62 Even in two dimensions there are placements of the objects (still in general position) that produce d-cells in the partition induced by queries that have n − 1 faces (i. [sent-187, score-0.747]

63 3 Inefﬁciency of random queries The geometrical representation of the ranking problem reveals that randomly choosing pairwise comparison queries is inefﬁcient relative to the lower bound above. [sent-194, score-1.684]

64 The answers to m queries narrows the 2 set of possible rankings to a d-cell in Rd . [sent-196, score-0.667]

65 If it contains more than one of the partition cells, then the underlying ranking is ambiguous. [sent-198, score-0.53]

66 Then for all positive integers N ≥ m ≥ d the 2 probability that the m queries yield a unique ranking is m / N ≤ ( em )d . [sent-203, score-0.851]

67 Therefore, if the queries are randomly chosen, then d d we will need to ask almost all queries to guarantee that the inferred ranking is probably correct. [sent-210, score-1.275]

68 This places the reference rσ within a d-cell (deﬁned by the labels of the comparison queries between objects 1, . [sent-214, score-0.869]

69 A comparison query between object k and one of objects 1, . [sent-220, score-0.631]

70 Recall that each possible ranking can be represented by a reference point r ∈ Rd . [sent-240, score-0.6]

71 2 Characterization of an ambiguous query The characterization of an ambiguous query has interpretations in both the primal and dual spaces. [sent-245, score-0.88]

72 Assume that we have labels for all the pairwise comparisons of k − 1 objects. [sent-253, score-0.537]

73 In the primal, this separating hyperplane corresponds to a point lying on the hyperplane deﬁned by the associated pairwise comparison. [sent-259, score-0.613]

74 3 The probability that a query is ambiguous An essential component of the sequential algorithm of Figure 1 is the initial random order of the objects; every sequence in which it could consider objects is equally probable. [sent-261, score-0.783]

75 This allows us to state a nontrivial fact about the partial rankings of the ﬁrst k objects observed in this sequence. [sent-262, score-0.619]

76 If Σk,d denotes the set of possible k rankings of these k objects then every σ ∈ Σk,d is equally probable. [sent-266, score-0.61]

77 As described above, for 1 ≤ i ≤ k some of the pairwise comparisons qi,k+1 may be ambiguous. [sent-273, score-0.499]

78 Let A(k, d, U ) denote the probability of the event that the pairwise comparison qi,k+1 is ambiguous for i = 1, 2, . [sent-288, score-0.536]

79 This implies that the hyperplane representation of the pairwise comparison in the primal intersects the cell containing rσ (see Figure 2 for an illustration of this concept). [sent-295, score-0.588]

80 Consider the partition of Rd generated by the hyperplanes corresponding to pairwise comparisons between objects 1, . [sent-296, score-0.899]

81 Let P (k, d) denote the number of d-cells in this partition that are intersected by a hyperplane corresponding to one of the queries qi,k+1 , i ∈ {1, . [sent-300, score-0.608]

82 By Lemma 3, every d-cell in the partition induced by the k objects corresponds to an equally probable ranking of those objects. [sent-306, score-0.891]

83 Therefore, the probability that a query is ambiguous is the number of cells intersected by the corresponding P (k,d) hyperplane divided by the total number of d-cells, and therefore A(k, d, U ) = Q(k,d) . [sent-307, score-0.641]

84 Because the individual events of requesting each query are conditionally independent, the total numn−1 k ber of queries requested by the algorithm is just Mn = k=1 i=1 1{Request qi,k+1 }. [sent-309, score-0.73]

85 Let the random variable Mn denote the number of pairwise comparisons that are requested in the algorithm of Figure 1, then EU [Mn ] ≤ 2d log2 2d + 2da2 log n. [sent-313, score-0.602]

86 5 Robust sequential algorithm for query selection We now extend the algorithm of Figure 1 to situations in which the response to each query is only probably correct. [sent-315, score-0.594]

87 The robust algorithm operates in the same fashion as the algorithm in Figure 1, with the exception that when an ambiguous query is encountered several (equivalent) queries are made and a decision is based on the majority vote. [sent-318, score-0.839]

88 This voting procedure allows us to construct a ranking (or partial ranking) that is correct with high probability by requesting just O(d log2 n) queries where the extra log factor comes from voting. [sent-319, score-1.047]

89 If all ambiguous queries are decided by the majority vote of R independent responses to each such query, then with probability greater than 1 − 2n log2 (n) exp(− 1 (1 − 2p)2 R) this procedure correctly identiﬁes the correct 2 ranking and requests no more than O(Rd log n) queries on average. [sent-328, score-1.567]

90 Under this model, if two rankings differ by only a single pairwise comparison, then they cannot be distinguished with probability greater than 1 − p. [sent-331, score-0.613]

91 The best we can hope for is to exactly recover a partial ranking of the objects (i. [sent-333, score-0.825]

92 Henceforth, we will assume the noise is persistent and aim to exactly recover a partial ranking of the objects. [sent-336, score-0.575]

93 The key ingredient in the persistent noise setting is the design of a voting set for each ambiguous query encountered. [sent-337, score-0.517]

94 In practice, we cannot identify the subset of objects ranked between i and j, but it is contained within the set Ti,j , deﬁned to be the subset of objects θk such that qi,k , qk,j , or both are ambiguous. [sent-341, score-0.741]

95 Note however, if objects i and j are closely ranked, then Ti,j may be rather small, and so it is not 7 Numb er of query requests Table 1: Statistics for the algorithm robust to persistent noise of Section 5 with respect to all n pairwise comparisons. [sent-344, score-0.98]

96 This allows us to construct a probably correct ranking of the objects that are not passed over. [sent-357, score-0.869]

97 At the end of the process, some objects that were passed over may then be unambiguously ranked (based on queries made after they were passed over) or they can be ranked without voting (and without guarantees). [sent-359, score-1.025]

98 For any size-threshold R ≥ 1, with probability greater than 1 − 2n log2 (n) exp − 2 (1 − 2p)2 R the procedure above correctly ranks 9 at least n/(2R + 1) objects and requests no more than O(Rd log n) queries on average. [sent-363, score-0.865]

99 Because responses are noiseless, exact identiﬁcation of the ranking is guaranteed. [sent-371, score-0.523]

100 For each object θk , we use the embedding to derive pairwise comparison labels between all other objects as follows: (k) yi,j := 1{||θk − θi || < ||θk − θj ||}, which can be considered as the best approximation to the la˜ (k) bels yi,j (deﬁned above) in this embedding. [sent-386, score-0.886]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ranking', 0.485), ('queries', 0.366), ('pairwise', 0.312), ('objects', 0.306), ('rankings', 0.279), ('query', 0.21), ('comparisons', 0.187), ('ambiguous', 0.18), ('hyperplane', 0.141), ('rd', 0.138), ('reference', 0.115), ('embedding', 0.115), ('ranked', 0.102), ('requested', 0.083), ('geometrical', 0.071), ('requesting', 0.071), ('voting', 0.071), ('object', 0.071), ('mn', 0.069), ('requests', 0.064), ('sequential', 0.062), ('halfspaces', 0.057), ('intersected', 0.056), ('prescriptions', 0.056), ('persistent', 0.056), ('embedded', 0.054), ('cells', 0.054), ('inef', 0.053), ('halfspace', 0.051), ('fashion', 0.051), ('hyperplanes', 0.049), ('doctor', 0.049), ('partition', 0.045), ('comparison', 0.044), ('situations', 0.044), ('lemma', 0.043), ('intersects', 0.043), ('ranks', 0.041), ('audio', 0.041), ('passed', 0.039), ('probably', 0.039), ('passive', 0.039), ('dual', 0.039), ('distances', 0.039), ('labels', 0.038), ('responses', 0.038), ('prescription', 0.037), ('noiseless', 0.034), ('partial', 0.034), ('ck', 0.033), ('interpretations', 0.033), ('needed', 0.032), ('robust', 0.032), ('label', 0.031), ('pathological', 0.03), ('induced', 0.03), ('response', 0.029), ('primal', 0.028), ('identify', 0.027), ('suppose', 0.027), ('signals', 0.027), ('request', 0.027), ('nowak', 0.027), ('wisconsin', 0.027), ('correctly', 0.026), ('hd', 0.026), ('corollary', 0.026), ('consistent', 0.025), ('theorem', 0.025), ('equally', 0.025), ('motivations', 0.025), ('teaching', 0.025), ('madison', 0.024), ('uniformly', 0.023), ('poly', 0.023), ('henceforth', 0.023), ('greater', 0.022), ('cardinality', 0.022), ('answers', 0.022), ('pd', 0.022), ('assumption', 0.021), ('similarities', 0.021), ('lower', 0.021), ('arrangement', 0.021), ('qj', 0.021), ('informative', 0.02), ('least', 0.02), ('euclidean', 0.02), ('log', 0.02), ('cell', 0.02), ('position', 0.02), ('interpretation', 0.02), ('preferences', 0.019), ('recursion', 0.019), ('separating', 0.019), ('sorting', 0.019), ('membership', 0.019), ('randomly', 0.019), ('denoted', 0.019), ('bits', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 22 nips-2011-Active Ranking using Pairwise Comparisons

Author: Kevin G. Jamieson, Robert Nowak

2 0.22935522 231 nips-2011-Randomized Algorithms for Comparison-based Search

Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer

Abstract: This paper addresses the problem of ﬁnding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difﬁculty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which deﬁnes approximate triangle n inequalities on ranks. We present a lower bound of Ω(D log D + D2 ) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop 3 a randomized scheme for NN retrieval in O(D3 log2 n + D log2 n log log nD ) 3 questions. The learning requires asking O(nD3 log2 n + D log2 n log log nD ) questions and O(n log2 n/ log(2D)) bits to store.

3 0.16705956 35 nips-2011-An ideal observer model for identifying the reference frame of objects

Author: Joseph L. Austerweil, Abram L. Friesen, Thomas L. Griffiths

Abstract: The object people perceive in an image can depend on its orientation relative to the scene it is in (its reference frame). For example, the images of the symbols × and + differ by a 45 degree rotation. Although real scenes have multiple images and reference frames, psychologists have focused on scenes with only one reference frame. We propose an ideal observer model based on nonparametric Bayesian statistics for inferring the number of reference frames in a scene and their parameters. When an ambiguous image could be assigned to two conﬂicting reference frames, the model predicts two factors should inﬂuence the reference frame inferred for the image: The image should be more likely to share the reference frame of the closer object (proximity) and it should be more likely to share the reference frame containing the most objects (alignment). We conﬁrm people use both cues using a novel methodology that allows for easy testing of human reference frame inference. 1

4 0.15614508 229 nips-2011-Query-Aware MCMC

Author: Michael L. Wick, Andrew McCallum

Abstract: Traditional approaches to probabilistic inference such as loopy belief propagation and Gibbs sampling typically compute marginals for all the unobserved variables in a graphical model. However, in many real-world applications the user’s interests are focused on a subset of the variables, speciﬁed by a query. In this case it would be wasteful to uniformly sample, say, one million variables when the query concerns only ten. In this paper we propose a query-speciﬁc approach to MCMC that accounts for the query variables and their generalized mutual information with neighboring variables in order to achieve higher computational efﬁciency. Surprisingly there has been almost no previous work on query-aware MCMC. We demonstrate the success of our approach with positive experimental results on a wide range of graphical models. 1

5 0.14928038 90 nips-2011-Evaluating the inverse decision-making approach to preference learning

Author: Alan Jern, Christopher G. Lucas, Charles Kemp

Abstract: Psychologists have recently begun to develop computational accounts of how people infer others’ preferences from their behavior. The inverse decision-making approach proposes that people infer preferences by inverting a generative model of decision-making. Existing data sets, however, do not provide sufﬁcient resolution to thoroughly evaluate this approach. We introduce a new preference learning task that provides a benchmark for evaluating computational accounts and use it to compare the inverse decision-making approach to a feature-based approach, which relies on a discriminative combination of decision features. Our data support the inverse decision-making approach to preference learning. A basic principle of decision-making is that knowing people’s preferences allows us to predict how they will behave: if you know your friend likes comedies and hates horror ﬁlms, you can probably guess which of these options she will choose when she goes to the theater. Often, however, we do not know what other people like and we can only infer their preferences from their behavior. If you know that a different friend saw a comedy today, does that mean that he likes comedies in general? The conclusion you draw will likely depend on what else was playing and what movie choices he has made in the past. A goal for social cognition research is to develop a computational account of people’s ability to infer others’ preferences. One computational approach is based on inverse decision-making. This approach begins with a model of how someone’s preferences lead to a decision. Then, this model is inverted to determine the most likely preferences that motivated an observed decision. An alternative approach might simply learn a functional mapping between features of an observed decision and the preferences that motivated it. For instance, in your friend’s decision to see a comedy, perhaps the more movie options he turned down, the more likely it is that he has a true preference for comedies. The difference between the inverse decision-making approach and the feature-based approach maps onto the standard dichotomy between generative and discriminative models. Economists have developed an instance of the inverse decision-making approach known as the multinomial logit model [1] that has been widely used to infer consumer’s preferences from their choices. This model has recently been explored as a psychological model [2, 3, 4], but there are few behavioral data sets for evaluating it as a model of how people learn others’ preferences. Additionally, the data sets that do exist tend to be drawn from the developmental literature, which focuses on simple tasks that collect only one or two judgments from children [5, 6, 7]. The limitations of these data sets make it difﬁcult to evaluate the multinomial logit model with respect to alternative accounts of preference learning like the feature-based approach. In this paper, we use data from a new experimental task that elicits a detailed set of preference judgments from a single participant in order to evaluate the predictions of several preference learning models from both the inverse decision-making and feature-based classes. Our task requires each participant to sort a large number of observed decisions on the basis of how strongly they indicate 1 (a) (b) (c) d c c (d) b b a a x d x d c b a x 1. Number of chosen effects (−/+) 2. Number of forgone effects (+/+) 3. Number of forgone options (+/+) 4. Number of forgone options containing x (−/−) 5. Max/min number of effects in a forgone option (+/−) 6. Is x in every option? (−/−) 7. Chose only option with x? (+/+) 8. Is x the only difference between options? (+/+) 9. Do all options have same number of effects? (+/+) 10. Chose option with max/min number of effects? (−/−) Figure 1: (a)–(c) Examples of the decisions used in the experiments. Each column represents one option and the boxes represent different effects. The chosen option is indicated by the black rectangle. (d) Features used by the weighted feature and ranked feature models. Features 5 and 10 involved maxima in Experiment 1, which focused on all positive effects, and minima in Experiment 2, which focused on all negative effects. The signs in parentheses indicate the direction of the feature that suggests a stronger preference in Experiment 1 / Experiment 2. a preference for a chosen item. Because the number of decisions is large and these decisions vary on multiple dimensions, predicting how people will order them offers a challenging benchmark on which to compare computational models of preference learning. Data sets from these sorts of detailed tasks have proved fruitful in other domains. For example, data reported by Shepard, Hovland, and Jenkins [8]; Osherson, Smith, Wilkie, L´ pez, and Shaﬁr [9]; and Wasserman, Elek, Chatlosh, o and Baker [10] have motivated much subsequent research on category learning, inductive reasoning, and causal reasoning, respectively. We ﬁrst describe our preference learning task in detail. We then present several inverse decisionmaking and feature-based models of preference learning and compare these models’ predictions to people’s judgments in two experiments. The data are well predicted by models that follow the inverse decision-making approach, suggesting that this computational approach may help explain how people learn others’ preferences. 1 Multi-attribute decisions and revealed preferences We designed a task that can be used to elicit a large number of preference judgments from a single participant. The task involves a set of observed multi-attribute decisions, some examples of which are represented visually in Figure 1. Each decision is among a set of options and each option produces a set of effects. Figure 1 shows several decisions involving a total of ﬁve effects distributed among up to ﬁve options. The differently colored boxes represent different effects and the chosen option is marked by a black rectangle. For example, 1a shows a choice between an option with four effects and an option with a single effect; here, the decision maker chose the second option. In our task, people are asked to rank a large number of these decisions by how strongly they suggest that the decision maker had a preference for a particular effect (e.g., effect x in Figure 1). By imposing some minimal constraints, the space of unique multi-attribute decisions is ﬁnite and we can obtain rankings for every decision in the space. For example, Figure 2c shows a complete list of 47 unique decisions involving up to ﬁve effects, subject to several constraints described later. Three of these decisions are shown in Figure 1. If all the effects are positive—pieces of candy, for example—the ﬁrst decision (1a) suggests a strong preference for candy x, because the decision maker turned down four pieces in favor of one. The second decision (1b), however, offers much weaker evidence because nearly everyone would choose four pieces of candy over one, even without a speciﬁc preference for x. The third decision (1c) provides evidence that is strong but perhaps not quite as strong as the ﬁrst decision. When all effects are negative—like electric shocks at different body locations—decision makers may still ﬁnd some effects more tolerable than others, but different inferences are sometimes supported. For example, for negative effects, 1a provides weak evidence that x is relatively tolerable because nearly everyone would choose one shock over four. 2 A computational account of preference learning We now describe a simple computational model for learning a person’s preferences after observing that person make a decision like the ones in Figure 1. We assume that there are n available options 2 {o1 , . . . , on }, each of which produces one or more effects from the set {f1 , f2 , ..., fm }. For simplicity, we assume that effects are binary. Let ui denote the utility the decision maker assigns to effect fi . We begin by specifying a model of decision-making that makes the standard assumptions that decision makers tend to choose things with greater utility and that utilities are additive. That is, if fj is a binary vector indicating the effects produced by option oj and u is a vector of utilities assigned to each of the m effects, then the total utility associated with option oj can be expressed as Uj = fj T u. We complete the speciﬁcation of the model by applying the Luce choice rule [11], a common psychological model of choice behavior, as the function that chooses among the options: p(c = oj |u, f ) = exp(Uj ) = exp(Uk ) n k=1 exp(fj T u) n T k=1 exp(fk u) (1) where c denotes the choice made. This model can predict the choice someone will make among a speciﬁed set of options, given the utilities that person assigns to the effects in each option. To obtain estimates of someone’s utilities, we invert this model by applying Bayes’ rule: p(u|c, F) = p(c|u, F)p(u) p(c|F) (2) where F = {f1 , . . . , fn } speciﬁes the available options and their corresponding effects. This is the multinomial logit model [1], a standard econometric model. In order to apply Equation 2 we must specify a prior p(u) on the utilities. We adopt a standard approach that places independent Gaussian priors on the utilities: ui ∼ N (µ, σ 2 ). For decisions where effects are positive—like candies—we set µ = 2σ, which corresponds to a prior distribution that places approximately 2% of the probability mass below zero. Similarly, for negative effects—like electric shocks—we set µ = −2σ. 2.1 Ordering a set of observed decisions Equation 2 speciﬁes a posterior probability distribution over utilities for a single observed decision but does not provide a way to compare the inferences drawn from multiple decisions for the purposes of ordering them. Suppose we are interested in a decision maker’s preference for effect x and we wish to order a set of decisions by how strongly they support this preference. Two criteria for ordering the decisions are as follows: Absolute utility Relative utility p(c|ux , F)p(ux ) p(c|F) p(c|∀j ux ≥ uj , F)p(∀j ux ≥ uj ) p(∀j ux ≥ uj |c, F) = p(c|F) E(ux |c, F) = Eux The absolute utility model orders decisions by the mean posterior utility for effect x. This criterion is perhaps the most natural way to assess how much a decision indicates a preference for x, but it requires an inference about the utility of x in isolation, and research suggests that people often think about the utility of an effect only in relation to other salient possibilities [12]. The relative utility model applies this idea to preference learning by ordering decisions based on how strongly they suggest that x has a greater utility than all other effects. The decisions in Figures 1b and 1c are cases where the two models lead to different predictions. If the effects are all negative (e.g., electric shocks), the absolute utility model predicts that 1b provides stronger evidence for a tolerance for x because the decision maker chose to receive four shocks instead of just one. The relative utility model predicts that 1c provides stronger evidence because 1b offers no way to determine the relative tolerance of the four chosen effects with respect to one another. Like all generative models, the absolute and relative models incorporate three qualitatively different components: the likelihood term p(c|u, F), the prior p(u), and the reciprocal of the marginal likelihood 1/p(c|F). We assume that the total number of effects is ﬁxed in advance and, as a result, the prior term will be the same for all decisions that we consider. The two other components, however, will vary across decisions. The inverse decision-making approach predicts that both components should inﬂuence preference judgments, and we will test this prediction by comparing our 3 two inverse decision-making models to two alternatives that rely only one of these components as an ordering criterion: p(c|∀j ux ≥ uj , F) 1/p(c|F) Representativeness Surprise The representativeness model captures how likely the observed decision would be if the utility for x were high, and previous research has shown that people sometimes rely on a representativeness computation of this kind [13]. The surprise model captures how unexpected the observed decision is overall; surprising decisions may be best explained in terms of a strong preference for x, but unsurprising decisions provide little information about x in particular. 2.2 Feature-based models We also consider a class of feature-based models that use surface features to order decisions. The ten features that we consider are shown in Figure 1d, where x is the effect of interest. As an example, the ﬁrst feature speciﬁes the number of effects chosen; because x is always among the chosen effects, decisions where few or no other effects belong to the chosen option suggest the strongest preference for x (when all effects are positive). This and the second feature were previously identiﬁed by Newtson [14]; we included the eight additional features shown in Figure 1d in an attempt to include all possible features that seemed both simple and relevant. We consider two methods for combining this set of features to order a set of decisions by how strongly they suggest a preference for x. The ﬁrst model is a standard linear regression model, which we refer to as the weighted feature model. The model learns a weight for each feature, and the rank of a given decision is determined by a weighted sum of its features. The second model is a ranked feature model that sorts the observed decisions with respect to a strict ranking of the features. The top-ranked feature corresponds to the primary sort key, the second-ranked feature to the secondary sort key, and so on. For example, suppose that the top-ranked feature is the number of chosen effects and the second-ranked feature is the number of forgone options. Sorting the three decisions in Figure 1 according to this criterion produces the following ordering: 1a,1c,1b. This notion of sorting items on the basis of ranked features has been applied before to decision-making [15, 16] and other domains of psychology [17], but we are not aware of any previous applications to preference learning. Although our inverse decision-making and feature-based models represent two very different approaches, both may turn out to be valuable. An inverse decision-making approach may be the appropriate account of preference learning at Marr’s [18] computational level, and a feature-based approach may capture the psychological processes by which the computational-level account is implemented. Our goal, therefore, is not necessarily to accept one of these approaches and dismiss the other. Instead, we entertain three distinct possibilities. First, both approaches may account well for the data, which would support the idea that they are valid accounts operating at different levels of analysis. Second, the inverse decision-making approach may offer a better account, suggesting that process-level accounts other than the feature-based approach should be explored. Finally, the feature-based approach may offer a better account, suggesting that inverse decision-making does not constitute an appropriate computational-level account of preference learning. 3 Experiment 1: Positive effects Our ﬁrst experiment focuses on decisions involving only positive effects. The full set of 47 decisions we used is shown in Figure 2c. This set includes every possible unique decision with up to ﬁve different effects, subject to the following constraints: (1) one of the effects (effect x) must always appear in the chosen option, (2) there are no repeated options, (3) each effect may appear in an option at most once, (4) only effects in the chosen option may be repeated in other options, and (5) when effects appear in multiple options, the number of effects is held constant across options. The ﬁrst constraint is necessary for the sorting task, the second two constraints create a ﬁnite space of decisions, and the ﬁnal two constraints limit attention to what we deemed the most interesting cases. Method 43 Carnegie Mellon undergraduates participated for course credit. Each participant was given a set of cards, with one decision printed on each card. The decisions were represented visually 4 (a) (c) Decisions 42 40 45 Mean human rankings 38 30 23 20 22 17 13 12 11 10 9 8 7 6 19 18 31 34 28 21 26 36 35 33 37 27 29 32 25 24 16 15 14 5 4 3 2 1 Absolute utility model rankings (b) Mean human rankings (Experiment 1) 47 43 44 46 45 38 37 36 34 35 30 32 33 31 29 28 24 26 27 25 21 19 22 20 18 16 17 12 13 7 6 11 5 9 4 10 8 1 2 3 42 40 41 39 47 46 44 41 43 39 23 15 14 Mean human rankings (Experiment 2) 1. dcbax 2. cbax 3. bax 4. ax 5. x 6. dcax | bcax 7. dx | cx | bx | ax 8. cax | bax 9. bdx | bcx | bax 10. dcx | bax 11. bx | ax 12. bdx | cax | bax 13. cx | bx | ax 14. d | cbax 15. c | bax 16. b | ax 17. d | c | bax 18. dc | bax 19. c | b | ax 20. dc | bx | ax 21. bdc | bax 22. ad | cx | bx | ax 23. d | c | b | ax 24. bad | bcx | bax 25. ac | bx | ax 26. cb | ax 27. cbad | cbax 28. dc | b | ax 29. ad | ac | bx | ax 30. ab | ax 31. bad | bax 32. dc | ab | ax 33. dcb | ax 34. a | x 35. bad | bac | bax 36. ac | ab | ax 37. ad | ac | ab | ax 38. b | a | x 39. ba | x 40. c | b | a | x 41. cb | a | x 42. d | c | b | a | x 43. cba | x 44. dc | ba | x 45. dc | b | a | x 46. dcb | a | x 47. dcba | x Figure 2: (a) Comparison between the absolute utility model rankings and the mean human rankings for Experiment 1. Each point represents one decision, numbered with respect to the list in panel c. (b) Comparison between the mean human rankings in Experiments 1 and 2. In both scatter plots, the solid diagonal lines indicate a perfect correspondence between the two sets of rankings. (c) The complete set of decisions, ordered by the mean human rankings from Experiment 1. Options are separated by vertical bars and the chosen option is always at the far right. Participants were always asked about a preference for effect x. as in Figure 1 but without the letter labels. Participants were told that the effects were different types of candy and each option was a bag containing one or more pieces of candy. They were asked to sort the cards by how strongly each decision suggested that the decision maker liked a particular target candy, labeled x in Figure 2c. They sorted the cards freely on a table but reported their ﬁnal rankings by writing them on a sheet of paper, from weakest to strongest evidence. They were instructed to order the cards as completely as possible, but were told that they could assign the same ranking to a set of cards if they believed those cards provided equal evidence. 3.1 Results Two participants were excluded as outliers based on the criterion that their rankings for at least ﬁve decisions were at least three standard deviations from the mean rankings. We performed a hierarchical clustering analysis of the remaining 41 participants’ rankings using rank correlation as a similarity metric. Participants’ rankings were highly correlated: cutting the resulting dendrogram at 0.2 resulted in one cluster that included 33 participants and the second largest cluster included 5 Surprise MAE = 17.8 MAE = 7.0 MAE = 4.3 MAE = 17.3 MAE = 9.5 Human rankings Experiment 2 Negative effects Representativeness MAE = 2.3 MAE = 6.7 Experiment 1 Positive effects Relative utility MAE = 2.3 Human rankings Absolute utility Model rankings Model rankings Model rankings Model rankings Figure 3: Comparison between human rankings in both experiments and predicted rankings from four models. The solid diagonal lines indicate a perfect correspondence between human and model rankings. only 3 participants. Thus, we grouped all participants together and analyzed their mean rankings. The 0.2 threshold was chosen because it produced the most informative clustering in Experiment 2. Inverse decision-making models We implemented the inverse decision-making models using importance sampling with 5 million samples drawn from the prior distribution p(u). Because all the effects were positive, we used a prior on utilities that placed nearly all probability mass above zero (µ = 4, σ = 2). The mean human rankings are compared with the absolute utility model rankings in Figure 2a, and the mean human rankings are listed in order in 2c. Fractional rankings were used for both the human data and the model predictions. The human rankings in the ﬁgure are the means of participants’ fractional rankings. The ﬁrst row of Figure 3 contains similar plots that allow comparison of the four models we considered. In these plots, the solid diagonal lines indicate a perfect correspondence between model and human rankings. Thus, the largest deviations from this line represent the largest deviations in the data from the model’s predictions. Figure 3 shows that the absolute and relative utility models make virtually identical predictions and both models provide a strong account of the human rankings as measured by mean absolute error (MAE = 2.3 in both cases). Moreover, both models correctly predict the highest ranked decision and the set of lowest ranked decisions. The only clear discrepancy between the model predictions and the data is the cluster of points at the lower left, labeled as Decisions 6–13 in Figure 2a. These are all cases in which effect x appears in all options and therefore these decisions provide no information about a decision maker’s preference for x. Consequently, the models assign the same ranking to this group as to the group of decisions in which there is only a single option (Decisions 1–5). Although people appeared to treat these groups somewhat differently, the models still correctly predict that the entire group of decisions 1–13 is ranked lower than all other decisions. The surprise and representativeness models do not perform nearly as well (MAE = 7.0 and 17.8, respectively). Although the surprise model captures some of the general trends in the human rankings, it makes several major errors. For example, consider Decision 7: dx|cx|bx|ax. This decision provides no information about a preference for x because it appears in every option. The decision is surprising, however, because a decision maker choosing at random from these options would make the observed choice only 1/4 of the time. The representativeness model performs even worse, primarily because it does not take into account alternative explanations for why an option was chosen, such as the fact that no other options were available (e.g., Decision 1 in Figure 2c). The failure of these models to adequately account for the data suggests that both the likelihood p(c|u, F) and marginal likelihood p(c|F) are important components of the absolute and relative utility models. Feature-based models We compared the performance of the absolute and relative utility models to our two feature-based models: the weighted feature and ranked feature models. For each participant, 6 (b) Ranked feature 10 10 5 Figure 4: Results of the feature-based model analysis from Experiment 1 for (a) the weighted feature models and (b) the ranked feature models. The histograms show the minimum number of features needed to match the accuracy (measured by MAE) of the absolute utility model for each participant. 15 5 1 2 3 4 5 6 >6 15 1 2 3 4 5 6 7 8 9 10 >10 Number of participants (a) Weighted feature Number of features needed we considered every subset of features1 in Figure 1d in order to determine the minimum number of features needed by the two models to achieve the same level of accuracy as the absolute utility model, as measured by mean absolute error. The results of these analyses are shown in Figure 4. For the majority of participants, at least four features were needed by both models to match the accuracy of the absolute utility model. For the weighted feature model, 14 participants could not be ﬁt as well as the absolute utility model even when all ten features were considered. These results indicate that a feature-based account of people’s inferences in our task must be supplied with a relatively large number of features. By contrast, the inverse decision-making approach provides a relatively parsimonious account of the data. 4 Experiment 2: Negative effects Experiment 2 focused on a setting in which all effects are negative, motivated by the fact that the inverse decision-making models predict several major differences in orderings when effects are negative rather than positive. For instance, the absolute utility model’s relative rankings of the decisions in Figures 1a and 1b are reversed when all effects are negative rather than positive. Method 42 Carnegie Mellon undergraduates participated for course credit. The experimental design was identical to Experiment 1 except that participants were told that the effects were electric shocks at different body locations. They were asked to sort the cards on the basis of how strongly each decision suggested that the decision maker ﬁnds shocks at the target location relatively tolerable. The model predictions were derived in the same way as for Experiment 1, but with a prior distribution on utilities that placed nearly all probability mass below zero (µ = −4, σ = 2) to reﬂect the fact that effects were all negative. 4.1 Results Three participants were excluded as outliers by the same criterion applied in Experiment 1. The resulting mean rankings are compared with the corresponding rankings from Experiment 1 in Figure 2b. The ﬁgure shows that responses based on positive and negative effects were substantially different in a number of cases. Figure 3 shows how the mean rankings compare to the predictions of the four models we considered. Although the relative utility model is fairly accurate, no model achieves the same level of accuracy as the absolute and relative utility models in Experiment 1. In addition, the relative utility model provides a poor account of the responses of many individual participants. To better understand responses at the individual level, we repeated the hierarchical clustering analysis described in Experiment 1, which revealed that 29 participants could be grouped into one of four clusters, with the remaining participants each in their own clusters. We analyzed these four clusters independently, excluding the 10 participants that could not be naturally grouped. We compared the mean rankings of each cluster to the absolute and relative utility models, as well as all one- and two-feature weighted feature and ranked feature models. Figure 5 shows that the mean rankings of participants in Cluster 1 (N = 8) were best ﬁt by the absolute utility model, the mean rankings of participants in Cluster 2 (N = 12) were best ﬁt by the relative utility model, and the mean rankings of participants in Clusters 3 (N = 3) and 4 (N = 6) were better ﬁt by feature-based models than by either the absolute or relative utility models. 1 A maximum of six features was considered for the ranked feature model because considering more features was computationally intractable. 7 Cluster 4 N =6 MAE = 4.9 MAE = 14.0 MAE = 7.9 MAE = 5.3 MAE = 2.6 MAE = 13.0 MAE = 6.2 Human rankings Relative utility Cluster 3 N =3 MAE = 2.6 Absolute utility Cluster 2 N = 12 Human rankings Cluster 1 N =8 Factors: 1,3 Factors: 1,8 MAE = 2.3 MAE = 5.2 Model rankings Best−fitting weighted feature Factors: 6,7 MAE = 4.0 Model rankings Model rankings Model rankings Human rankings Factors: 3,8 MAE = 4.8 Figure 5: Comparison between human rankings for four clusters of participants identiﬁed in Experiment 2 and predicted rankings from three models. Each point in the plots corresponds to one decision and the solid diagonal lines indicate a perfect correspondence between human and model rankings. The third row shows the predictions of the best-ﬁtting two-factor weighted feature model for each cluster. The two factors listed refer to Figure 1d. To examine how well the models accounted for individuals’ rankings within each cluster, we compared the predictions of the inverse decision-making models to the best-ﬁtting two-factor featurebased model for each participant. In Cluster 1, 7 out of 8 participants were best ﬁt by the absolute utility model; in Cluster 2, 8 out of 12 participants were best ﬁt by the relative utility model; in Clusters 3 and 4, all participants were better ﬁt by feature-based models. No single feature-based model provided the best ﬁt for more than two participants, suggesting that participants not ﬁt well by the inverse decision-making models were not using a single alternative strategy. Applying the feature-based model analysis from Experiment 1 to the current results revealed that the weighted feature model required an average of 6.0 features to match the performance of the absolute utility model for participants in Cluster 1, and an average of 3.9 features to match the performance of the relative utility model for participants in Cluster 2. Thus, although a single model did not ﬁt all participants well in the current experiment, many participants were ﬁt well by one of the two inverse decision-making models, suggesting that this general approach is useful for explaining how people reason about negative effects as well as positive effects. 5 Conclusion In two experiments, we found that an inverse decision-making approach offered a good computational account of how people make judgments about others’ preferences. Although this approach is conceptually simple, our analyses indicated that it captures the inﬂuence of a fairly large number of relevant decision features. Indeed, the feature-based models that we considered as potential process models of preference learning could only match the performance of the inverse decision-making approach when supplied with a relatively large number of features. We feel that this result rules out the feature-based approach as psychologically implausible, meaning that alternative process-level accounts will need to be explored. One possibility is sampling, which has been proposed as a psychological mechanism for approximating probabilistic inferences [19, 20]. However, even if process models that use large numbers of features are considered plausible, the inverse decision-making approach provides a valuable computational-level account that helps to explain which decision features are informative. Acknowledgments This work was supported in part by the Pittsburgh Life Sciences Greenhouse Opportunity Fund and by NSF grant CDI-0835797. 8 References [1] D. McFadden. Conditional logit analysis of qualitative choice behavior. In P. Zarembka, editor, Frontiers in Econometrics. Amademic Press, New York, 1973. [2] C. G. Lucas, T. L. Grifﬁths, F. Xu, and C. Fawcett. A rational model of preference learning and choice prediction by children. In Proceedings of Neural Information Processing Systems 21, 2009. [3] L. Bergen, O. R. Evans, and J. B. Tenenbaum. Learning structured preferences. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society, 2010. [4] A. Jern and C. Kemp. Decision factors that support preference learning. In Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 2011. [5] T. Kushnir, F. Xu, and H. M. Wellman. Young children use statistical sampling to infer the preferences of other people. Psychological Science, 21(8):1134–1140, 2010. [6] L. Ma and F. Xu. Young children’s use of statistical sampling evidence to infer the subjectivity of preferences. Cognition, in press. [7] M. J. Doherty. Theory of Mind: How Children Understand Others’ Thoughts and Feelings. Psychology Press, New York, 2009. [8] R. N. Shepard, C. I. Hovland, and H. M. Jenkins. Learning and memorization of classiﬁcations. Psychological Monographs, 75, Whole No. 517, 1961. [9] D. N. Osherson, E. E. Smith, O. Wilkie, A. L´ pez, and E. Shaﬁr. Category-based induction. Psychological o Review, 97(2):185–200, 1990. [10] E. A. Wasserman, S. M. Elek, D. L. Chatlosh, and A. G. Baker. Rating causal relations: Role of probability in judgments of response-outcome contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19(1):174–188, 1993. [11] R. D. Luce. Individual choice behavior. John Wiley, 1959. [12] D. Ariely, G. Loewenstein, and D. Prelec. Tom Sawyer and the construction of value. Journal of Economic Behavior & Organization, 60:1–10, 2006. [13] D. Kahneman and A. Tversky. Subjective probability: A judgment of representativeness. Cognitive Psychology, 3(3):430–454, 1972. [14] D. Newtson. Dispositional inference from effects of actions: Effects chosen and effects forgone. Journal of Experimental Social Psychology, 10:489–496, 1974. [15] P. C. Fishburn. Lexicographic orders, utilities and decision rules: A survey. Management Science, 20(11):1442–1471, 1974. [16] G. Gigerenzer and P. M. Todd. Fast and frugal heuristics: The adaptive toolbox. Oxford University Press, New York, 1999. [17] A. Prince and P. Smolensky. Optimality Theory: Constraint Interaction in Generative Grammar. WileyBlackwell, 2004. [18] D. Marr. Vision. W. H. Freeman, San Francisco, 1982. [19] A. N. Sanborn, T. L. Grifﬁths, and D. J. Navarro. Rational approximations to rational models: Alternative algorithms for category learning. Psychological Review, 117:1144–1167, 2010. [20] L. Shi and T. L. Grifﬁths. Neural implementation of Bayesian inference by importance sampling. In Proceedings of Neural Information Processing Systems 22, 2009. 9

6 0.13615523 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity

7 0.10923701 205 nips-2011-Online Submodular Set Cover, Ranking, and Repeated Active Learning

8 0.097298689 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

9 0.092246778 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions

10 0.086376011 303 nips-2011-Video Annotation and Tracking with Active Learning

11 0.084233314 157 nips-2011-Learning to Search Efficiently in High Dimensions

12 0.082576118 119 nips-2011-Higher-Order Correlation Clustering for Image Segmentation

13 0.082266949 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

14 0.075387001 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding

15 0.075208575 21 nips-2011-Active Learning with a Drifting Distribution

16 0.0682851 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes

17 0.067588083 207 nips-2011-Optimal learning rates for least squares SVMs using Gaussian kernels

18 0.067012258 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

19 0.065960392 153 nips-2011-Learning large-margin halfspaces with more malicious noise

20 0.065743797 162 nips-2011-Lower Bounds for Passive and Active Learning

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.175), (1, 0.038), (2, -0.106), (3, 0.07), (4, 0.067), (5, 0.016), (6, -0.098), (7, -0.107), (8, 0.052), (9, 0.055), (10, -0.013), (11, 0.024), (12, 0.039), (13, -0.03), (14, 0.334), (15, -0.085), (16, 0.15), (17, 0.044), (18, -0.06), (19, 0.048), (20, 0.052), (21, -0.02), (22, 0.063), (23, 0.016), (24, -0.144), (25, -0.022), (26, 0.175), (27, 0.095), (28, -0.032), (29, -0.133), (30, 0.107), (31, -0.095), (32, -0.002), (33, 0.046), (34, -0.042), (35, 0.031), (36, -0.104), (37, 0.042), (38, 0.044), (39, 0.06), (40, -0.002), (41, 0.032), (42, 0.082), (43, -0.044), (44, -0.085), (45, 0.117), (46, 0.097), (47, 0.028), (48, -0.046), (49, -0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98083335 22 nips-2011-Active Ranking using Pairwise Comparisons

Author: Kevin G. Jamieson, Robert Nowak

2 0.79817253 231 nips-2011-Randomized Algorithms for Comparison-based Search

Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer

3 0.71990216 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity

Author: Nir Ailon

Abstract: Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most O(n poly(log n, ε−1 )) preference labels for a regret of ε times the optimal loss. This is strictly better, and often signiﬁcantly better than what non-adaptive sampling could achieve. Our main result helps settle an open problem posed by learning-to-rank (from pairwise information) theoreticians and practitioners: What is a provably correct way to sample preference labels? 1

4 0.60410863 229 nips-2011-Query-Aware MCMC

Author: Michael L. Wick, Andrew McCallum

5 0.56256765 35 nips-2011-An ideal observer model for identifying the reference frame of objects

Author: Joseph L. Austerweil, Abram L. Friesen, Thomas L. Griffiths

6 0.45499742 7 nips-2011-A Machine Learning Approach to Predict Chemical Reactions

7 0.4296869 90 nips-2011-Evaluating the inverse decision-making approach to preference learning

8 0.42946073 122 nips-2011-How Do Humans Teach: On Curriculum Learning and Teaching Dimension

9 0.41489902 303 nips-2011-Video Annotation and Tracking with Active Learning

10 0.39849854 304 nips-2011-Why The Brain Separates Face Recognition From Object Recognition

11 0.39337328 64 nips-2011-Convergent Bounds on the Euclidean Distance

12 0.39065129 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs

13 0.37770152 162 nips-2011-Lower Bounds for Passive and Active Learning

14 0.36127552 42 nips-2011-Bayesian Bias Mitigation for Crowdsourcing

15 0.35916331 280 nips-2011-Testing a Bayesian Measure of Representativeness Using a Large Image Database

16 0.35223085 272 nips-2011-Stochastic convex optimization with bandit feedback

17 0.33648336 130 nips-2011-Inductive reasoning about chimeric creatures

18 0.33258519 138 nips-2011-Joint 3D Estimation of Objects and Scene Layout

19 0.32683569 21 nips-2011-Active Learning with a Drifting Distribution

20 0.31897026 193 nips-2011-Object Detection with Grammar Models

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.052), (4, 0.069), (8, 0.087), (20, 0.062), (26, 0.137), (31, 0.075), (33, 0.036), (43, 0.064), (45, 0.12), (57, 0.038), (74, 0.054), (83, 0.05), (99, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92135382 22 nips-2011-Active Ranking using Pairwise Comparisons

Author: Kevin G. Jamieson, Robert Nowak

2 0.89984274 255 nips-2011-Simultaneous Sampling and Multi-Structure Fitting with Adaptive Reversible Jump MCMC

Author: Trung T. Pham, Tat-jun Chin, Jin Yu, David Suter

Abstract: Multi-structure model ﬁtting has traditionally taken a two-stage approach: First, sample a (large) number of model hypotheses, then select the subset of hypotheses that optimise a joint ﬁtting and model selection criterion. This disjoint two-stage approach is arguably suboptimal and inefﬁcient — if the random sampling did not retrieve a good set of hypotheses, the optimised outcome will not represent a good ﬁt. To overcome this weakness we propose a new multi-structure ﬁtting approach based on Reversible Jump MCMC. Instrumental in raising the effectiveness of our method is an adaptive hypothesis generator, whose proposal distribution is learned incrementally and online. We prove that this adaptive proposal satisﬁes the diminishing adaptation property crucial for ensuring ergodicity in MCMC. Our method effectively conducts hypothesis sampling and optimisation simultaneously, and yields superior computational efﬁciency over previous two-stage methods. 1

3 0.89748424 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs

Author: Edouard Grave, Guillaume R. Obozinski, Francis R. Bach

Abstract: Using the 1 -norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net. 1

4 0.89499468 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach

Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki

Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially deﬁned cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data conﬁrm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1

5 0.87546152 162 nips-2011-Lower Bounds for Passive and Active Learning

Author: Maxim Raginsky, Alexander Rakhlin

Abstract: We develop uniﬁed information-theoretic machinery for deriving lower bounds for passive and active learning schemes. Our bounds involve the so-called Alexander’s capacity function. The supremum of this function has been recently rediscovered by Hanneke in the context of active learning under the name of “disagreement coefﬁcient.” For passive learning, our lower bounds match the upper bounds of Gin´ and Koltchinskii up to constants and generalize analogous results of Mase sart and N´ d´ lec. For active learning, we provide ﬁrst known lower bounds based e e on the capacity function rather than the disagreement coefﬁcient. 1

6 0.86849242 20 nips-2011-Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity

7 0.85179597 29 nips-2011-Algorithms and hardness results for parallel large margin learning

8 0.8505857 231 nips-2011-Randomized Algorithms for Comparison-based Search

9 0.85028434 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

10 0.84670484 21 nips-2011-Active Learning with a Drifting Distribution

11 0.84595865 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning

12 0.83097178 17 nips-2011-Accelerated Adaptive Markov Chain for Partition Function Computation

13 0.82589722 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

14 0.82528508 80 nips-2011-Efficient Online Learning via Randomized Rounding

15 0.8252123 64 nips-2011-Convergent Bounds on the Euclidean Distance

16 0.82489264 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements

17 0.8241933 303 nips-2011-Video Annotation and Tracking with Active Learning

18 0.82012647 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

19 0.81900287 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits

20 0.81494123 25 nips-2011-Adaptive Hedge