nips nips2012 nips2012-165 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sahand Negahban, Sewoong Oh, Devavrat Shah
Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. [sent-4, score-0.686]
2 MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. [sent-6, score-0.177]
3 In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. [sent-10, score-0.657]
4 The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. [sent-11, score-1.04]
5 To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. [sent-13, score-0.45]
6 We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. [sent-14, score-0.21]
7 This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. [sent-15, score-0.21]
8 1 Introduction. Rank aggregation is an important task in a wide range of learning and social contexts arising in recommendation systems, information retrieval, and sports and competitions. [sent-17, score-0.192]
9 Given n items, we wish to infer relevancy scores or an ordering on the items based on partial orderings provided through many (possibly contradictory) samples. [sent-18, score-0.563]
10 Frequently, the available data presented to us is in the form of comparisons: player A defeats player B; book A is purchased when books A and B are displayed (a bigger collection of books implies multiple pairwise comparisons); movie A is liked more than movie B. [sent-19, score-0.433]
11 From such partial preferences in the form of comparisons, we frequently wish to deduce not only the order of the underlying objects, but also scores for the objects themselves, so as to gauge the intensity of the resulting preference order. [sent-20, score-0.477]
12 For example, the Microsoft TrueSkill engine assigns scores to online gamers based on the outcomes of (pairwise) games between players. [sent-21, score-0.376]
13 Indeed, it assumes that each player has inherent “skill”, and the outcomes of the games are used to learn these skill parameters, which in turn lead to scores associated with each player. [sent-22, score-0.461]
14 Most rating based systems rely on users to provide explicit numeric scores for their interests. [sent-27, score-0.297]
15 While these assumptions have led to a flurry of theoretical research for item recommendations based on matrix completion [2, 3, 4], it is widely believed that numeric scores provided by individual users are generally inconsistent. [sent-28, score-0.52]
16 For example, if we consider items A, B, and C, one user might prefer A to B, while another prefers B to C, and a third user prefers C to A. [sent-32, score-0.364]
17 In the celebrated work by Arrow [6], the existence of a rank aggregation algorithm satisfying a reasonable set of properties (or axioms) was shown to be impossible. [sent-34, score-0.217]
18 In this paper, we are interested in a more restrictive setting: we have outcomes of comparisons between pairs of items, rather than a complete ordering as considered in [6]. [sent-35, score-0.441]
19 Based on those pairwise comparisons, we want to obtain a ranking of items along with a score for each item indicating the intensity of the preference. [sent-36, score-0.705]
20 One reasonable way to think about our setting is to imagine that there is a distribution over orderings or rankings or permutations of items and every time a pair of items is compared, the outcome is generated as per this underlying distribution. [sent-37, score-0.705]
21 However, their algorithm requires information about comparisons between all pairs, and for each pair it requires the exact pairwise comparison ‘marginal’ with respect to the underlying distribution over permutations. [sent-43, score-0.401]
22 Indeed, in reality, not all pairs of items can typically be compared, and the number of times each pair is compared is also very small. [sent-44, score-0.331]
23 In somewhat related work by Braverman and Mossel [7], the authors present an algorithm that produces an ordering based on O(n log n) pair-wise comparisons on adaptively selected pairs. [sent-46, score-0.338]
24 They assume that there is an underlying true ranking and one observes noisy comparison results. [sent-47, score-0.197]
25 Each time a pair is queried, we are given the true ordering of the pair with probability 1/2 + γ for some γ > 0 which does not depend on the items being compared. [sent-48, score-0.425]
26 One limitation of this model is that it does not capture the fact that in many applications, like chess matches, the outcome of a comparison very much depends on the opponents that are competing. [sent-49, score-0.147]
27 Interestingly enough, the (near-)optimal performance of their adaptive algorithm for winner selection is matched by our non-adaptive (model independent) algorithm for assigning scores to obtain global rankings of all players. [sent-56, score-0.344]
28 In this paper, we provide an iterative algorithm that takes the noisy comparison answers between a subset of all possible pairs of items as input and produces scores for each item as the output. [sent-58, score-0.75]
29 Consider a graph with nodes/vertices corresponding to the items of interest (e.g., players). [sent-60, score-0.316]
30 Construct a random walk on this graph where at each time, the random walk is likely to go from vertex i to vertex j if items i and j were ever compared; and if so, the likelihood of going from i to j depends on how often i lost to j. [sent-63, score-0.896]
31 That is, the random walk is more likely to move to a neighbor who has more “wins”. [sent-64, score-0.29]
32 How frequently this walk visits a particular node in the long run, or equivalently the stationary distribution, is the score of the corresponding item. [sent-65, score-0.623]
33 Thus, effectively this algorithm captures preference of the given item versus all of the others, not just immediate neighbors: the global effect induced by transitivity of comparisons is captured through the stationary distribution. [sent-66, score-0.633]
34 Such an interpretation of the stationary distribution of a Markov chain or a random walk has been an effective measure of the relative importance of a node in a wide class of graph problems, popularly known as network centrality [13]. [sent-67, score-0.875]
35 The computation of the stationary distribution of the Markov chain boils down to ‘power iteration’ using the transition matrix, leading to a nice iterative algorithm. [sent-69, score-0.523]
36 It should be noted that Ω(n log n) is a necessary number of (random) comparisons for any algorithm to even produce a consistent ranking (due to the connectivity threshold of a random bipartite graph). [sent-74, score-0.428]
37 Our analysis boils down to studying the induced stationary distribution of the random walk or Markov chain corresponding to the algorithm. [sent-79, score-0.726]
38 As in most such scenarios, the only hope of obtaining meaningful results for such a ‘random, noisy’ Markov chain is to relate it to the stationary distribution of a known Markov chain. [sent-80, score-0.326]
39 We then present our explicit random walk approach to ranking. [sent-97, score-0.29]
40 1 Bradley-Terry-Luce model for comparative judgment. In this section we discuss our model of comparisons between various items. [sent-99, score-0.209]
41 As alluded to above, for the purpose of establishing analytic properties of the algorithm, we will assume comparisons are governed by the BTL model of pairwise comparisons. [sent-100, score-0.312]
42 To begin with, there are n items of interest, represented as $[n] = \{1, \ldots, n\}$. [sent-102, score-0.236]
43 We shall assume that for each item $i \in [n]$ there is an associated weight score $w_i \in \mathbb{R}_+$ (i.e., $w_i > 0$). [sent-106, score-0.316]
44 Given a pair of items $i$ and $j$, we let $Y_{ij}^l$ be 1 if $j$ is preferred over $i$ and 0 otherwise in the $l$-th comparison, for $1 \le l \le k$, where $k$ is the total number of comparisons for the pair. [sent-110, score-0.365]
45 Under the BTL model we assume that
$$P(Y_{ij}^l = 1) = \frac{w_j}{w_i + w_j}. \qquad (1)$$ [sent-111, score-0.128]
46 Furthermore, conditioned on the score vector $w$, we assume the variables $Y_{ij}^l$ are independent for all $i$, $j$, and $l$. [sent-112, score-0.302]
47 We further assume that given some item i we will compare item j to i with probability d/n. [sent-113, score-0.284]
48 This model is a natural one to consider because over a population of individuals the comparisons cannot be adaptively selected. [sent-115, score-0.19]
49 A more realistic model might incorporate selecting various items with different distributions: for example, the Netflix dataset demonstrates skews in the sampling distribution for different films [16]. [sent-116, score-0.271]
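To make this sampling model concrete, here is a minimal simulation sketch in Python/NumPy. The function name `simulate_btl` and its interface are our own conventions, not from the paper: each pair is observed with probability d/n, and each observed pair is compared k times according to (1).

```python
import numpy as np

def simulate_btl(w, d, k, rng=None):
    """Sample pairwise comparisons under the BTL model of eq. (1).

    Each pair (i, j) is observed with probability d/n; an observed
    pair is compared k times, j beating i each time with probability
    w[j] / (w[i] + w[j]). Returns wins, where wins[i, j] counts how
    often j beat i.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(w)
    wins = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < d / n:              # pair is sampled
                j_wins = rng.binomial(k, w[j] / (w[i] + w[j]))
                wins[i, j] = j_wins                # j preferred over i
                wins[j, i] = k - j_wins            # i preferred over j
    return wins
```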
50 We now discuss our method for computing the scores wi . [sent-118, score-0.317]
51 2 Random walk approach to ranking. In our setting, we will assume that $a_{ij}$ represents the fraction of times object $j$ has been preferred to object $i$, for example the fraction of times chess player $j$ has defeated player $i$. [sent-120, score-0.987]
52 Given the notation above, we have $a_{ij} = \frac{1}{k}\sum_{l=1}^{k} Y_{ij}^l$. [sent-121, score-0.144]
53 Consider a random walk on a weighted directed graph G = ([n], E, A), where a pair (i, j) ∈ E if and only if the pair has been compared. [sent-122, score-0.478]
54 The edge weights are defined from the outcomes of the comparisons: $A_{ij} = a_{ij}/(a_{ij} + a_{ji})$ and $A_{ji} = a_{ji}/(a_{ij} + a_{ji})$. [sent-123, score-0.665]
55 Note that by the Strong Law of Large Numbers, as $k \to \infty$ the quantity $A_{ij}$ converges to $w_j/(w_i + w_j)$ almost surely. [sent-125, score-0.256]
56 A random walk can be represented by a time-independent transition matrix P , where Pij = P(Xt+1 = j|Xt = i). [sent-126, score-0.39]
57 One way to define a valid transition matrix of a random walk on G is to scale all the edge weights by 1/dmax , where we define dmax as the maximum out-degree of a node. [sent-128, score-0.465]
58 More concretely,
$$P_{ij} = \begin{cases} \frac{1}{d_{\max}} A_{ij} & \text{if } i \neq j, \\ 1 - \frac{1}{d_{\max}} \sum_{k \neq i} A_{ik} & \text{if } i = j. \end{cases} \qquad (2)$$ [sent-131, score-0.15]
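As a concrete reading of (2), here is a NumPy sketch of the construction (our own helper; `wins[i, j]` counts how often j beat i, matching the simulator above; the factor 1/k in a_ij cancels inside A_ij):

```python
def transition_matrix(wins):
    """Build the Rank Centrality transition matrix P of eq. (2).

    A_ij = a_ij / (a_ij + a_ji) = wins[i, j] / (wins[i, j] + wins[j, i]);
    off-diagonal entries are A_ij / d_max, and the diagonal absorbs
    the remaining probability mass.
    """
    n = wins.shape[0]
    totals = wins + wins.T                      # comparisons per pair
    compared = totals > 0
    A = np.zeros((n, n))
    A[compared] = wins[compared] / totals[compared]
    d_max = max(1, compared.sum(axis=1).max())  # maximum out-degree
    P = A / d_max                               # case i != j in (2)
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))    # case i == j in (2)
    return P
```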
59 The choice to construct our random walk as above is not arbitrary. [sent-132, score-0.29]
60 In an ideal setting with infinite samples (k → ∞) the transition matrix P would define a reversible Markov chain. [sent-133, score-0.269]
61 Recall that a Markov chain is reversible if it satisfies the detailed balance equation: there exists $v \in \mathbb{R}^n_+$ such that $v_i P_{ij} = v_j P_{ji}$ for all $i, j$; in that case, $\pi \in \mathbb{R}^n_+$ defined as $\pi_i = v_i/(\sum_j v_j)$ is its unique stationary distribution. [sent-134, score-0.374]
62 In the ideal setting (say $k \to \infty$), we will have $P_{ij} \equiv \tilde{P}_{ij} = (1/d_{\max})\, w_j/(w_i + w_j)$. [sent-135, score-0.214]
63 That is, the random walk will move from state i to state j with probability equal to the chance that item j is preferred to item i. [sent-136, score-0.649]
64 Therefore, under these ideal conditions it immediately follows that the vector $w/\sum_i w_i$ acts as a valid stationary distribution for the Markov chain defined by $\tilde{P}$, the ideal matrix. [sent-138, score-0.605]
65 Hence, as long as the graph G is connected and at least one node has a self-loop, we are guaranteed that the chain has a unique stationary distribution proportional to $w$. [sent-139, score-0.458]
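This claim is easy to verify numerically. A small sketch, assuming a complete comparison graph and illustrative weight values of our own choosing:

```python
def ideal_transition(w):
    """Ideal matrix P~ with P~_ij = (1/d_max) * w[j] / (w[i] + w[j])."""
    n = len(w)
    d_max = n - 1                               # complete comparison graph
    P = np.array([[0.0 if i == j else w[j] / (w[i] + w[j])
                   for j in range(n)] for i in range(n)]) / d_max
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))
    return P

w = np.array([1.0, 2.0, 3.5, 0.5])              # hypothetical scores
P_ideal = ideal_transition(w)
pi_tilde = w / w.sum()
M = pi_tilde[:, None] * P_ideal                 # M_ij = pi_i * P_ij
assert np.allclose(pi_tilde @ P_ideal, pi_tilde)  # stationarity
assert np.allclose(M, M.T)                        # detailed balance
```

The detailed balance check passes because $\tilde{\pi}_i \tilde{P}_{ij} \propto w_i w_j / (w_i + w_j)$ is symmetric in $i$ and $j$.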
66 If the Markov chain is reversible then we may apply the spectral analysis of self-adjoint operators, which is crucial in the analysis when we repeatedly apply the operator $\tilde{P}$. [sent-140, score-0.142]
67 In our setting, the matrix $P$ is a noisy version (due to finite-sample error) of the ideal matrix $\tilde{P}$ discussed above. [sent-141, score-0.182]
68 Precisely, let $p_t(i) = P(X_t = i)$ denote the distribution of the random walk at time $t$, with $p_0 = (p_0(i)) \in \mathbb{R}^n$ an arbitrary starting distribution on $[n]$. [sent-144, score-0.43]
69 Regardless of the starting distribution, when the transition matrix has a unique top eigenvalue, the random walk always converges to a unique distribution: the stationary distribution $\pi = \lim_{t \to \infty} p_t$. [sent-146, score-0.727]
70 In linear algebra terms, this stationary distribution π is the top left eigenvector of P , which makes computing π a simple eigenvector computation. [sent-147, score-0.351]
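For instance, one such direct computation (a sketch; power iteration, shown after the algorithm below, is the route the paper emphasizes):

```python
def stationary_via_eig(P):
    """Leading left eigenvector of P, normalized to a distribution."""
    evals, evecs = np.linalg.eig(P.T)    # left eigenvectors of P
    top = np.argmax(evals.real)          # eigenvalue closest to 1
    pi = np.abs(evecs[:, top].real)
    return pi / pi.sum()
```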
71 Formally, we state the algorithm, which assigns a numerical score to each node, and which we shall call Rank Centrality:
Rank Centrality
Input: G = ([n], E, A)
Output: rank {π(i)}, i ∈ [n]
1: Compute the transition matrix P according to (2);
2: Compute the stationary distribution π. [sent-148, score-0.666]
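A minimal power-iteration sketch of step 2 (the tolerance, iteration cap, and names are our own choices, not from the paper):

```python
def rank_centrality(P, tol=1e-12, max_iter=100_000):
    """Step 2 of Rank Centrality: iterate p <- p P to a fixed point."""
    n = P.shape[0]
    p = np.full(n, 1.0 / n)              # arbitrary start p0
    for _ in range(max_iter):
        p_next = p @ P                   # one step of the random walk
        if np.linalg.norm(p_next - p, 1) < tol:
            return p_next
        p = p_next
    return p
```

Sorting items by the resulting scores, e.g. `np.argsort(-rank_centrality(P))`, yields the aggregate ranking.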
72 The stationary distribution of the random walk is a fixed point of the following equation:
$$\pi(i) = \frac{1}{d_{\max}}\left[\sum_{j \neq i} A_{ji}\,\pi(j) + \pi(i)\Big(d_{\max} - \sum_{j \neq i} A_{ij}\Big)\right]. \qquad (3)$$ [sent-149, score-0.557]
73 This suggests an alternative intuitive justification: an object receives a high rank if it has been preferred to other high-ranking objects or if it has been preferred to many objects. [sent-150, score-0.485]
74 One key question remains: does P have a well defined stationary distribution? [sent-151, score-0.232]
75 As discussed earlier, when G is connected, the idealized transition matrix $\tilde{P}$ has a stationary distribution with the desired properties. [sent-152, score-0.367]
76 But due to noise, $P$ may not be reversible, and the arguments for the ideal $\tilde{P}$ do not apply to our setting. [sent-153, score-0.169]
77 We then show that the stationary distribution (cf. (3)) induced by $P$ is close enough to the one induced by $\tilde{P}$. [sent-157, score-0.14]
78 Subsequently, we can guarantee that the iterative algorithm will converge to a solution that is close to the ideal stationary distribution. [sent-158, score-0.405]
79 3 Main Results. Our main result, Theorem 1, provides an upper bound on the error in estimating the stationary distribution under the observation model presented above. [sent-159, score-0.267]
80 The results demonstrate that even with random sampling we can, with high probability, estimate the underlying scores with good accuracy. [sent-160, score-0.131]
81 The bounds are presented as the rescaled Euclidean norm between our estimate $\pi$ and the underlying stationary distribution $\tilde{\pi}$. [sent-161, score-0.266]
82 This error metric provides us with a means to quantify the relative certainty in guessing if one item is preferred over another. [sent-162, score-0.217]
83 Furthermore, such scores are ubiquitous [17], as they may also be used to calculate the desired rankings. [sent-163, score-0.21]
84 1 Error bound in stationary distribution recovery via Rank Centrality. The theorem below presents our main recovery result under the sampling assumptions described above. [sent-166, score-0.267]
85 Assume that, among n items, each pair is chosen with probability d/n and for each chosen pair we collect the outcomes of k comparisons according to the BTL model. [sent-171, score-0.338]
86 Then, there exist positive universal constants $C$, $C'$, and $C''$ such that when $d \ge C(\log n)^2$ and $kd \ge C b^5 \log n$, the following bound on the error rate holds with probability at least $1 - C'/n^3$:
$$\frac{\|\pi - \tilde{\pi}\|}{\|\tilde{\pi}\|} \le C'' b^3 \sqrt{\frac{\log n}{kd}},$$
where $\tilde{\pi}(i) = w_i/\sum_j w_j$ and $b \equiv \max_{i,j} w_i/w_j$. [sent-172, score-0.251]
87 Since we are sampling each of the $\binom{n}{2}$ pairs with probability $d/n$ and then sampling each chosen pair $k$ times, we obtain $O(n \log^3 n)$ comparisons in total (with $d = \Theta(\log^2 n)$ and $k = \Theta(\log n)$). [sent-178, score-0.2]
88 Due to classical results on Erdős–Rényi graphs, the induced graph G is connected with high probability only when the total number of pairs sampled scales as $\Omega(n \log n)$; we need at least that many comparisons. [sent-179, score-0.228]
89 If $b$ were scaling with $n$, then it would be easy to differentiate the scores of items at the two opposite ends of the dynamic range; in that case, one could focus on differentiating the scores of items whose parameter values are nearby. [sent-183, score-0.892]
90 Finally, observe the interesting consequence that, under the conditions on $d$, the fact that the induced distribution $\pi$ is close to $\tilde{\pi}$ implies connectivity of G. [sent-185, score-0.148]
91 Thus, the analysis of our algorithm provides an alternative proof of connectivity in an Erdős–Rényi graph (of course, by using heavy machinery!). [sent-186, score-0.153]
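As an illustrative sanity check of the theorem's error metric (the parameter values below are our own choices that only roughly respect the theorem's conditions; the helper functions are the sketches above):

```python
rng = np.random.default_rng(0)
n, d, k = 500, 40, 32                  # d ~ (log n)^2, k ~ log n
w = rng.uniform(1.0, 4.0, size=n)      # dynamic range b <= 4
wins = simulate_btl(w, d, k, rng)
pi = rank_centrality(transition_matrix(wins))
pi_tilde = w / w.sum()
rel_err = np.linalg.norm(pi - pi_tilde) / np.linalg.norm(pi_tilde)
print(f"relative error: {rel_err:.3f}")
```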
92 2 Experimental Results. Under the BTL model, define the error metric of an estimated ordering $\sigma$ as the weighted sum over pairs $(i, j)$ whose ordering is incorrect:
$$D_w(\sigma) = \left\{\frac{1}{2n\|w\|^2}\sum_{i<j}(w_i - w_j)^2\,\mathbb{I}\big[(w_i - w_j)(\sigma_i - \sigma_j) > 0\big]\right\}^{1/2}.$$
Since the relevant contraction factor is $< 1$, the first term in (4) vanishes as $t$ grows. [sent-189, score-0.459]
93 Formally, we have $\sqrt{\tilde{\pi}_{\max}/\tilde{\pi}_{\min}} \le \sqrt{b}$, $\|\tilde{\pi}\| \ge 1/\sqrt{n}$, and $\|p_0 - \tilde{\pi}\|/\|\tilde{\pi}\| \le 2\sqrt{n}$, by the assumption that $\max_{i,j} w_i/w_j \le b$ and the fact that $\tilde{\pi}(i) = w_i/(\sum_j w_j)$. [sent-191, score-0.342]
94 Hence, the error between the distribution $p^t$ at the $t$-th iteration and the true stationary distribution $\tilde{\pi}$ is dominated by the second term in equation (4). [sent-192, score-0.302]
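A direct implementation of this metric might look as follows (a sketch; we take sigma[i] to be the rank position assigned to item i, so matching signs of (w_i - w_j) and (sigma_i - sigma_j) flag an incorrectly ordered pair, per the indicator above):

```python
def dw_error(w, sigma):
    """Weighted pairwise disagreement D_w(sigma) as defined above."""
    w = np.asarray(w, dtype=float)
    sigma = np.asarray(sigma)
    n = len(w)
    bad = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if (w[i] - w[j]) * (sigma[i] - sigma[j]) > 0:
                bad += (w[i] - w[j]) ** 2    # incorrectly ordered pair
    return np.sqrt(bad / (2.0 * n * np.dot(w, w)))
```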
95 5 Discussion. In this paper, we developed a novel iterative rank aggregation algorithm for discovering scores of objects given pairwise comparisons. [sent-199, score-0.657]
96 The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. [sent-200, score-1.04]
97 In the spirit of recent work on network centrality, in which graph score functions are primarily based on random walks, we call this algorithm Rank Centrality. [sent-201, score-0.325]
98 We have obtained an analytic bound on the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. [sent-204, score-0.274]
99 As shown, these lead to order-optimal dependence on the number of samples required for our algorithm to learn the scores well. [sent-205, score-0.21]
100 Given the simplicity of the algorithm, its analytic guarantees, and the wide utility of rank aggregation, we believe that this algorithm will be of great practical value. [sent-207, score-0.213]
wordName wordTfidf (topN-words)
[('btl', 0.463), ('walk', 0.26), ('items', 0.236), ('stationary', 0.232), ('scores', 0.21), ('comparisons', 0.159), ('aij', 0.144), ('item', 0.142), ('ranking', 0.129), ('wj', 0.128), ('ammar', 0.124), ('centrality', 0.118), ('pij', 0.116), ('chess', 0.109), ('aji', 0.108), ('wi', 0.107), ('player', 0.102), ('shah', 0.101), ('aggregation', 0.098), ('trueskill', 0.093), ('pairwise', 0.089), ('rank', 0.089), ('ideal', 0.086), ('objects', 0.084), ('reversible', 0.083), ('ordering', 0.081), ('graph', 0.08), ('preferred', 0.075), ('dmax', 0.075), ('outcomes', 0.071), ('induced', 0.07), ('pt', 0.07), ('transition', 0.069), ('score', 0.067), ('analytic', 0.064), ('braverman', 0.062), ('devavrat', 0.062), ('eigentrust', 0.062), ('gamers', 0.062), ('msr', 0.062), ('chain', 0.059), ('yij', 0.058), ('iterative', 0.057), ('completion', 0.057), ('web', 0.057), ('adler', 0.055), ('pair', 0.054), ('kd', 0.052), ('judgment', 0.05), ('rating', 0.048), ('markov', 0.046), ('skill', 0.045), ('connectivity', 0.043), ('players', 0.043), ('designs', 0.043), ('intensity', 0.042), ('eigenvector', 0.042), ('led', 0.041), ('logit', 0.041), ('pairs', 0.041), ('arrow', 0.04), ('books', 0.04), ('boils', 0.04), ('numeric', 0.039), ('industrial', 0.039), ('des', 0.039), ('allerton', 0.039), ('negahban', 0.039), ('outcome', 0.038), ('winner', 0.038), ('indeed', 0.037), ('log', 0.037), ('deduce', 0.037), ('rankings', 0.036), ('www', 0.036), ('orderings', 0.036), ('eecs', 0.035), ('soda', 0.035), ('distribution', 0.035), ('concretely', 0.034), ('aggregating', 0.034), ('social', 0.034), ('noisy', 0.034), ('underlying', 0.034), ('object', 0.033), ('prefers', 0.033), ('games', 0.033), ('frequently', 0.033), ('remarks', 0.032), ('cacy', 0.032), ('matrix', 0.031), ('node', 0.031), ('adaptively', 0.031), ('user', 0.031), ('establish', 0.031), ('wide', 0.03), ('algorithm', 0.03), ('contexts', 0.03), ('movie', 0.03), ('random', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 165 nips-2012-Iterative ranking from pair-wise comparisons
Author: Sahand Negahban, Sewoong Oh, Devavrat Shah
Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1
2 0.18569899 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
Author: Andriy Mnih, Yee W. Teh
Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1
3 0.15750103 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
Author: Maksims Volkovs, Richard S. Zemel
Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in realworld applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the PlackettLuce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches. 1
4 0.14854427 75 nips-2012-Collaborative Ranking With 17 Parameters
Author: Maksims Volkovs, Richard S. Zemel
Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1
5 0.13718361 30 nips-2012-Accuracy at the Top
Author: Stephen Boyd, Corinna Cortes, Mehryar Mohri, Ana Radovanovic
Abstract: We introduce a new notion of classification accuracy based on the top ⌧ -quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top ⌧ -quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves a better performance in precision at the top. 1
6 0.13129604 60 nips-2012-Bayesian nonparametric models for ranked data
7 0.12335485 346 nips-2012-Topology Constraints in Graphical Models
8 0.11929671 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space
9 0.10129955 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion
10 0.099233314 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering
11 0.09485428 69 nips-2012-Clustering Sparse Graphs
12 0.094655097 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
13 0.09416797 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm
14 0.091061004 286 nips-2012-Random Utility Theory for Social Choice
15 0.089868814 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
16 0.086621888 74 nips-2012-Collaborative Gaussian Processes for Preference Learning
17 0.086413719 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points
18 0.085561492 196 nips-2012-Learning with Partially Absorbing Random Walks
19 0.084728517 59 nips-2012-Bayesian nonparametric models for bipartite graphs
20 0.082578346 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
topicId topicWeight
[(0, 0.215), (1, 0.016), (2, 0.033), (3, -0.06), (4, -0.043), (5, -0.097), (6, -0.025), (7, 0.102), (8, 0.017), (9, 0.281), (10, -0.123), (11, -0.004), (12, -0.175), (13, -0.024), (14, 0.079), (15, 0.041), (16, 0.048), (17, 0.014), (18, 0.037), (19, -0.003), (20, 0.019), (21, -0.066), (22, -0.019), (23, -0.067), (24, -0.02), (25, 0.058), (26, 0.044), (27, 0.107), (28, -0.018), (29, -0.024), (30, 0.012), (31, 0.011), (32, -0.037), (33, -0.014), (34, 0.035), (35, -0.028), (36, 0.042), (37, 0.061), (38, 0.019), (39, 0.033), (40, 0.043), (41, 0.013), (42, -0.127), (43, -0.075), (44, -0.052), (45, -0.021), (46, 0.014), (47, -0.017), (48, -0.024), (49, -0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.9292416 165 nips-2012-Iterative ranking from pair-wise comparisons
Author: Sahand Negahban, Sewoong Oh, Devavrat Shah
Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1
2 0.75182289 75 nips-2012-Collaborative Ranking With 17 Parameters
Author: Maksims Volkovs, Richard S. Zemel
Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1
3 0.71395707 30 nips-2012-Accuracy at the Top
Author: Stephen Boyd, Corinna Cortes, Mehryar Mohri, Ana Radovanovic
Abstract: We introduce a new notion of classification accuracy based on the top ⌧ -quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top ⌧ -quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves a better performance in precision at the top. 1
4 0.70176733 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
Author: Maksims Volkovs, Richard S. Zemel
Abstract: Bipartite matching problems characterize many situations, ranging from ranking in information retrieval to correspondence in vision. Exact inference in realworld applications of these problems is intractable, making efficient approximation methods essential for learning and inference. In this paper we propose a novel sequential matching sampler based on a generalization of the PlackettLuce model, which can effectively make large moves in the space of matchings. This allows the sampler to match the difficult target distributions common in these problems: highly multimodal distributions with well separated modes. We present experimental results with bipartite matching problems—ranking and image correspondence—which show that the sequential matching sampler efficiently approximates the target distribution, significantly outperforming other sampling approaches. 1
5 0.69839239 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
Author: Andriy Mnih, Yee W. Teh
Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1
6 0.63300788 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering
7 0.61197734 196 nips-2012-Learning with Partially Absorbing Random Walks
8 0.59720588 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm
9 0.58143342 155 nips-2012-Human memory search as a random walk in a semantic network
10 0.53655756 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space
11 0.53202379 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy
12 0.49938262 194 nips-2012-Learning to Discover Social Circles in Ego Networks
13 0.48761871 60 nips-2012-Bayesian nonparametric models for ranked data
14 0.48572442 286 nips-2012-Random Utility Theory for Social Choice
15 0.47379631 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
16 0.47221091 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis
17 0.45763591 288 nips-2012-Rational inference of relative preferences
18 0.43967596 346 nips-2012-Topology Constraints in Graphical Models
19 0.43874478 107 nips-2012-Effective Split-Merge Monte Carlo Methods for Nonparametric Models of Sequential Data
20 0.43760729 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion
topicId topicWeight
[(0, 0.012), (21, 0.015), (38, 0.596), (42, 0.017), (54, 0.012), (55, 0.02), (74, 0.042), (76, 0.124), (80, 0.059), (92, 0.025)]
simIndex simValue paperId paperTitle
Author: Zhuo Wang, Alan Stocker, Daniel Lee
Abstract: In this work we study how the stimulus distribution influences the optimal coding of an individual neuron. Closed-form solutions to the optimal sigmoidal tuning curve are provided for a neuron obeying Poisson statistics under a given stimulus distribution. We consider a variety of optimality criteria, including maximizing discriminability, maximizing mutual information and minimizing estimation error under a general Lp norm. We generalize the Cramer-Rao lower bound and show how the Lp loss can be written as a functional of the Fisher Information in the asymptotic limit, by proving the moment convergence of certain functions of Poisson random variables. In this manner, we show how the optimal tuning curve depends upon the loss function, and the equivalence of maximizing mutual information with minimizing Lp loss in the limit as p goes to zero. 1
2 0.98569787 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach
Author: Xinhua Zhang, Dale Schuurmans, Yao-liang Yu
Abstract: Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as trace norm. Although recent developments in sparse approximation have offered promising solution methods, current approaches either apply only to matrix-norm constrained problems or provide suboptimal convergence rates. In this paper, we propose a boosting method for regularized learning that guarantees accuracy within O(1/ ) iterations. Performance is further accelerated by interlacing boosting with fixed-rank local optimization—exploiting a simpler local objective than previous work. The proposed method yields state-of-the-art performance on large-scale problems. We also demonstrate an application to latent multiview learning for which we provide the first efficient weak-oracle. 1
3 0.98514849 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms
Author: Ofer Meshi, Amir Globerson, Tommi S. Jaakkola
Abstract: Finding maximum a posteriori (MAP) assignments in graphical models is an important task in many applications. Since the problem is generally hard, linear programming (LP) relaxations are often used. Solving these relaxations efficiently is thus an important practical problem. In recent years, several authors have proposed message passing updates corresponding to coordinate descent in the dual LP. However, these are generally not guaranteed to converge to a global optimum. One approach to remedy this is to smooth the LP, and perform coordinate descent on the smoothed dual. However, little is known about the convergence rate of this procedure. Here we perform a thorough rate analysis of such schemes and derive primal and dual convergence rates. We also provide a simple dual to primal mapping that yields feasible primal solutions with a guaranteed rate of convergence. Empirical evaluation supports our theoretical claims and shows that the method is highly competitive with state of the art approaches that yield global optima. 1
same-paper 4 0.98035854 165 nips-2012-Iterative ranking from pair-wise comparisons
Author: Sahand Negahban, Sewoong Oh, Devavrat Shah
Abstract: The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining ranking, finding ‘scores’ for each object (e.g. player’s rating) is of interest to understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects with edges present between two objects if they are compared; the scores turn out to be the stationary probability of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model in which each object has an associated score which determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required to learn the scores well by our algorithm. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1]. 1
5 0.95849895 181 nips-2012-Learning Multiple Tasks using Shared Hypotheses
Author: Koby Crammer, Yishay Mansour
Abstract: In this work we consider a setting where we have a very large number of related tasks with few examples from each individual task. Rather than either learning each task individually (and having a large generalization error) or learning all the tasks together using a single hypothesis (and suffering a potentially large inherent error), we consider learning a small pool of shared hypotheses. Each task is then mapped to a single hypothesis in the pool (hard association). We derive VC dimension generalization bounds for our model, based on the number of tasks, shared hypothesis and the VC dimension of the hypotheses class. We conducted experiments with both synthetic problems and sentiment of reviews, which strongly support our approach. 1
6 0.95645314 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA
7 0.95526463 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
8 0.94504583 240 nips-2012-Newton-Like Methods for Sparse Inverse Covariance Estimation
9 0.93706495 208 nips-2012-Matrix reconstruction with the local max norm
10 0.93526465 143 nips-2012-Globally Convergent Dual MAP LP Relaxation Solvers using Fenchel-Young Margins
11 0.89424777 285 nips-2012-Query Complexity of Derivative-Free Optimization
12 0.88472897 86 nips-2012-Convex Multi-view Subspace Learning
13 0.88028133 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization
14 0.87513989 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification
15 0.87178594 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
16 0.87102354 30 nips-2012-Accuracy at the Top
17 0.8635428 324 nips-2012-Stochastic Gradient Descent with Only One Projection
18 0.86308092 187 nips-2012-Learning curves for multi-task Gaussian process regression
19 0.86173534 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
20 0.85931396 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference