nips nips2012 nips2012-169 knowledge-graph by maker-knowledge-mining

169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models


Source: pdf

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. [sent-11, score-0.938]

2 In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. [sent-12, score-0.349]

3 We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. [sent-13, score-0.762]

4 While a conventional classifier has only two choices, namely to predict a class or to abstain, a ranker can abstain to some degree: The order relation predicted can be more or less complete, ranging from a total order to the empty relation in which all alternatives are declared incomparable. [sent-18, score-0.715]

5 Our focus is on so-called label ranking problems [16, 10], to be introduced more formally in Section 2 below. [sent-19, score-0.379]

6 Label ranking has a strong relationship with the standard setting of multi-class classification, but each instance is now associated with a complete ranking of all labels instead of a single label. [sent-20, score-0.559]

7 Typical examples, which also highlight the need for abstention, include the ranking of candidates for a given job and the ranking of products for a given customer. [sent-21, score-0.522]

8 Thus, if a ranker cannot reliably decide whether a first label should precede a second one or the other way around, it should abstain from this decision and instead declare these alternatives as being incomparable. [sent-23, score-0.442]

9 Abstaining in a consistent way, the relation thus produced should form a partial order [6]. [sent-24, score-0.453]

10 In Section 4, we propose and analyze a new approach for abstention in label ranking that builds on existing work on partial orders in areas like decision theory, probability theory and discrete mathematics. [sent-25, score-0.956]

11 We predict partial orders by thresholding parameterized probability distributions on rankings, using the Plackett-Luce and the Mallows model. [sent-26, score-0.538]

12 Roughly speaking, this approach is able to avoid certain inconsistencies of a previous approach to label ranking with abstention [6], to be discussed in Section 3. [sent-27, score-0.536]

13 By making stronger model assumptions, our approach simplifies the construction of consistent partial order relations. [sent-28, score-0.268]

14 Apart from assuring proper partial orders as predictions, it allows for an exact characterization of the expressivity of a class of thresholded probability distributions in terms of the number of partial orders that can be produced. [sent-30, score-1.056]

15 2 Label Ranking with Abstention: In the setting of label ranking, each instance x from an instance space X is associated with a total order of a fixed set of class labels Y = {y1 , . [sent-33, score-0.188]

16 , yM }, that is, a complete, transitive, and antisymmetric relation ≻ on Y, where yi ≻ yj indicates that yi precedes yj in the order. [sent-36, score-1.402]

17 Since a ranking can be considered as a special type of preference relation, we shall also say that yi ≻ yj indicates a preference for yi over yj (given the instance x). [sent-37, score-1.595]

18 Moreover, we identify ≻ with the mapping (relation) R : Y 2 −→ {0, 1} such that R(i, j) = 1 if yi ≻ yj and R(i, j) = 0 otherwise. [sent-43, score-0.521]

19 The goal in label ranking is to learn a “label ranker” in the form of an X −→ Ω mapping. [sent-44, score-0.379]

20 As training data, a label ranker uses a set of instances xn (n ∈ [N ]), together with preference information in the form of pairwise comparisons yi ≻ yj of some (but not necessarily all) labels in Y, suggesting that instance xn prefers label yi to yj . [sent-45, score-1.694]

21 The prediction accuracy of a label ranker is assessed by comparing the true ranking π with the prediction π̂ , using a distance measure D on rankings. [sent-46, score-0.573]

22 [6] introduced a variant of the above setting in which the label ranker is allowed to partially abstain from a prediction. [sent-50, score-0.354]

23 More specifically, it is allowed to make predictions in the form of a partial order Q instead of a total order R: If Q(i, j) = Q(j, i) = 0, the ranker abstains on the label pair (yi , yj ) and instead declares these labels as being incomparable. [sent-51, score-1.021]

24 Abstaining in a consistent way, Q should still be antisymmetric and transitive, hence a partial order relation. [sent-52, score-0.32]

, a subset of Ω supposed to cover the true ranking π, namely the set of all linear extensions of this partial order: C(Q) = {π ∈ Ω | Q(i, j) = 1 ⇒ (π(i) < π(j)) for all i, j ∈ [M ]}. [sent-55, score-0.525]

26 3 Previous Work: Despite a considerable amount of work on ranking in general and learning to rank in particular, the literature on partial rankings is relatively sparse. [sent-56, score-0.675]

27 Worth mentioning is work on a specific type of partial orders, namely linear orders of unsorted or tied subsets (partitions, bucket orders) [13, 17]. [sent-57, score-0.494]

28 2 More concretely, in the context of the label ranking problem, the aforementioned work [6] is the only one so far that addresses the more general problem of learning to predict partial orders. [sent-60, score-0.642]

29 This method consists of two main steps and can be considered as a pairwise approach in the sense that, as a point of departure for a prediction, a valued preference relation P : Y 2 −→ [0, 1] is produced, where P (i, j) is interpreted as a measure of support of the pairwise preference yi ≻ yj . [sent-61, score-1.118]

30 Then, in a second step, a partial order Q is derived from P via thresholding: Q(i, j) = [[P (i, j) > q]] = 1 if P (i, j) > q, and 0 otherwise, (1) where 1/2 ≤ q < 1 is a threshold. [sent-63, score-0.268]

31 Thus, the idea is to predict only those pairwise preferences that are sufficiently likely, while abstaining on pairs (i, j) for which P (i, j) ≈ 1/2. [sent-64, score-0.238]
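As an illustrative sketch (not code from the paper), the threshold rule (1) can be implemented directly; the function name and the toy probabilities below are made up:

```python
def threshold_relation(P, q):
    """Apply rule (1): Q[i][j] = 1 iff P[i][j] > q, for 1/2 <= q < 1.

    P is an M x M reciprocal relation (P[i][j] + P[j][i] == 1 off the
    diagonal); entries near 1/2 lead to abstention on that label pair.
    """
    M = len(P)
    return [[1 if i != j and P[i][j] > q else 0 for j in range(M)]
            for i in range(M)]

# Toy reciprocal relation over three labels: y0 vs. y2 is a confident
# preference, y0 vs. y1 is close to 1/2 and hence uncertain.
P = [[0.5, 0.6, 0.9],
     [0.4, 0.5, 0.7],
     [0.1, 0.3, 0.5]]

Q = threshold_relation(P, q=0.65)
# Only the sufficiently likely preferences survive; on (y0, y1) the
# ranker abstains: Q[0][1] == Q[1][0] == 0, i.e. y0 and y1 are incomparable.
```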

32 The first step of deriving the relation P is realized by means of an ensemble learning technique: Training an ensemble of standard label rankers, each of which provides a prediction in the form of a total order, P (i, j) is defined by the fraction of ensemble members voting for yi ≻ yj . [sent-65, score-0.79]
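The voting step can be sketched as follows, assuming each ensemble member outputs its predicted total order as a list of label indices from best to worst (the function name and the toy ensemble are illustrative):

```python
from itertools import combinations

def pairwise_votes(rankings, M):
    """P[i][j] = fraction of ensemble rankings placing label i before label j."""
    P = [[0.0] * M for _ in range(M)]
    for ranking in rankings:
        pos = {label: r for r, label in enumerate(ranking)}
        for i, j in combinations(range(M), 2):
            if pos[i] < pos[j]:
                P[i][j] += 1
            else:
                P[j][i] += 1
    n = len(rankings)
    return [[v / n for v in row] for row in P]

# Three ensemble members over labels {0, 1, 2}:
ensemble = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
P = pairwise_votes(ensemble, M=3)
# P[0][1] == 2/3: two of the three members rank label 0 before label 1.
```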

33 For the relation Q derived from P via thresholding, this has two important consequences: First, if the threshold q is not large enough, then Q may have cycles. [sent-69, score-0.233]

34 To overcome these problems, the authors devise an algorithm (of complexity O(|Y|3 )) that finds the smallest feasible threshold qmin , namely the threshold that guarantees Q(i, j) = [[P (i, j) > q]] to be cycle-free for each threshold q ∈ [qmin , 1). [sent-73, score-0.31]
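A simple way to find such a qmin (not necessarily the authors' O(|Y|³) algorithm) follows from the observation that raising q only removes edges from Q, so it suffices to scan the distinct values of P in [1/2, 1) until the thresholded digraph becomes acyclic; a hedged sketch:

```python
def has_cycle(Q):
    """Detect a directed cycle in the 0/1 adjacency matrix Q via DFS."""
    M = len(Q)
    state = [0] * M  # 0 = unvisited, 1 = on the DFS stack, 2 = finished

    def dfs(u):
        state[u] = 1
        for v in range(M):
            if Q[u][v] and (state[v] == 1 or (state[v] == 0 and dfs(v))):
                return True
        state[u] = 2
        return False

    return any(state[u] == 0 and dfs(u) for u in range(M))

def smallest_feasible_threshold(P):
    """Smallest q in [1/2, 1) whose thresholded relation is cycle-free.

    Since increasing q only deletes edges, acyclicity at q implies
    acyclicity for every q' in [q, 1).
    """
    M = len(P)
    values = {P[i][j] for i in range(M) for j in range(M) if i != j}
    for q in sorted(v for v in values | {0.5} if 0.5 <= v < 1):
        Q = [[1 if i != j and P[i][j] > q else 0 for j in range(M)]
             for i in range(M)]
        if not has_cycle(Q):
            return q
    return 1.0  # the empty relation is always cycle-free

# A cyclic majority: 0 beats 1, 1 beats 2, 2 beats 0, each with probability 0.6.
P = [[0.5, 0.6, 0.4],
     [0.4, 0.5, 0.6],
     [0.6, 0.4, 0.5]]
qmin = smallest_feasible_threshold(P)  # 0.5 keeps the 3-cycle; 0.6 removes it
```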

35 To this end, we take advantage of methods for label ranking that produce (parameterized) probability distributions over Ω as predictions. [sent-76, score-0.379]

36 Our main theoretical result is to show that thresholding pairwise preferences induced by such distributions, apart from being semantically meaningful due to their clear probabilistic interpretation, yields preference relations with the desired properties, that is, partial order relations Q. [sent-77, score-0.753]

37 A label ranking method that produces predictions expressed in terms of the Mallows model is proposed in [5]. [sent-81, score-0.414]

38 The standard Mallows model P(π | θ, π0 ) = exp(−θ D(π, π0 )) / φ(θ) (2) is determined by two parameters: the ranking π0 ∈ Ω is the location parameter (mode, center ranking) and θ ≥ 0 is a spread parameter. [sent-82, score-0.291]

39 Obviously, the Mallows model assigns the maximum probability to the center ranking π0 . [sent-84, score-0.261]
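For small M, the Mallows distribution (2) can be evaluated by brute-force enumeration of Ω, here with D the Kendall distance (an illustrative sketch, not the authors' implementation):

```python
from itertools import permutations
from math import exp

def kendall(pi, sigma):
    """Kendall distance: number of label pairs ordered differently by pi and sigma."""
    labels = sorted(pi)
    n = len(labels)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if (pi.index(labels[a]) < pi.index(labels[b])) !=
                  (sigma.index(labels[a]) < sigma.index(labels[b])))

def mallows(center, theta):
    """The Mallows distribution P(pi | theta, center) as a dict over all of Omega."""
    omega = list(permutations(center))
    weights = {pi: exp(-theta * kendall(pi, center)) for pi in omega}
    phi = sum(weights.values())  # normalization constant phi(theta)
    return {pi: w / phi for pi, w in weights.items()}

dist = mallows(center=(0, 1, 2), theta=1.0)
mode = max(dist, key=dist.get)
# The center ranking receives the maximum probability: mode == (0, 1, 2).
```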

40 Obviously, the larger va in comparison to vb , the higher the probability va /(va + vb ) that a is chosen. [sent-98, score-0.234]
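Under the PL model, the marginal probability that ya precedes yb in the full ranking can be computed by enumerating Ω; for the pair {a, b} it agrees with the choice probability va /(va + vb ). A sketch (the function names and parameter values are made up):

```python
from itertools import permutations

def pl_prob(ranking, v):
    """Plackett-Luce probability of a complete ranking (tuple of item indices)."""
    p, remaining = 1.0, sum(v)
    for item in ranking:
        p *= v[item] / remaining
        remaining -= v[item]
    return p

def pairwise_marginal(i, j, v):
    """P(y_i precedes y_j) under PL, by brute-force summation over all rankings."""
    items = range(len(v))
    return sum(pl_prob(pi, v) for pi in permutations(items)
               if pi.index(i) < pi.index(j))

v = [3.0, 2.0, 1.0]
p01 = pairwise_marginal(0, 1, v)
# Agrees with the closed form v_a / (v_a + v_b): 3 / (3 + 2) = 0.6.
```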

41 We start by stating a necessary and sufficient condition on P (i, j) for the threshold relation (1) to result in a (strict) partial order, i. [sent-104, score-0.468]

42 Let P be a reciprocal relation and let Q be given by (1). [sent-108, score-0.186]

43 Then Q defines a strict partial order relation for all q ∈ [1/2, 1) if and only if P satisfies partial stochastic transitivity, i. [sent-109, score-0.733]
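The conclusion of this condition can be checked numerically for a concrete reciprocal relation P by thresholding at every relevant value of q and testing the axioms of a strict partial order (an illustrative sketch; the example matrices are made up):

```python
from itertools import product

def is_strict_partial_order(Q):
    """Check irreflexivity, asymmetry, and transitivity of a 0/1 relation."""
    M = len(Q)
    if any(Q[i][i] for i in range(M)):
        return False
    if any(Q[i][j] and Q[j][i] for i in range(M) for j in range(M)):
        return False
    return all(not (Q[i][j] and Q[j][k]) or Q[i][k]
               for i, j, k in product(range(M), repeat=3))

def all_thresholds_give_partial_orders(P):
    """Threshold P at every relevant q in [1/2, 1) and test the conclusion."""
    M = len(P)
    values = {P[i][j] for i in range(M) for j in range(M) if i != j}
    for q in sorted(v for v in values | {0.5} if 0.5 <= v < 1):
        Q = [[1 if i != j and P[i][j] > q else 0 for j in range(M)]
             for i in range(M)]
        if not is_strict_partial_order(Q):
            return False
    return True

# Thresholding this relation at q = 1/2 yields the cycle 0 -> 1 -> 2 -> 0:
P_bad = [[0.5, 0.9, 0.4],
         [0.1, 0.5, 0.9],
         [0.6, 0.1, 0.5]]
# whereas this one yields a strict partial order at every threshold:
P_ok = [[0.5, 0.8, 0.9],
        [0.2, 0.5, 0.7],
        [0.1, 0.3, 0.5]]
```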

44 This lemma was first proven by Fishburn [12], together with a number of other characterizations of subclasses of strict partial orders by means of transitivity properties on P (i, j). [sent-112, score-0.651]

45 Our main theoretical results below state that thresholding (4) yields a strict partial order relation Q, both for the PL and the Mallows model. [sent-114, score-0.588]

46 Thus, we can guarantee that a strict partial order relation can be predicted by simple thresholding, and without the need for any further reparation. [sent-115, score-0.526]

47 Moreover, let Q be given by the threshold relation (1). [sent-119, score-0.233]

48 Then Q defines a strict partial order relation for all q ∈ [1/2, 1). [sent-120, score-0.498]

49 Moreover, let Q be given by the threshold relation (1). [sent-123, score-0.233]

50 Then Q defines a strict partial order relation for all q ∈ [1/2, 1). [sent-124, score-0.498]

51 A distance D on Ω is said to have the transposition property if the following holds: let π and π ′ be rankings and let (i, j) be an inversion (i. [sent-133, score-0.262]

52 Let π ′ ∈ Ω be constructed from π by swapping yi and yj , that is, π ′ (i) = π(j), π ′ (j) = π(i) and π ′ (m) = π(m) for all m ∈ [M ] \ {i, j}. [sent-136, score-0.564]

53 If yi precedes yj in the center ranking π0 in (2), then P(yi ≻ yj ) ≥ 1/2. [sent-139, score-1.281]

54 Moreover, if P(yi ≻ yj ) > q ≥ 1/2, then yi precedes yj in the center ranking π0 . [sent-140, score-1.281]

55 For every ranking π ∈ Ω, let b(π) = π if yi precedes yj in π; otherwise, b(π) is defined by swapping yi and yj in π. [sent-142, score-1.503]

56 Therefore, according to the Mallows model, P(b(π)) ≥ P(π), and hence P(yi ≻ yj ) = Σπ∈E(i,j) P(π) ≥ Σπ∈E(i,j) P(b−1 (π)) = Σπ∈E(j,i) P(π) = P(yj ≻ yi ). Since, moreover, P(yi ≻ yj ) = 1 − P(yj ≻ yi ), it follows that P(yi ≻ yj ) ≥ 1/2. [sent-145, score-1.384]

57 The second part immediately follows from the first one by way of contradiction: If yj would precede yi , then P(yj ≻ yi ) ≥ 1/2, and therefore P(yi ≻ yj ) = 1 − P(yj ≻ yi ) ≤ 1/2 ≤ q. [sent-146, score-1.256]

58 If yi precedes yj and yj precedes yk in the center ranking π0 in (2), then P(yi ≻ yk ) ≥ max (P(yi ≻ yj ), P(yj ≻ yk )). [sent-148, score-1.974]

59 The second inequality P(yi ≻ yk ) ≥ P(yj ≻ yk ) is shown analogously. [sent-151, score-0.194]

60 Let E(i, j, k) denote the set of linear extensions of yi ≻ yj ≻ yk , i. [sent-152, score-0.618]

61 , the set of rankings π ∈ Ω in which yi precedes yj and yj precedes yk . [sent-154, score-1.411]

62 Now, for every π ∈ E(k, j, i), define b(π) by first swapping yk and yj and then yk and yi in π. [sent-155, score-0.758]

63 Consequently, since E(i, j) = E(i, j, k) ∪ E(i, k, j) ∪ E(k, i, j) and E(i, k) = E(i, k, j) ∪ E(i, j, k) ∪ E(j, i, k), it follows that P(yi ≻ yk ) − P(yi ≻ yj ) = Σπ∈E(i,k)\E(i,j) P(π) − Σπ∈E(i,j)\E(i,k) P(π) = Σπ∈E(k,j,i) (P(b(π)) − P(π)) ≥ 0. [sent-158, score-0.439]

64 The relation P derived via P (i, j) = P(yi ≻ yj ) from the Mallows model satisfies the following property (closely related to strong stochastic transitivity): If P (i, j) > q and P (j, k) > q, then P (i, k) ≥ max(P (i, j), P (j, k)), for all q ≥ 1/2 and all i, j, k ∈ [M ]. [sent-161, score-0.493]
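This property can be verified numerically for a small Mallows model by computing the pairwise marginals P(yi ≻ yj ) through brute-force enumeration (illustrative code with an arbitrarily chosen spread θ, not the authors' implementation):

```python
from itertools import permutations
from math import exp

def kendall(pi, sigma):
    """Number of label pairs on which the rankings pi and sigma disagree."""
    labels = sorted(pi)
    n = len(labels)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if (pi.index(labels[a]) < pi.index(labels[b])) !=
                  (sigma.index(labels[a]) < sigma.index(labels[b])))

def mallows_pairwise(center, theta):
    """P[i][j] = P(y_i precedes y_j) under the Mallows model, by enumeration."""
    omega = list(permutations(center))
    w = [exp(-theta * kendall(pi, center)) for pi in omega]
    Z = sum(w)
    M = len(center)
    P = [[0.0] * M for _ in range(M)]
    for pi, wk in zip(omega, w):
        for i in range(M):
            for j in range(M):
                if i != j and pi.index(i) < pi.index(j):
                    P[i][j] += wk / Z
    return P

P = mallows_pairwise(center=(0, 1, 2, 3), theta=0.8)
# Along the center ranking, P[i][k] >= max(P[i][j], P[j][k]) for i < j < k.
```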

65 Since P(yi ≻ yj ) = 1 − P(yj ≻ yi ), it obviously follows that Q(yi , yj ) = 1 implies Q(yj , yi ) = 0. [sent-163, score-1.096]

66 The above statements guarantee that a strict partial order relation can be predicted by simple thresholding, and without the need for any further reparation. [sent-166, score-0.526]

67 As an aside, we mention that strict partial orders can also be produced by thresholding other probabilistic preference learning models. [sent-168, score-0.798]

68 All pairwise preference models based on utility scores satisfy strong stochastic transitivity. [sent-169, score-0.223]

69 These models are usually not applied in label ranking settings, however. [sent-171, score-0.379]

70 3 Expressivity of the Model Classes: So far, we have shown that predictions produced by thresholding probability distributions on rankings are proper partial orders. [sent-173, score-0.564]

71 This expressivity is naturally defined in terms of the number of different partial orders (up to isomorphism) that can be represented in the form of a threshold relation (1). [sent-176, score-0.757]

72 Let QM denote the set of different partial orders (up to isomorphism) that can be represented as a threshold relation Q defined by (1), where P is derived according to (4) from the Mallows model (2) with D the Kendall distance. [sent-179, score-0.653]

73 Thus, by Theorem 2, thresholding leads to M different partial orders. [sent-190, score-0.325]

74 More specifically, the partial orders in QM have a very simple structure that is purely rank-dependent: The first structure is the total order π = π0 . [sent-191, score-0.453]

75 , labels yi and yj with adjacent ranks (|π(i) − π(j)| = 1). [sent-194, score-0.558]

, labels yi and yj with |π(i) − π(j)| = 2, and so forth. [sent-197, score-0.558]

77 It is a simple consequence of Theorem 4 below, showing that exponentially more orders can be produced based on the PL model. [sent-204, score-0.219]

78 For fixed q ∈ (1/2, 1) and a set A of subsets of [M ], the following are equivalent: (i) The set A is the set of maximal antichains of a partial order induced by (4) on [M ] for some v1 > · · · > vM > 0. [sent-206, score-0.398]

79 From va ≥ vc > vd ≥ vb it follows that va /(va + vb ) = 1/(1 + vb /va ) ≥ 1/(1 + vd /vc ) = vc /(vc + vd ). [sent-218, score-0.572]

80 Thus, it suffices to show that there are real numbers v1 > · · · > vn > 0 such that va /(va + vb ) ≤ q for any {a, a + 1, . [sent-219, score-0.705]

81 Then, by the considerations above, we need to choose numbers vM > va > va+1 > · · · > vM ′ > 0 such that va /(va + vM ) ≤ q and va /(va + vM ′ ) > q. [sent-234, score-0.561]

82 Since M is contained in at least one set from A and since this set is not contained in {a, a + 1, . [sent-238, score-0.187]

83 Now choose vM +1 > vM +2 > · · · > vM ′ > 0 such that q vM > q vM +1 > · · · > q vM ′ > va (1 − q). [sent-243, score-0.187]

84 Let QP L denote the set of different partial orders (up to isomorphism) that can be represented as a threshold relation Q defined by (1), where P is derived according to (4) from the PL model (3). [sent-247, score-0.653]

85 It is immediate that each path uniquely determines its partial order. [sent-254, score-0.279]

86 In order to verify that any Dyck path is induced by a suitable choice of parameters, one establishes a bijection between Dyck paths from (1, 0) to (M +1, M ) and maximal sets of mutually incomparable intervals (in the natural order) in [M ]. [sent-256, score-0.385]
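The number of such Dyck paths is a Catalan number, which a small counting routine confirms (an illustrative sketch; the exact correspondence to |QP L | is the subject of Theorem 4 and is not asserted here):

```python
from functools import lru_cache

def count_dyck_paths(M):
    """Count monotone lattice paths from (0, 0) to (M, M) that never rise
    above the diagonal (y <= x throughout) -- the paths from (1, 0) to
    (M+1, M) in the text, shifted by one unit. The count is the Catalan
    number C_M."""
    @lru_cache(maxsize=None)
    def f(x, y):
        if x == M and y == M:
            return 1
        total = 0
        if x < M:
            total += f(x + 1, y)  # right-step
        if y < x:
            total += f(x, y + 1)  # up-step, kept weakly below the diagonal
        return total
    return f(0, 0)

counts = [count_dyck_paths(M) for M in range(1, 6)]
# Catalan numbers C_1..C_5: [1, 2, 5, 14, 42].
```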

87 It is a simple yet tedious task to check that assigning to a Dyck path the set of intervals associated to its peaks is indeed a bijection to the set of maximal sets of mutually incomparable intervals in [M ]. [sent-264, score-0.38]

88 Again, it is easy to verify that the set of intervals associated to a Dyck path is the set of maximal antichains of the partial order determined by the Dyck path. [sent-265, score-0.47]

89 Again, using Lemma 5, one checks that (5) implies that partial orders induced by (4) in the PL model have a unique labeling up to poset automorphism. [sent-267, score-0.488]

90 We note that, from the above proof, it follows that the partial orders in QP L are the so-called semiorders. [sent-269, score-0.42]

91 Finally, given that we have been able to derive explicit (combinatorial) expressions for |QM | and |QP L |, it might be interesting to note that, somewhat surprisingly at first sight, no such expression exists for the total number of partial orders on M elements. [sent-277, score-0.42]

92 Our evaluation is done by measuring the tradeoff between correctness and completeness achieved by varying the threshold q in (1). [sent-280, score-0.279]

93 Obviously, the two criteria are conflicting: increasing completeness typically comes along with reducing correctness and vice versa, at least if the learner is effective in the sense of abstaining from those decisions that are indeed most uncertain. [sent-282, score-0.256]
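One common instantiation of these two criteria (the paper's exact definitions may differ) measures completeness as the fraction of decided label pairs and correctness as a gamma-like rank correlation over the decided pairs only; a sketch:

```python
def correctness_completeness(Q, pi):
    """Evaluate a predicted partial order Q against a true ranking pi,
    where pi[i] is the rank position of label i (smaller = preferred).

    completeness: fraction of label pairs on which Q makes a decision;
    correctness: (concordant - discordant) / (concordant + discordant)
    over the decided pairs only."""
    M = len(pi)
    concordant = discordant = 0
    for i in range(M):
        for j in range(i + 1, M):
            if Q[i][j] or Q[j][i]:
                if bool(Q[i][j]) == (pi[i] < pi[j]):
                    concordant += 1
                else:
                    discordant += 1
    decided = concordant + discordant
    completeness = decided / (M * (M - 1) // 2)
    correctness = (concordant - discordant) / decided if decided else 1.0
    return correctness, completeness

# True ranking y0 > y1 > y2; the prediction abstains on the pair (y1, y2):
pi = [0, 1, 2]
Q = [[0, 1, 1],
     [0, 0, 0],
     [0, 0, 0]]
corr, comp = correctness_completeness(Q, pi)  # corr == 1.0, comp == 2/3
```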

94 The figure on the right corresponds to the original data set with rankings of size 10, while the figure on the left shows results for rankings of size 6. [sent-286, score-0.274]

95 We compare the original method of [6] with our new proposal, calling the former direct, because the pairwise preference degrees on which we threshold are estimated directly, and the latter derived, because these degrees are derived from probability distributions on Ω. [sent-287, score-0.305]

96 The main conclusion that can be drawn from these results is that, as expected, our probabilistic approach does indeed achieve a better tradeoff between completeness and correctness than the original one, especially in the sense that it spans a wider range of values for the former. [sent-294, score-0.226]

97 6 Summary and Conclusions: The idea of producing predictions in the form of a partial order by thresholding a (valued) pairwise preference relation is meaningful in the sense that a learner abstains on the most unreliable comparisons. [sent-303, score-0.839]

98 While this idea has first been realized in [6] in an ad-hoc manner, we put it on a firm mathematical grounding that guarantees consistency and, via variation of the threshold, allows for exploiting the whole spectrum between a complete ranking and an empty relation. [sent-304, score-0.261]

99 Label ranking based on the Plackett-Luce model. [sent-336, score-0.261]

100 A threshold for majority in the context of aggregating partial order relations. [sent-464, score-0.35]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('yj', 0.342), ('mallows', 0.301), ('ranking', 0.261), ('partial', 0.235), ('va', 0.187), ('orders', 0.185), ('pl', 0.184), ('yi', 0.179), ('dyck', 0.177), ('abstention', 0.157), ('precedes', 0.157), ('ranker', 0.156), ('relation', 0.151), ('preference', 0.146), ('qm', 0.14), ('rankings', 0.137), ('vm', 0.126), ('transitivity', 0.12), ('marburg', 0.118), ('label', 0.118), ('completeness', 0.105), ('expressivity', 0.104), ('yk', 0.097), ('thresholding', 0.09), ('abstaining', 0.087), ('transposition', 0.087), ('threshold', 0.082), ('abstain', 0.08), ('isomorphism', 0.08), ('strict', 0.079), ('qvm', 0.079), ('pairwise', 0.077), ('incomparable', 0.075), ('qp', 0.068), ('intervals', 0.067), ('correctness', 0.064), ('cheng', 0.064), ('transitive', 0.06), ('kendall', 0.06), ('antichains', 0.059), ('obviously', 0.054), ('bijection', 0.053), ('vc', 0.052), ('antisymmetric', 0.052), ('spearman', 0.051), ('sushi', 0.048), ('germany', 0.048), ('vd', 0.047), ('vb', 0.047), ('reject', 0.046), ('preferences', 0.046), ('canada', 0.045), ('bucket', 0.045), ('interval', 0.045), ('path', 0.044), ('swapping', 0.043), ('rank', 0.042), ('mutually', 0.042), ('thresholded', 0.04), ('unreliable', 0.04), ('rho', 0.04), ('assuring', 0.039), ('rademaker', 0.039), ('vava', 0.039), ('vcvc', 0.039), ('welker', 0.039), ('induced', 0.039), ('vancouver', 0.038), ('distance', 0.038), ('moreover', 0.038), ('labels', 0.037), ('option', 0.037), ('reciprocal', 0.035), ('qmin', 0.035), ('catalan', 0.035), ('precede', 0.035), ('predictions', 0.035), ('produced', 0.034), ('order', 0.033), ('proper', 0.033), ('lemma', 0.032), ('maximal', 0.032), ('eyke', 0.032), ('ghent', 0.032), ('abstains', 0.032), ('spread', 0.03), ('barcelona', 0.03), ('namely', 0.029), ('relations', 0.029), ('probabilistic', 0.029), ('checks', 0.029), ('tradeoff', 0.028), ('psychology', 0.028), ('predict', 0.028), ('predicted', 0.028), ('obey', 0.027), ('declare', 0.027), ('montreal', 0.027), ('nes', 0.027), ('alternatives', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1

2 0.17000876 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu

Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications of some empirical findings on pairwise ranking methods that are unexplained before, which bridge the gap between theory and applications.

3 0.15487561 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David Parks, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1

4 0.14752643 75 nips-2012-Collaborative Ranking With 17 Parameters

Author: Maksims Volkovs, Richard S. Zemel

Abstract: The primary application of collaborate filtering (CF) is to recommend a small set of items to a user, which entails ranking. Most approaches, however, formulate the CF problem as rating prediction, overlooking the ranking perspective. In this work we present a method for collaborative ranking that leverages the strengths of the two main CF approaches, neighborhood- and model-based. Our novel method is highly efficient, with only seventeen parameters to optimize and a single hyperparameter to tune, and beats the state-of-the-art collaborative ranking methods. We also show that parameters learned on datasets from one item domain yield excellent results on a dataset from very different item domain, without any retraining. 1

5 0.14378208 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

Author: Kevin Swersky, Brendan J. Frey, Daniel Tarlow, Richard S. Zemel, Ryan P. Adams

Abstract: In categorical data there is often structure in the number of variables that take on each label. For example, the total number of objects in an image and the number of highly relevant documents per query in web search both tend to follow a structured distribution. In this paper, we study a probabilistic model that explicitly includes a prior distribution over such counts, along with a count-conditional likelihood that defines probabilities over all subsets of a given size. When labels are binary and the prior over counts is a Poisson-Binomial distribution, a standard logistic regression model is recovered, but for other count distributions, such priors induce global dependencies and combinatorics that appear to complicate learning and inference. However, we demonstrate that simple, efficient learning procedures can be derived for more general forms of this model. We illustrate the utility of the formulation by exploring applications to multi-object classification, learning to rank, and top-K classification. 1

6 0.12653299 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm

7 0.12170276 280 nips-2012-Proper losses for learning from partial labels

8 0.096671395 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

9 0.091544963 60 nips-2012-Bayesian nonparametric models for ranked data

10 0.091411769 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

11 0.089868814 165 nips-2012-Iterative ranking from pair-wise comparisons

12 0.08621183 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

13 0.081043392 288 nips-2012-Rational inference of relative preferences

14 0.077337764 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

15 0.07473065 206 nips-2012-Majorization for CRFs and Latent Likelihoods

16 0.069084212 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

17 0.068568632 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

18 0.064847574 314 nips-2012-Slice Normalized Dynamic Markov Logic Networks

19 0.064681344 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

20 0.0642251 200 nips-2012-Local Supervised Learning through Space Partitioning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.169), (1, 0.029), (2, 0.025), (3, -0.034), (4, 0.004), (5, -0.083), (6, 0.027), (7, 0.161), (8, 0.012), (9, 0.191), (10, -0.119), (11, 0.049), (12, -0.048), (13, 0.022), (14, 0.046), (15, -0.013), (16, 0.066), (17, 0.011), (18, 0.012), (19, 0.042), (20, 0.006), (21, 0.033), (22, -0.06), (23, 0.161), (24, 0.069), (25, 0.014), (26, 0.003), (27, 0.025), (28, -0.092), (29, 0.018), (30, 0.101), (31, -0.188), (32, 0.096), (33, 0.022), (34, 0.007), (35, -0.13), (36, -0.025), (37, -0.082), (38, 0.058), (39, 0.029), (40, 0.045), (41, -0.055), (42, -0.005), (43, 0.148), (44, 0.027), (45, -0.096), (46, -0.005), (47, -0.14), (48, -0.021), (49, 0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96220171 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier

Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1

2 0.82425952 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu

Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications of some empirical findings on pairwise ranking methods that are unexplained before, which bridge the gap between theory and applications.

3 0.80690622 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm

Author: Jingrui He, Hanghang Tong, Qiaozhu Mei, Boleslaw Szymanski

Abstract: Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.

4 0.66946727 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David C. Parkes, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave log-likelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce.
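The score-then-rank scheme described above has a well-known concrete instance: adding i.i.d. Gumbel noise to log-utilities and sorting by the noisy scores yields samples from the Plackett-Luce model. A minimal sketch, with illustrative variable names:

```python
import math
import random

def sample_ranking_plackett_luce(utilities, rng):
    """Sample one ranking from a random utility model (sketch).

    Each alternative's score is its log-utility plus a standard Gumbel
    draw; sorting by score gives a Plackett-Luce sample.
    """
    scores = {}
    for alt, u in utilities.items():
        gumbel = -math.log(-math.log(rng.random()))  # standard Gumbel draw
        scores[alt] = math.log(u) + gumbel
    return sorted(scores, key=scores.get, reverse=True)

rng = random.Random(0)
utilities = {"a": 10.0, "b": 1.0, "c": 0.1}
# With strongly separated utilities, "a" usually tops the sampled ranking.
print(sample_ranking_plackett_luce(utilities, rng))
```

Under Plackett-Luce, the probability that "a" is ranked first is its utility over the total, here 10.0 / 11.1 ≈ 0.90, which the sampler reproduces empirically.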

5 0.62847829 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

Author: Kevin Swersky, Brendan J. Frey, Daniel Tarlow, Richard S. Zemel, Ryan P. Adams

Abstract: In categorical data there is often structure in the number of variables that take on each label. For example, the total number of objects in an image and the number of highly relevant documents per query in web search both tend to follow a structured distribution. In this paper, we study a probabilistic model that explicitly includes a prior distribution over such counts, along with a count-conditional likelihood that defines probabilities over all subsets of a given size. When labels are binary and the prior over counts is a Poisson-Binomial distribution, a standard logistic regression model is recovered, but for other count distributions, such priors induce global dependencies and combinatorics that appear to complicate learning and inference. However, we demonstrate that simple, efficient learning procedures can be derived for more general forms of this model. We illustrate the utility of the formulation by exploring applications to multi-object classification, learning to rank, and top-K classification.
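The count prior mentioned above can be illustrated for binary labels: the number of positive labels under independent Bernoulli variables follows a Poisson-Binomial distribution, computable with a standard O(n²) dynamic program. A sketch of just this count distribution, not the paper's full n-choose-k model:

```python
def poisson_binomial_pmf(probs):
    """PMF over the number of "on" labels when label i is on independently
    with probability probs[i] (Poisson-Binomial), via the standard DP."""
    pmf = [1.0]  # zero labels processed: count is 0 with certainty
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)  # this label stays off
            new[k + 1] += mass * p      # this label turns on
        pmf = new
    return pmf

print(poisson_binomial_pmf([0.5, 0.5]))  # → [0.25, 0.5, 0.25]
```

When each `probs[i]` comes from an independent logistic model, this is exactly the count distribution implied by standard logistic regression, matching the recovery result the abstract describes.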

6 0.53111213 288 nips-2012-Rational inference of relative preferences

7 0.51496279 75 nips-2012-Collaborative Ranking With 17 Parameters

8 0.50997025 196 nips-2012-Learning with Partially Absorbing Random Walks

9 0.4525744 165 nips-2012-Iterative ranking from pair-wise comparisons

10 0.4435834 280 nips-2012-Proper losses for learning from partial labels

11 0.43420663 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease

12 0.4339183 22 nips-2012-A latent factor model for highly multi-relational data

13 0.4324528 67 nips-2012-Classification Calibration Dimension for General Multiclass Losses

14 0.43071091 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

15 0.3562499 128 nips-2012-Fast Resampling Weighted v-Statistics

16 0.35261536 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification

17 0.35067174 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

18 0.34794441 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

19 0.34441856 206 nips-2012-Majorization for CRFs and Latent Likelihoods

20 0.34408244 53 nips-2012-Bayesian Pedigree Analysis using Measure Factorization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.015), (21, 0.01), (38, 0.103), (42, 0.016), (53, 0.01), (55, 0.019), (74, 0.035), (76, 0.625), (80, 0.057), (92, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99692482 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch

Abstract: We propose an efficient, generalized, nonparametric, statistical Kolmogorov-Smirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods.
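For reference, the classical one-dimensional two-sample Kolmogorov-Smirnov statistic that the paper generalizes is simply the largest gap between two empirical CDFs. A minimal sketch of that baseline:

```python
def ks_two_sample_statistic(xs, ys):
    """Classical 1-D two-sample Kolmogorov-Smirnov statistic: the maximum
    absolute difference between the two empirical CDFs, evaluated at the
    pooled sample points (where the maximum must occur)."""
    points = sorted(set(xs) | set(ys))

    def ecdf(sample, t):
        return sum(1 for v in sample if v <= t) / len(sample)

    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in points)

# Identical samples give statistic 0; fully separated samples give 1.
print(ks_two_sample_statistic([1, 2, 3], [1, 2, 3]))     # → 0.0
print(ks_two_sample_statistic([1, 2, 3], [10, 20, 30]))  # → 1.0
```

The high-dimensional difficulty is that there is no natural ordering to build a CDF on; representing distributions by hierarchies of minimum-volume sets, as the abstract describes, is one way around that.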

2 0.988949 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature

Author: Michael Osborne, Roman Garnett, Zoubin Ghahramani, David K. Duvenaud, Stephen J. Roberts, Carl E. Rasmussen

Abstract: Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a model-based method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model’s hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy.
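The model-based idea can be illustrated with a minimal Bayesian quadrature sketch: place a zero-mean GP prior with an RBF kernel on the integrand and compute the posterior mean of the integral in closed form. The node locations, lengthscale, and plain (non-negativity-unaware) formulation are illustrative assumptions, not the paper's method.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def bayesian_quadrature(f, nodes, lengthscale=0.3, jitter=1e-9):
    """Posterior mean of the integral of f over [0, 1] under a zero-mean GP
    prior with an RBF kernel: z^T K^{-1} y, where z_i integrates the kernel
    against node i (closed form via erf for the RBF kernel)."""
    k = lambda a, b: math.exp(-(a - b) ** 2 / (2 * lengthscale ** 2))
    K = [[k(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(nodes)] for i, a in enumerate(nodes)]
    y = [f(x) for x in nodes]
    s = lengthscale * math.sqrt(2.0)
    z = [lengthscale * math.sqrt(math.pi / 2.0)
         * (math.erf((1.0 - xi) / s) + math.erf(xi / s)) for xi in nodes]
    w = solve(K, y)  # K^{-1} y
    return sum(zi * wi for zi, wi in zip(z, w))

est = bayesian_quadrature(lambda x: x * x, [0.0, 0.25, 0.5, 0.75, 1.0])
print(est)  # close to the true value 1/3
```

Unlike a Monte Carlo average, this estimate uses a model of the integrand between evaluations, which is where the sample-efficiency gains the abstract mentions come from; the same GP also yields a posterior variance on the integral (omitted here).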

3 0.98849142 286 nips-2012-Random Utility Theory for Social Choice

Author: Hossein Azari, David C. Parkes, Lirong Xia

Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave log-likelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce.

4 0.98666626 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video

Author: Kevin Tang, Vignesh Ramanathan, Li Fei-fei, Daphne Koller

Abstract: Typical object detectors trained on images perform poorly on video, as there is a clear distinction in domain between the two types of data. In this paper, we tackle the problem of adapting object detectors learned from images to work well on videos. We treat the problem as one of unsupervised domain adaptation, in which we are given labeled data from the source domain (image), but only unlabeled data from the target domain (video). Our approach, self-paced domain adaptation, seeks to iteratively adapt the detector by re-training the detector with automatically discovered target domain examples, starting with the easiest first. At each iteration, the algorithm adapts by considering an increased number of target domain examples, and a decreased number of source domain examples. To discover target domain examples from the vast amount of video data, we introduce a simple, robust approach that scores trajectory tracks instead of bounding boxes. We also show how rich and expressive features specific to the target domain can be incorporated under the same framework. We show promising results on the 2011 TRECVID Multimedia Event Detection [1] and LabelMe Video [2] datasets that illustrate the benefit of our approach to adapt object detectors to video.
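The iterative easiest-first strategy described above can be sketched schematically. The `fit` and `easiness` callables below are toy stand-ins for the paper's detector training and track scoring, and the linear schedule is an illustrative assumption.

```python
def self_paced_adaptation(source, target_pool, fit, easiness, rounds, step=1):
    """Schematic self-paced domain adaptation loop (sketch).

    Each round: refit the model, move the `step` easiest (highest-scoring)
    target examples into the training set, and drop `step` source examples,
    so the mixture shifts from source to target over time.
    """
    src, adopted, pool = list(source), [], list(target_pool)
    model = fit(src + adopted)
    for _ in range(rounds):
        pool.sort(key=lambda x: easiness(model, x), reverse=True)  # easiest first
        adopted += pool[:step]          # adopt the easiest target examples
        pool = pool[step:]
        if len(src) > step:
            src = src[:-step]           # shrink the source-domain share
        model = fit(src + adopted)      # retrain on the updated mixture
    return model, adopted

# Toy setup: "detector" = running mean; "easy" = close to the current model.
fit = lambda xs: sum(xs) / len(xs)
easiness = lambda m, x: -abs(x - m)
model, adopted = self_paced_adaptation([0.0, 0.1, 0.2], [1.0, 1.1, 5.0],
                                       fit, easiness, rounds=2)
print(adopted)  # → [1.0, 1.1]
```

The outlier 5.0 is never adopted in two rounds: the schedule admits target examples only as the model drifts close enough to score them as easy, which is the point of the easiest-first curriculum.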

5 0.98583776 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data

Author: Francisco Pereira, Matthew Botvinick

Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure.

same-paper 6 0.98324978 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

7 0.9815613 205 nips-2012-MCMC for continuous-time discrete-state systems

8 0.95881629 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

9 0.95173955 247 nips-2012-Nonparametric Reduced Rank Regression

10 0.94587636 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning

11 0.89710224 338 nips-2012-The Perturbed Variation

12 0.88512379 142 nips-2012-Generalization Bounds for Domain Adaptation

13 0.88187224 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

14 0.87070352 41 nips-2012-Ancestor Sampling for Particle Gibbs

15 0.87043256 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

16 0.8677178 291 nips-2012-Reducing statistical time-series problems to binary classification

17 0.86739331 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters

18 0.86721069 327 nips-2012-Structured Learning of Gaussian Graphical Models

19 0.86572123 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points

20 0.86310071 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation