nips nips2009 nips2009-230 knowledge-graph by maker-knowledge-mining

230 nips-2009-Statistical Consistency of Top-k Ranking


Source: pdf

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper is concerned with the consistency analysis on listwise ranking methods. [sent-6, score-0.607]

2 Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. [sent-7, score-0.596]

3 Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. [sent-8, score-1.447]

4 This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. [sent-9, score-0.659]

5 For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. [sent-10, score-1.281]

6 This framework can include the permutation-level ranking framework proposed in previous work as a special case. [sent-11, score-0.422]

7 Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. [sent-12, score-1.371]

8 Empirical results on benchmark datasets have demonstrated that the listwise ranking methods have very competitive ranking performances [10]. [sent-16, score-0.98]

9 To explain the high ranking performances of the listwise ranking methods, a theoretical framework was proposed in [16]. [sent-17, score-0.979]

10 In the framework, existing listwise ranking methods are interpreted as making use of different surrogate loss functions of the permutation-level 0-1 loss. [sent-18, score-1.234]

11 Theoretical analysis shows that these surrogate loss functions are all statistically consistent in the sense that minimization of the conditional expectation of them will lead to obtaining the Bayes ranker, i.e. [sent-19, score-0.761]

12 Here we point out that there is a gap between the analysis in [16] and many real ranking problems, where the correct ranking of the entire permutation is not needed. [sent-22, score-0.909]

13 For example, in IR, users usually care much more about the top ranking results and thus only correct ranking at the top positions is important. [sent-23, score-0.91]

14 In this new situation, it is no longer clear whether existing listwise ranking methods are still statistically consistent. [sent-24, score-0.598]

15 For this purpose, we propose a new ranking framework, in which the “true loss” is defined on the top-k subgroup of permutations instead of on the entire permutation. [sent-26, score-0.966]

16 The new true loss only measures errors occurring at the top k positions of a ranked list; therefore we refer to it as the top-k true loss (note that when k equals the length of the ranked list, the top-k true loss will become exactly the permutation-level 0-1 loss). [sent-27, score-1.362]

17 We prove a new theorem which gives sufficient conditions for a surrogate loss function to be consistent with the top-k true loss. [sent-28, score-0.797]

18 Our analysis shows that, as k decreases, to guarantee the consistency of a surrogate loss function, the requirement on the probability space becomes weaker while the requirement on the surrogate loss function itself becomes stronger. [sent-30, score-1.507]

19 As a result, a surrogate loss function that is consistent with the permutation-level 0-1 loss might not be consistent with the top-k true loss any more. [sent-31, score-1.441]

20 Therefore, the surrogate loss functions in existing listwise ranking methods, which have been proved to be consistent with the permutation-level 0-1 loss, are not theoretically guaranteed to have good performances in the top-k setting. [sent-32, score-1.355]

21 Modifications to these surrogate loss functions are needed to further make them consistent with the top-k true loss. [sent-33, score-0.782]

22 We review the permutation-level ranking framework proposed in [16]. [sent-36, score-0.806]

23 The task of learning to rank is to learn a function that can minimize the expected risk R(h), defined as R(h) = ∫_{X×Y} l(h(x), y) dP(x, y), (1) where l(h(x), y) is the true loss such that l(h(x), y) = 1 if h(x) ≠ y and l(h(x), y) = 0 if h(x) = y. (2) [sent-40, score-0.443]

24 The above true loss indicates that if the permutation of the predicted result is exactly the same as the permutation in the ground truth, then the loss is zero; otherwise the loss is one. [sent-41, score-1.26]
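
To make this concrete, here is a minimal sketch of the permutation-level 0-1 loss in Eqs. (1)-(2); the function name and the list-of-object-indices representation of a permutation are illustrative assumptions, not code from the paper.

```python
def permutation_level_01_loss(predicted, truth):
    """Permutation-level 0-1 loss: 0 iff the predicted permutation matches
    the ground-truth permutation exactly, and 1 otherwise."""
    return 0 if list(predicted) == list(truth) else 1

# Any disagreement anywhere in the list incurs the full loss of 1.
assert permutation_level_01_loss([2, 0, 1], [2, 0, 1]) == 0
assert permutation_level_01_loss([2, 0, 1], [2, 1, 0]) == 1
```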

25 The optimal ranking function which can minimize the expected true risk R(h∗ ) = inf R(h) is referred to as the permutation-level Bayes ranker. [sent-43, score-0.475]

26 Since the risk is non-continuous and non-differentiable with respect to the scoring function g, a continuous and differentiable surrogate loss function φ(g(x), y) is usually used as an approximation of the true loss. [sent-49, score-0.758]

27 It has been shown in [16] that many existing listwise ranking methods fall into the above framework, with different surrogate loss functions used. [sent-54, score-1.234]

28 Furthermore, their surrogate loss functions are statistically consistent under certain conditions with respect to the permutation-level 0-1 loss. [sent-55, score-0.809]

29 However, as shown in the next section, the permutation-level 0-1 loss is not suitable to describe the ranking problem in many real applications. [sent-56, score-0.713]

30 We next describe the real ranking problem, and then propose the top-k ranking framework. [sent-57, score-1.188]

31 In real ranking applications like IR, people pay more attention to the top-ranked objects. [sent-59, score-0.785]

32 Therefore the correct ranking on the top positions is critically important. [sent-60, score-0.478]

33 It means that two ranked lists of documents will likely provide the same experience to the users (and thus suffer the same loss), if they have the same ranking results for the top positions. [sent-63, score-0.509]

34 To better describe the top-k ranking problem, we propose defining the true loss based on the top k positions in a ranked list, referred to as the top-k true loss. [sent-69, score-1.338]

35 When k equals the length of the entire ranked list, the top-k true loss will become exactly the permutation-level 0-1 loss. [sent-75, score-0.461]
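
A minimal sketch of the top-k true loss described above (an illustration under the assumption that a ranked list is represented as a list of object indices; not the authors' code):

```python
def top_k_true_loss(predicted, truth, k):
    """Top-k 0-1 loss: 0 iff the predicted ranked list agrees with the ground
    truth on the top k positions, 1 otherwise. With k == len(truth) it reduces
    to the permutation-level 0-1 loss."""
    return 0 if list(predicted[:k]) == list(truth[:k]) else 1

# The two lists below agree on the top 2 positions but not on the full permutation.
assert top_k_true_loss([3, 1, 0, 2], [3, 1, 2, 0], k=2) == 0
assert top_k_true_loss([3, 1, 0, 2], [3, 1, 2, 0], k=4) == 1
```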

36 It can be proved that the optimal ranking function with respect to the top-k true loss (i.e. [sent-79, score-0.804]

37 , the top-k Bayes ranker) is any permutation in the top-k subgroup having the highest probability, i.e. [sent-81, score-0.656]

38 Gk(j1, j2, ..., jk) denotes a top-k subgroup in which all the permutations have the same top-k true loss; Gk denotes the collection of all top-k subgroups. [sent-95, score-0.665]

39 With the above setting, we will analyze the consistency of the surrogate loss functions in existing ranking methods with the top-k true loss in the next section. [sent-96, score-1.498]

40 In this section, we first give the sufficient conditions of consistency for the top-k ranking problem. [sent-97, score-0.468]

41 Last, we discuss whether the surrogate loss functions in existing methods are consistent, and how to make them consistent if not. [sent-99, score-0.749]

42 We investigate what kinds of surrogate loss functions φ(g(x), y) are statistically consistent with the top-k true loss. [sent-101, score-0.875]

43 For this purpose, we study whether the ranking function that minimizes the conditional expectation of the surrogate loss function defined as follows coincides with the top-k Bayes ranker as defined in Eq. [sent-102, score-1.074]

44 Note that the probability of a top-k subgroup is defined as the sum of the probabilities of the permutations in the subgroup (cf. [sent-107, score-1.133]
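
A small sketch of this definition of the top-k subgroup probability (the dictionary representation of the permutation probabilities and the helper name are assumptions for illustration):

```python
from itertools import permutations

def top_k_subgroup_probability(perm_probs, top_k_prefix):
    """Probability of a top-k subgroup: the sum of the probabilities of all
    permutations whose first k positions equal the given top-k prefix."""
    k = len(top_k_prefix)
    return sum(p for perm, p in perm_probs.items() if perm[:k] == tuple(top_k_prefix))

# Example with 3 objects under the uniform distribution: the top-1 subgroup G1(0)
# contains the two permutations that rank object 0 first, so its probability is 1/3.
uniform = {perm: 1.0 / 6 for perm in permutations(range(3))}
assert abs(top_k_subgroup_probability(uniform, (0,)) - 1.0 / 3) < 1e-12
```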

45 According to [1], the above condition is the weakest condition to guarantee that optimizing a surrogate loss function will lead to obtaining a model achieving the Bayes risk (in our case, the top-k Bayes ranker), when the training sample size approaches infinity. [sent-109, score-0.68]

46 Hence, Q(p, g) is the loss of g at x with respect to the conditional probability distribution py . [sent-111, score-0.407]
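
For concreteness, Q(p, g) can be written as the expectation of the surrogate loss under the conditional distribution over permutations; the sketch below enumerates permutations explicitly, which is an illustrative assumption that is only practical for small n.

```python
def conditional_surrogate_risk(perm_probs, surrogate_loss, scores):
    """Q(p, g) = sum over permutations y of p_y * phi(g, y): the conditional
    expectation of the surrogate loss phi at a fixed x, where `scores` plays
    the role of g(x) and `perm_probs` maps each permutation y to p_y."""
    return sum(p_y * surrogate_loss(scores, y) for y, p_y in perm_probs.items())
```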

47 ΛGk is the top-k subgroup probability space, i.e., the space of probability distributions over the top-k subgroups Gk(j1, j2, ..., jk). [sent-117, score-0.551]

48 A top-k subgroup probability space ΛGk is order preserving with respect to objects i and j, if ∀y ∈ Yi,j whose top-k subgroup Gk(y−1(1), y−1(2), ..., y−1(k)) differs from that of σi,j−1 y, we have pGk(y) > pGk(σi,j−1 y). [sent-128, score-0.773]

49 A surrogate loss function φ is top-k subgroup order sensitive on a set Ω ⊂ Rn, if φ is a non-negative differentiable function and the following three conditions hold for all objects i and j: (1) φ(g, y) = φ(σi,j−1 g, σi,j−1 y); (2) Assume gi < gj, ∀y ∈ Yi,j. [sent-143, score-1.496]

50 (iii) There exists a permutation for which the loss changes faster with respect to the score of an object if its position is exchanged with that of another object having the same score but ranked lower. [sent-162, score-0.563]

51 A top-k subgroup order sensitive surrogate loss function has several nice properties as shown below. [sent-163, score-1.323]

52 Let φ(g, y) be a top-k subgroup order sensitive loss function. [sent-165, score-0.977]

53 Let φ(g, y) be a top-k subgroup order sensitive surrogate loss function. [sent-171, score-1.307]

54 Proposition 4 shows that all permutations in the same top-k subgroup share the same loss φ(g, y) and thus share the same partial difference with respect to the score of a given object. [sent-180, score-0.948]

55 Based on the above definitions and propositions, we give the main theorem (Theorem 6), which states the sufficient conditions for a surrogate loss function to be consistent with the top-k true loss. [sent-183, score-0.797]

56 Let φ be a top-k subgroup order sensitive loss function on Ω ⊂ Rn . [sent-185, score-0.977]

57 For any n objects, if the top-k subgroup probability space is order preserving with respect to the n − 1 object pairs {(ji, ji+1)} for i = 1, ..., k and {(jk+si, jk+i) : 0 ≤ si < i} for i = 2, ..., n − k, then the loss φ(g, y) is consistent with the top-k true loss as defined in Eq. [sent-186, score-1.479]

58 Let φ(g, y) be a top-k subgroup order sensitive loss function. [sent-192, score-0.977]

59 ∀i and j, if the top-k subgroup probability space is order preserving with respect to them, and g is a vector which minimizes Q(p, g) in Eq. [sent-193, score-0.715]

60 Without loss of generality, we assume i = 1, j = 2, g′1 = g2, g′2 = g1, and g′k = gk (k > 2). [sent-196, score-0.784]

61 After some algebra, by using Proposition 4, we have Q(p, g′) − Q(p, g) = Σ (pGk(σ1,2−1 y) − pGk(y))(φ(g, y) − φ(g, σ1,2−1 y)), where the sum is over {Gk(y) : Gk(y) ≠ Gk(σ1,2−1 y), y ∈ Y1,2} and Gk(y) denotes the subgroup that y belongs to. [sent-200, score-0.547]

62 Meanwhile, pGk(σ1,2−1 y) < pGk(y) due to the order preserving of the top-k subgroup probability space. [sent-202, score-0.673]

63 Meanwhile, pGk(σ1,2−1 y) < pGk(y) due to the order preserving of the top-k subgroup probability space. [sent-211, score-0.673]

64 First, we have the following proposition for the top-k subgroup probability space. [sent-215, score-0.639]

65 If the top-k subgroup probability space is order preserving with respect to objects i and j, the top-(k − 1) subgroup probability space is also order preserving with respect to i and j. [sent-217, score-1.456]

66 The proposition can be proved by decomposing a top-(k − 1) subgroup into the sum of top-k subgroups. [sent-218, score-0.65]
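
The decomposition mentioned above can be written out directly from the subgroup-probability definition (a sketch; the dictionary of top-k subgroup probabilities keyed by top-k prefixes is an assumed representation):

```python
def top_km1_probability_from_top_k(subgroup_probs_k, prefix, objects):
    """p of the top-(k-1) subgroup with the given (k-1)-prefix, obtained as the
    sum of the probabilities of the top-k subgroups that extend the prefix by
    one more object: a top-(k-1) subgroup is their disjoint union."""
    return sum(subgroup_probs_k[tuple(prefix) + (j,)]
               for j in objects if j not in prefix)
```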

67 If the top-2 subgroup probability space is order preserving with respect to objects 1 and 2, then we have pG2 (1,2) > pG2 (2,1) , pG2 (1,3) > pG2 (2,3) and pG2 (3,1) > pG2 (3,2) . [sent-222, score-0.773]

68 Second, we obtain the following proposition for the surrogate loss function φ. [sent-226, score-0.73]

69 If the surrogate loss function φ is top-k subgroup order sensitive on a set Ω ⊂ Rn , then it is also top-(k + 1) subgroup order sensitive on the same set. [sent-230, score-1.972]

70 If φ is top-1 subgroup order sensitive, then we have φ(g, (1, 2, 3)) ≥ φ(g, (2, 1, 3)), φ(g, (1, 3, 2)) ≥ φ(g, (2, 3, 1)), and φ(g, (3, 1, 2)) = φ(g, (3, 2, 1)). [sent-234, score-0.56]

71 On the other hand, if φ is top-2 subgroup order sensitive, the following inequalities hold with at least one of them being strict: φ(g, (1, 2, 3)) ≥ φ(g, (2, 1, 3)), φ(g, (1, 3, 2)) ≥ φ(g, (2, 3, 1)), and φ(g, (3, 1, 2)) ≥ φ(g, (3, 2, 1)). [sent-236, score-0.56]

72 Therefore top-1 subgroup order sensitivity is a special case of top-2 subgroup order sensitivity. [sent-237, score-1.225]

73 • For the consistency with the top-k true loss, when k becomes smaller, the requirement on the probability space becomes weaker but the requirement on the surrogate loss function becomes stronger. [sent-239, score-0.934]

74 Since we never know the real property of the (unknown) probability space, the requirement on the probability space for consistency with the top-k true loss is more likely to be satisfied than that for the top-l (l > k) true loss. [sent-240, score-0.598]

75 • If we fix the true loss to be top-k and the probability space to be top-k subgroup order preserving, the surrogate loss function should be at most top-l (l ≤ k) subgroup order sensitive in order to meet the consistency conditions. [sent-242, score-2.357]

76 It is not guaranteed that a top-l (l > k) subgroup order sensitive surrogate loss function can be consistent with the top-k true loss. [sent-243, score-1.421]

77 For example, a top-1 subgroup order sensitive surrogate loss function may be consistent with any top-k true loss, but a permutation-level order sensitive surrogate loss function may not be consistent with any top-k true loss, if k is smaller than the length of the list. [sent-244, score-2.31]

78 It basically says that given a probability space that is top-1 subgroup order preserving, a top-3 subgroup order sensitive surrogate loss function may not be consistent with the top-1 true loss. [sent-246, score-2.017]

79 φ is a top-3 subgroup order sensitive loss function and the strict inequality φ(g, (3, 1, 2)) < φ(g, (3, 2, 1)) holds when g1 > g2 . [sent-249, score-1.0]

80 The above discussions imply that although the surrogate loss functions in existing listwise ranking methods are consistent with the permutation-level 0-1 loss (under a rigid condition), they may not be consistent with the top-k true loss (under a mild condition). [sent-252, score-2.052]

81 Therefore, it is necessary to modify these surrogate loss functions. [sent-253, score-0.666]

82 In [16], the surrogate loss functions in ListNet, RankCosine, and ListMLE have been proved to be permutation-level order sensitive. [sent-256, score-1.394]

83 According to the discussion in the previous subsection, however, they may not be top-k subgroup order sensitive, and therefore not consistent with the top-k true loss. [sent-257, score-0.674]

84 Even for consistency with the permutation-level 0-1 loss, in order to guarantee that these surrogate loss functions are consistent, the requirement on the probability space may be too strong in some real scenarios. [sent-258, score-0.857]

85 To tackle the challenge, it is desirable to modify these surrogate loss functions to make them top-k subgroup order sensitive. [sent-259, score-1.252]

86 Actually this is doable, and the modifications to the aforementioned surrogate loss functions are given as follows. [sent-260, score-0.668]

87 The likelihood loss is the loss function used in ListMLE [16], which is defined as φ(g(x), y) = − log P(y|x; g), where P(y|x; g) = ∏_{i=1}^{n} [ exp(g(xy(i))) / Σ_{t=i}^{n} exp(g(xy(t))) ]. (9) [sent-263, score-0.936]

88 We propose replacing the permutation probability with the top-k subgroup probability (which is also defined with the Luce model [11]) in the above definition: P(y|x; g) = ∏_{i=1}^{k} [ exp(g(xy(i))) / Σ_{t=i}^{n} exp(g(xy(t))) ]. (10) [sent-264, score-0.694]

89 It can be proved that the modified loss is top-k subgroup order sensitive (see [15]). [sent-265, score-1.007]
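
A minimal sketch of the modified likelihood loss in Eq. (10), written directly from the formula above; the array layout, function name, and use of NumPy are assumptions, and this is not the authors' implementation.

```python
import numpy as np

def top_k_likelihood_loss(scores, truth, k):
    """phi(g(x), y) = -log P(y|x; g) with the top-k Luce model of Eq. (10):
    P(y|x; g) = prod_{i=1..k} exp(g(x_{y(i)})) / sum_{t=i..n} exp(g(x_{y(t)})).
    `scores[j]` is g(x_j); `truth[i]` is the object ranked at position i."""
    ordered = np.asarray(scores, dtype=float)[np.asarray(truth)]  # scores in ground-truth order
    loss = 0.0
    for i in range(k):
        tail = ordered[i:]  # objects ranked at position i or below
        loss -= ordered[i] - np.log(np.sum(np.exp(tail)))  # negative log-softmax at position i
    return loss

# With k equal to the list length, this is exactly the ListMLE likelihood loss of Eq. (9).
```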

90 Let the mapping function retain the order for the top k positions of the ground truth permutation and assign to all the remaining positions a small value (which is smaller than the score of any object ranked at the top-k positions), i.e. [sent-272, score-0.512]

91 It can be proved that after the modification, the cosine loss becomes top-k subgroup order sensitive (see [15]). [sent-275, score-1.054]
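
A sketch of the mapping function described above for the top-k modification of the cosine loss; the particular decreasing scores for the top-k positions and the small constant for the rest are illustrative assumptions (any choice satisfying the stated ordering would do).

```python
def top_k_target_scores(truth, k, small_value=0.0):
    """Map the ground-truth permutation to target scores: keep a decreasing
    score for the objects at the top k positions (preserving their order) and
    assign every remaining object the same small value, smaller than the score
    of any object ranked in the top k."""
    n = len(truth)
    scores = [small_value] * n
    for rank in range(k):
        scores[truth[rank]] = float(n - rank)  # n, n-1, ..., n-k+1, all above small_value
    return scores

# Example: truth = [2, 0, 1], k = 2 gives object 2 a score of 3, object 0 a score
# of 2, and object 1 the small value 0.
```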

92 We propose using a mapping function to modify the cross entropy loss in a similar way as in the case of the cosine loss. It can be proved that such a modification can make the surrogate loss function top-k subgroup order sensitive (see [15]). [sent-279, score-1.768]

93 It is obvious that these measures are top-k related and are suitable for evaluating the ranking performance in top-k ranking problems. [sent-283, score-0.768]

94 However, it can be verified that the so-defined top-k cross entropy loss is still permutation-level order sensitive, but not top-k subgroup order sensitive. [sent-295, score-0.949]

95 [Table 3: Ranking accuracies on TD2004. Table 4: Ranking accuracies on OHSUMED.] From the tables, we can see that with the modifications the ranking accuracies of ListMLE can be significantly boosted, in terms of all measures, on both TD2003 and TD2004. [sent-393, score-0.534]

96 In this paper we have proposed a top-k ranking framework, which can better describe real ranking applications like information retrieval. [sent-453, score-0.785]

97 In the framework, the true loss is defined on the top-k subgroup of permutations. [sent-454, score-0.897]

98 We have derived the sufficient conditions for a surrogate loss function to be statistically consistent with the top-k true loss. [sent-455, score-0.811]

99 We have also discussed how to modify the loss functions in existing listwise ranking methods to make them consistent with the top-k true loss. [sent-456, score-1.042]

100 (2) We will also study the consistency of the pointwise and pairwise loss functions with the top-k true loss. [sent-460, score-0.485]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('subgroup', 0.532), ('ranking', 0.384), ('surrogate', 0.33), ('loss', 0.312), ('listmle', 0.264), ('gk', 0.236), ('pgk', 0.18), ('listwise', 0.162), ('permutation', 0.124), ('sensitive', 0.105), ('rankcosine', 0.096), ('preserving', 0.094), ('proposition', 0.088), ('xy', 0.083), ('listnet', 0.079), ('ranked', 0.077), ('gi', 0.075), ('positions', 0.072), ('ohsumed', 0.067), ('consistency', 0.061), ('consistent', 0.061), ('objects', 0.058), ('true', 0.053), ('py', 0.051), ('accuracies', 0.05), ('permutations', 0.05), ('rankboost', 0.048), ('ranker', 0.048), ('requirement', 0.047), ('modi', 0.045), ('cations', 0.043), ('rank', 0.04), ('exchanging', 0.039), ('list', 0.039), ('risk', 0.038), ('jk', 0.037), ('luce', 0.036), ('ir', 0.035), ('gj', 0.033), ('liu', 0.032), ('xia', 0.032), ('letor', 0.032), ('statistically', 0.032), ('cosine', 0.031), ('qin', 0.03), ('proved', 0.03), ('performances', 0.03), ('score', 0.029), ('order', 0.028), ('bayes', 0.028), ('tables', 0.028), ('cossock', 0.026), ('object', 0.026), ('functions', 0.026), ('users', 0.026), ('respect', 0.025), ('entropy', 0.025), ('cross', 0.024), ('truth', 0.024), ('modify', 0.024), ('strict', 0.023), ('ground', 0.023), ('conditions', 0.023), ('click', 0.022), ('top', 0.022), ('meanwhile', 0.021), ('benchmark', 0.02), ('boosted', 0.02), ('existing', 0.02), ('tsai', 0.019), ('nition', 0.019), ('equals', 0.019), ('discussions', 0.019), ('engine', 0.019), ('burges', 0.019), ('validates', 0.019), ('framework', 0.019), ('probability', 0.019), ('lk', 0.018), ('propositions', 0.018), ('asia', 0.018), ('dp', 0.018), ('theorem', 0.018), ('space', 0.017), ('real', 0.017), ('microsoft', 0.017), ('ratings', 0.017), ('pointwise', 0.017), ('pairwise', 0.016), ('svm', 0.016), ('becomes', 0.016), ('nice', 0.016), ('sigir', 0.015), ('sorting', 0.015), ('ji', 0.015), ('denotes', 0.015), ('purpose', 0.015), ('mapping', 0.015), ('ranks', 0.015), ('ease', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

2 0.40826267 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li

Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1

3 0.21557038 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao

Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1

4 0.16397619 87 nips-2009-Exponential Family Graph Matching and Ranking

Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano

Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1

5 0.13614686 190 nips-2009-Polynomial Semantic Indexing

Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri

Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1

6 0.098167583 78 nips-2009-Efficient Moments-based Permutation Tests

7 0.081047468 3 nips-2009-AUC optimization and the two-sample problem

8 0.066441216 98 nips-2009-From PAC-Bayes Bounds to KL Regularization

9 0.065639399 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output

10 0.065104693 261 nips-2009-fMRI-Based Inter-Subject Cortical Alignment Using Functional Connectivity

11 0.063504249 176 nips-2009-On Invariance in Hierarchical Models

12 0.06125718 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers

13 0.059890412 71 nips-2009-Distribution-Calibrated Hierarchical Classification

14 0.053877342 7 nips-2009-A Data-Driven Approach to Modeling Choice

15 0.050002508 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

16 0.047587179 181 nips-2009-Online Learning of Assignments

17 0.044786215 161 nips-2009-Nash Equilibria of Static Prediction Games

18 0.042576376 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models

19 0.040885445 14 nips-2009-A Parameter-free Hedging Algorithm

20 0.038524199 11 nips-2009-A General Projection Property for Distribution Families


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.133), (1, 0.085), (2, -0.061), (3, -0.039), (4, 0.014), (5, -0.067), (6, -0.392), (7, -0.218), (8, 0.091), (9, -0.256), (10, 0.15), (11, 0.07), (12, -0.139), (13, -0.039), (14, -0.017), (15, 0.024), (16, -0.136), (17, 0.046), (18, -0.024), (19, 0.034), (20, -0.102), (21, -0.007), (22, -0.049), (23, -0.048), (24, -0.003), (25, 0.005), (26, -0.015), (27, 0.061), (28, -0.036), (29, 0.014), (30, 0.065), (31, 0.027), (32, -0.002), (33, -0.029), (34, 0.027), (35, 0.012), (36, -0.005), (37, -0.009), (38, -0.012), (39, -0.01), (40, 0.058), (41, -0.033), (42, 0.062), (43, 0.05), (44, -0.047), (45, -0.002), (46, 0.046), (47, -0.089), (48, -0.01), (49, -0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98061943 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

2 0.93852681 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li

Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1

3 0.81724268 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao

Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1

4 0.62999952 87 nips-2009-Exponential Family Graph Matching and Ranking

Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano

Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1

5 0.54167193 206 nips-2009-Riffled Independence for Ranked Data

Author: Jonathan Huang, Carlos Guestrin

Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. 1

6 0.48980978 78 nips-2009-Efficient Moments-based Permutation Tests

7 0.46455371 190 nips-2009-Polynomial Semantic Indexing

8 0.43445104 7 nips-2009-A Data-Driven Approach to Modeling Choice

9 0.4008747 3 nips-2009-AUC optimization and the two-sample problem

10 0.33043122 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information

11 0.26469392 98 nips-2009-From PAC-Bayes Bounds to KL Regularization

12 0.2604956 233 nips-2009-Streaming Pointwise Mutual Information

13 0.25282606 161 nips-2009-Nash Equilibria of Static Prediction Games

14 0.23734139 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output

15 0.23126599 94 nips-2009-Fast Learning from Non-i.i.d. Observations

16 0.23075207 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models

17 0.22003706 71 nips-2009-Distribution-Calibrated Hierarchical Classification

18 0.21108934 14 nips-2009-A Parameter-free Hedging Algorithm

19 0.21059638 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations

20 0.20962456 176 nips-2009-On Invariance in Hierarchical Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.022), (12, 0.246), (24, 0.046), (25, 0.089), (35, 0.06), (36, 0.067), (39, 0.05), (58, 0.072), (61, 0.016), (71, 0.057), (77, 0.034), (81, 0.01), (86, 0.114), (91, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81148386 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

2 0.76155514 2 nips-2009-3D Object Recognition with Deep Belief Nets

Author: Vinod Nair, Geoffrey E. Hinton

Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1

3 0.6437785 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

4 0.6347304 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao

Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1

5 0.62074047 106 nips-2009-Heavy-Tailed Symmetric Stochastic Neighbor Embedding

Author: Zhirong Yang, Irwin King, Zenglin Xu, Erkki Oja

Abstract: Stochastic Neighbor Embedding (SNE) has shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, momentum, etc., in finding its optimum. In this paper, we propose the Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE) method, which is a generalization of the t-SNE to accommodate various heavy-tailed embedding similarity functions. With this generalization, we are presented with two difficulties. The first is how to select the best embedding similarity among all heavy-tailed functions and the second is how to optimize the objective function once the heavy-tailed function has been selected. Our contributions then are: (1) we point out that various heavy-tailed embedding similarities can be characterized by their negative score functions. Based on this finding, we present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE; (2) we present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and does not require the user to set any parameters; and (3) we present two empirical studies, one for unsupervised visualization showing that our optimization algorithm runs as fast and as good as the best known t-SNE implementation and the other for semi-supervised visualization showing quantitative superiority using the homogeneity measure as well as qualitative advantage in cluster separation over t-SNE.

6 0.61473727 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

7 0.61398739 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections

8 0.61339092 211 nips-2009-Segmenting Scenes by Matching Image Composites

9 0.61135745 137 nips-2009-Learning transport operators for image manifolds

10 0.61118352 112 nips-2009-Human Rademacher Complexity

11 0.61006308 237 nips-2009-Subject independent EEG-based BCI decoding

12 0.6084736 3 nips-2009-AUC optimization and the two-sample problem

13 0.60809267 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

14 0.60637265 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

15 0.60597605 169 nips-2009-Nonlinear Learning using Local Coordinate Coding

16 0.60518605 151 nips-2009-Measuring Invariances in Deep Networks

17 0.60512531 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

18 0.60487723 113 nips-2009-Improving Existing Fault Recovery Policies

19 0.6041826 104 nips-2009-Group Sparse Coding

20 0.60397881 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition