nips nips2009 nips2009-230 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects; however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of the top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper is concerned with the consistency analysis on listwise ranking methods. [sent-6, score-0.607]
2 Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. [sent-7, score-0.596]
3 Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects; however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. [sent-8, score-1.447]
4 This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. [sent-9, score-0.659]
5 For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. [sent-10, score-1.281]
6 This framework can include the permutationlevel ranking framework proposed in previous work as a special case. [sent-11, score-0.422]
7 Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. [sent-12, score-1.371]
8 Empirical results on benchmark datasets have demonstrated that the listwise ranking methods have very competitive ranking performances [10]. [sent-16, score-0.98]
9 To explain the high ranking performances of the listwise ranking methods, a theoretical framework was proposed in [16]. [sent-17, score-0.979]
10 In the framework, existing listwise ranking methods are interpreted as making use of different surrogate loss functions of the permutation-level 0-1 loss. [sent-18, score-1.234]
11 Theoretical analysis shows that these surrogate loss functions are all statistically consistent in the sense that minimization of the conditional expectation of them will lead to obtaining the Bayes ranker, i. [sent-19, score-0.761]
12 Here we point out that there is a gap between the analysis in [16] and many real ranking problems, where the correct ranking of the entire permutation is not needed. [sent-22, score-0.909]
13 For example, in IR, users usually care much more about the top ranking results and thus only correct ranking at the top positions is important. [sent-23, score-0.91]
14 In this new situation, it is no longer clear whether existing listwise ranking methods are still statistically consistent. [sent-24, score-0.598]
15 For this purpose, we propose a new ranking framework, in which the “true loss” is defined on the top-k subgroup of permutations instead of on the entire permutation. [sent-26, score-0.966]
16 The new true loss only measures errors occurring at the top k positions of a ranked list; therefore we refer to it as the top-k true loss (note that when k equals the length of the ranked list, the top-k true loss becomes exactly the permutation-level 0-1 loss). [sent-27, score-1.362]
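To make the definition concrete, here is a minimal Python sketch (ours, not from the paper's code) of the top-k 0-1 true loss: two ranked lists incur zero loss exactly when they place the same objects, in the same order, at the first k positions.

```python
def topk_true_loss(predicted, ground_truth, k):
    """Top-k 0-1 loss: 0 iff the two ranked lists agree on the first k positions."""
    return 0 if list(predicted[:k]) == list(ground_truth[:k]) else 1

# Example: for k = 2 the two lists agree on the top positions, so the loss is 0
# even though they differ below position k; for k = 3 they disagree.
print(topk_true_loss([3, 1, 2, 0], [3, 1, 0, 2], k=2))  # -> 0
print(topk_true_loss([3, 1, 2, 0], [3, 1, 0, 2], k=3))  # -> 1
```

When k equals the list length this reduces to the permutation-level 0-1 loss described above.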
17 We prove a new theorem which gives sufficient conditions for a surrogate loss function to be consistent with the top-k true loss. [sent-28, score-0.797]
18 Our analysis shows that, as k decreases, to guarantee the consistency of a surrogate loss function, the requirement on the probability space becomes weaker while the requirement on the surrogate loss function itself becomes stronger. [sent-30, score-1.507]
19 As a result, a surrogate loss function that is consistent with the permutation-level 0-1 loss might not be consistent with the top-k true loss any more. [sent-31, score-1.441]
20 Therefore, the surrogate loss functions in existing listwise ranking methods, which have been proved to be consistent with the permutation-level 0-1 loss, are not theoretically guaranteed to have good performances in the top-k setting. [sent-32, score-1.355]
21 Modifications to these surrogate loss functions are needed to further make them consistent with the top-k true loss. [sent-33, score-0.782]
22 2 Permutation-level ranking framework We review the permutation-level ranking framework proposed in [16]. [sent-36, score-0.806]
23 The task of learning to rank is to learn a function that can minimize the expected risk R(h), defined as R(h) = ∫_{X×Y} l(h(x), y) dP(x, y), (1) where l(h(x), y) is the true loss such that l(h(x), y) = 1 if h(x) ≠ y, and l(h(x), y) = 0 if h(x) = y. (2) [sent-40, score-0.443]
24 The above true loss indicates that if the permutation of the predicted result is exactly the same as the permutation in the ground truth, then the loss is zero; otherwise the loss is one. [sent-41, score-1.26]
25 The optimal ranking function h∗, which minimizes the expected true risk so that R(h∗) = inf R(h), is referred to as the permutation-level Bayes ranker. [sent-43, score-0.475]
26 Since the risk is non-continuous and non-differentiable with respect to the scoring function g, a continuous and differentiable surrogate loss function φ(g(x), y) is usually used as an approximation of the true loss. [sent-49, score-0.758]
27 It has been shown in [16] that many existing listwise ranking methods fall into the above framework, with different surrogate loss functions used. [sent-54, score-1.234]
28 Furthermore, their surrogate loss functions are statistically consistent under certain conditions with respect to the permutation-level 0-1 loss. [sent-55, score-0.809]
29 However, as shown in the next section, the permutation-level 0-1 loss is not suitable to describe the ranking problem in many real applications. [sent-56, score-0.713]
30 3 Top-k ranking framework We next describe the real ranking problem, and then propose the top-k ranking framework. [sent-57, score-1.188]
31 1 Top-k ranking problem In real ranking applications like IR, people pay more attention to the top-ranked objects. [sent-59, score-0.785]
32 Therefore the correct ranking on the top positions is critically important. [sent-60, score-0.478]
33 It means that two ranked lists of documents will likely provide the same experience to the users (and thus suffer the same loss), if they have the same ranking results for the top positions. [sent-63, score-0.509]
34 2 Top-k true loss To better describe the top-k ranking problem, we propose defining the true loss based on the top k positions in a ranked list, referred to as the top-k true loss. [sent-69, score-1.338]
35 When k equals the length of the entire ranked list, the top-k true loss will become exactly the permutation-level 0-1 loss. [sent-75, score-0.461]
36 It can be proved that the optimal ranking function with respect to the top-k true loss (i.e., the top-k Bayes ranker) [sent-79, score-0.804]
37 is any permutation in the top-k subgroup having the highest probability. [sent-81, score-0.656]
38 Here Gk(j1, j2, . . . , jk) = {y ∈ Y : y(t) = jt, t = 1, . . . , k} denotes a top-k subgroup in which all the permutations have the same top-k true loss; Gk denotes the collection of all top-k subgroups. [sent-95, score-0.665]
39 With the above setting, we will analyze the consistency of the surrogate loss functions in existing ranking methods with the top-k true loss in the next section. [sent-96, score-1.498]
40 4 Theoretical analysis In this section, we first give the sufficient conditions of consistency for the top-k ranking problem. [sent-97, score-0.468]
41 Last, we discuss whether the surrogate loss functions in existing methods are consistent, and how to make them consistent if not. [sent-99, score-0.749]
42 1 Statistical consistency We investigate what kinds of surrogate loss functions φ(g(x), y) are statistically consistent with the top-k true loss. [sent-101, score-0.875]
43 For this purpose, we study whether the ranking function that minimizes the conditional expectation of the surrogate loss function defined as follows coincides with the top-k Bayes ranker as defined in Eq. [sent-102, score-1.074]
44 Note that the probability of a top-k subgroup is defined as the sum of the probabilities of the permutations in the subgroup (cf. [sent-107, score-1.133]
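As a hedged illustration of this definition, the sketch below sums permutation probabilities over a shared top-k prefix; the dictionary `perm_prob` is a hypothetical stand-in for the conditional distribution over permutations at a query, not an object defined in the paper.

```python
from itertools import permutations

def topk_subgroup_prob(prefix, perm_prob):
    """Probability of the top-k subgroup whose top positions equal `prefix`."""
    k = len(prefix)
    return sum(p for perm, p in perm_prob.items() if perm[:k] == prefix)

# Example: three objects, uniform distribution over all 3! permutations.
perm_prob = {perm: 1.0 / 6 for perm in permutations(range(3))}
print(topk_subgroup_prob((0,), perm_prob))    # top-1 subgroup G1(0)    -> 1/3
print(topk_subgroup_prob((0, 1), perm_prob))  # top-2 subgroup G2(0, 1) -> 1/6
```

The top-k Bayes ranker mentioned above then corresponds to any permutation whose prefix maximizes this subgroup probability.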
45 According to [1], the above condition is the weakest condition to guarantee that optimizing a surrogate loss function will lead to obtaining a model achieving the Bayes risk (in our case, the top-k Bayes ranker), when the training sample size approaches infinity. [sent-109, score-0.68]
46 Hence, Q(p, g) is the loss of g at x with respect to the conditional probability distribution py . [sent-111, score-0.407]
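Written out, the conditional risk is simply the expectation of the surrogate loss under the conditional distribution over permutations, Q(p, g) = Σ_y p_y φ(g, y). The sketch below assumes the same hypothetical `perm_prob` dictionary as above and an arbitrary surrogate `phi(scores, perm)`; neither name comes from the paper.

```python
def conditional_risk(perm_prob, scores, phi):
    """Q(p, g): expected surrogate loss at a fixed x under the conditional
    distribution p over permutations, for score vector `scores` and loss `phi`."""
    return sum(p * phi(scores, perm) for perm, p in perm_prob.items())
```

Consistency asks whether minimizing this quantity over g recovers the top-k Bayes ranker defined earlier.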
47 ΛGk is a top-k subgroup probability space defined over the top-k subgroups Gk(j1, j2, . [sent-117, score-0.551]
48 A top-k subgroup probability space ΛGk is order preserving with respect to objects i and j, if ∀y ∈ Yi,j and Gk(y(1), y(2), . [sent-128, score-0.773]
49 A surrogate loss function φ is top-k subgroup order sensitive on a set Ω ⊂ Rn, if φ is a non-negative differentiable function and the following three conditions hold for all objects i and j: (1) φ(g, y) = φ(σi,j⁻¹g, σi,j⁻¹y); (2) assume gi < gj, ∀y ∈ Yi,j. [sent-143, score-1.496]
50 (iii) There exists a permutation for which the loss changes faster with respect to the score of an object when that object's position is exchanged with the position of another object that has the same score but is ranked lower. [sent-162, score-0.563]
51 A top-k subgroup order sensitive surrogate loss function has several nice properties as shown below. [sent-163, score-1.323]
52 Let φ(g, y) be a top-k subgroup order sensitive loss function. [sent-165, score-0.977]
53 Let φ(g, y) be a top-k subgroup order sensitive surrogate loss function. [sent-171, score-1.307]
54 Proposition 4 shows that all permutations in the same top-k subgroup share the same loss φ(g, y) and thus share the same partial difference with respect to the score of a given object. [sent-180, score-0.948]
55 Based on the above definitions and propositions, we give the main theorem (Theorem 6), which states the sufficient conditions for a surrogate loss function to be consistent with the top-k true loss. [sent-183, score-0.797]
56 Let φ be a top-k subgroup order sensitive loss function on Ω ⊂ Rn . [sent-185, score-0.977]
57 For any n objects, if their top-k subgroup probability space is order preserving with respect to the n − 1 object pairs {(ji, ji+1)} for i = 1, . . . , k and {(jk+si, jk+i) : 0 ≤ si < i} for i = 2, . . . , n − k, then the loss φ(g, y) is consistent with the top-k true loss as defined in Eq. [sent-186, score-1.479]
58 Let φ(g, y) be a top-k subgroup order sensitive loss function. [sent-192, score-0.977]
59 ∀i and j, if the top-k subgroup probability space is order preserving with respect to them, and g is a vector which minimizes Q(p, g) in Eq. [sent-193, score-0.715]
60 Without loss of generality, we assume i = 1, j = 2, g′1 = g2, g′2 = g1, and g′k = gk (k > 2). [sent-196, score-0.784]
61 After some algebra, by using Proposition 4, we have Q(p, g′) − Q(p, g) = Σ_{Gk(y) ∈ {Gk : Gk(y) ≠ Gk(σ1,2⁻¹y)}, y ∈ Y1,2} (pGk(σ1,2⁻¹y) − pGk(y)) (φ(g, y) − φ(g, σ1,2⁻¹y)), where Gk(y) denotes the subgroup that y belongs to. [sent-200, score-0.547]
62 Meanwhile, pGk(σ1,2⁻¹y) < pGk(y) due to the order preserving of the top-k subgroup probability space. [sent-202, score-0.673]
63 Meanwhile, pGk(σ1,2⁻¹y) < pGk(y) due to the order preserving of the top-k subgroup probability space. [sent-211, score-0.673]
64 First, we have the following proposition for the top-k subgroup probability space. [sent-215, score-0.639]
65 If the top-k subgroup probability space is order preserving with respect to objects i and j, the top-(k − 1) subgroup probability space is also order preserving with respect to i and j. [sent-217, score-1.456]
66 The proposition can be proved by decomposing a top-(k − 1) subgroup into the sum of top-k subgroups. [sent-218, score-0.65]
67 If the top-2 subgroup probability space is order preserving with respect to objects 1 and 2, then we have pG2 (1,2) > pG2 (2,1) , pG2 (1,3) > pG2 (2,3) and pG2 (3,1) > pG2 (3,2) . [sent-222, score-0.773]
68 Second, we obtain the following proposition for the surrogate loss function φ. [sent-226, score-0.73]
69 If the surrogate loss function φ is top-k subgroup order sensitive on a set Ω ⊂ Rn , then it is also top-(k + 1) subgroup order sensitive on the same set. [sent-230, score-1.972]
70 If φ is top-1 subgroup order sensitive, then we have φ(g, (1, 2, 3)) ≥ φ(g, (2, 1, 3)), φ(g, (1, 3, 2)) ≥ φ(g, (2, 3, 1)), and φ(g, (3, 1, 2)) = φ(g, (3, 2, 1)). [sent-234, score-0.56]
71 On the other hand, if φ is top-2 subgroup order sensitive, the following inequalities hold with at least one of them being strict: φ(g, (1, 2, 3)) ≥ φ(g, (2, 1, 3)), φ(g, (1, 3, 2)) ≥ φ(g, (2, 3, 1)), and φ(g, (3, 1, 2)) ≥ φ(g, (3, 2, 1)). [sent-236, score-0.56]
72 Therefore top-1 subgroup order sensitivity is a special case of top-2 subgroup order sensitivity. [sent-237, score-1.225]
73 • For the consistency with the top-k true loss, when k becomes smaller, the requirement on the probability space becomes weaker but the requirement on the surrogate loss function becomes stronger. [sent-239, score-0.934]
74 Since we never know the real property of the (unknown) probability space, it is more likely that the requirement on the probability space for consistency with the top-k true loss can be satisfied than the corresponding requirement for the top-l (l > k) true loss. [sent-240, score-0.598]
75 • If we fix the true loss to be top-k and the probability space to be top-k subgroup order preserving, the surrogate loss function should be at most top-l (l ≤ k) subgroup order sensitive in order to meet the consistency conditions. [sent-242, score-2.357]
76 It is not guaranteed that a top-l (l > k) subgroup order sensitive surrogate loss function can be consistent with the top-k true loss. [sent-243, score-1.421]
77 For example, a top-1 subgroup order sensitive surrogate loss function may be consistent with any top-k true loss, but a permutation-level order sensitive surrogate loss function may not be consistent with any top-k true loss, if k is smaller than the length of the list. [sent-244, score-2.31]
78 It basically says that given a probability space that is top-1 subgroup order preserving, a top-3 subgroup order sensitive surrogate loss function may not be consistent with the top-1 true loss. [sent-246, score-2.017]
79 φ is a top-3 subgroup order sensitive loss function and the strict inequality φ(g, (3, 1, 2)) < φ(g, (3, 2, 1)) holds when g1 > g2 . [sent-249, score-1.0]
80 The above discussions imply that although the surrogate loss functions in existing listwise ranking methods are consistent with the permutation-level 0-1 loss (under a rigid condition), they may not be consistent with the top-k true loss (under a mild condition). [sent-252, score-2.052]
81 Therefore, it is necessary to modify these surrogate loss functions. [sent-253, score-0.666]
82 3 Consistent surrogate loss functions In [16], the surrogate loss functions in ListNet, RankCosine, and ListMLE have been proved to be permutation-level order sensitive. [sent-256, score-1.394]
83 According to the discussion in the previous subsection, however, they may not be top-k subgroup order sensitive, and therefore not consistent with the top-k true loss. [sent-257, score-0.674]
84 Even for consistency with the permutation-level 0-1 loss, the requirement on the probability space needed to guarantee that these surrogate loss functions are consistent may be too strong in some real scenarios. [sent-258, score-0.857]
85 To tackle the challenge, it is desirable to modify these surrogate loss functions to make them top-k subgroup order sensitive. [sent-259, score-1.252]
86 Actually this is doable, and the modifications to the aforementioned surrogate loss functions are given as follows. [sent-260, score-0.668]
87 1 Likelihood loss The likelihood loss is the loss function used in ListMLE [16], defined as φ(g(x), y) = − log P(y|x; g), where P(y|x; g) = ∏_{i=1}^{n} exp(g(x_{y(i)})) / Σ_{t=i}^{n} exp(g(x_{y(t)})). (9) [sent-263, score-0.936]
88 We propose replacing the permutation probability with the top-k subgroup probability (which is also defined with the Luce model [11]) in the above definition: P(y|x; g) = ∏_{i=1}^{k} exp(g(x_{y(i)})) / Σ_{t=i}^{n} exp(g(x_{y(t)})). (10) [sent-264, score-0.694]
89 It can be proved that the modified loss is top-k subgroup order sensitive (see [15]). [sent-265, score-1.007]
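The negative log of Eq. (10) is easy to compute directly: the product runs only over the first k positions, while each Luce normalizer still sums over all objects ranked at that position or below. Below is a minimal sketch of that computation; variable names are ours and, for brevity, it omits the log-sum-exp stabilization one would use in practice.

```python
import math

def topk_likelihood_loss(scores, y, k):
    """-log P(y | x; g) under Eq. (10).
    scores[j]: score g(x_j) of object j; y[i]: object ranked at position i."""
    n = len(y)
    loss = 0.0
    for i in range(k):
        # log of the normalizer over objects ranked at positions i, ..., n-1
        denom = math.log(sum(math.exp(scores[y[t]]) for t in range(i, n)))
        loss += denom - scores[y[i]]   # minus log of the i-th Luce factor
    return loss

# With k = n this reduces to the original ListMLE likelihood loss of Eq. (9).
print(topk_likelihood_loss([2.0, 0.5, 1.0, -1.0], [0, 2, 1, 3], k=2))
```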
90 Let the mapping function retain the order for the top k positions of the ground truth permutation and assign to all the remaining positions a small value (which is smaller than the score of any object ranked at the top-k positions), i. [sent-272, score-0.512]
91 It can be proved that after the modification, the cosine loss becomes top-k subgroup order sensitive (see [15]). [sent-275, score-1.054]
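A hedged sketch of this mapping is given below: ground-truth scores keep their order over the top k positions, and every remaining position is mapped to a common small value lower than any top-k score. The exact mapping used in [15] may differ; this only illustrates the "retain top-k order, flatten the rest" idea, and the function name and epsilon value are ours.

```python
def topk_ground_truth_scores(y, k, epsilon=1e-3):
    """y[i]: object ranked at position i in the ground truth.
    Returns one score per object: decreasing over the top k, constant below."""
    n = len(y)
    scores = [0.0] * n
    for i, obj in enumerate(y):
        # positions 0..k-1 get decreasing scores; the rest share a small value
        scores[obj] = float(n - i) if i < k else epsilon
    return scores

print(topk_ground_truth_scores([2, 0, 3, 1], k=2))  # -> [3.0, 0.001, 4.0, 0.001]
```

The cosine loss is then computed between the model's scores and these mapped ground-truth scores.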
92 We propose using a mapping function to modify the cross entropy loss in a similar way as in the case of the cosine loss. It can be proved that such a modification can make the surrogate loss function top-k subgroup order sensitive (see [15]). [sent-279, score-1.768]
93 5 It is obvious that these measures are top-k related and are suitable to evaluate the ranking performance in top-k ranking problems. [sent-283, score-0.768]
94 However, it can be verified that the so-defined top-k cross entropy loss is still permutation-level order sensitive, but not top-k subgroup order sensitive. [sent-295, score-0.949]
95 (Table 3: Ranking accuracies on TD2004; Table 4: Ranking accuracies on OHSUMED.) From the tables, we can see that with the modifications the ranking accuracies of ListMLE can be significantly boosted, in terms of all measures, on both TD2003 and TD2004. [sent-393, score-0.534]
96 6 Conclusion In this paper we have proposed a top-k ranking framework, which can better describe real ranking applications like information retrieval. [sent-453, score-0.785]
97 In the framework, the true loss is defined on the top-k subgroup of permutations. [sent-454, score-0.897]
98 We have derived the sufficient conditions for a surrogate loss function to be statistically consistent with the top-k true loss. [sent-455, score-0.811]
99 We have also discussed how to modify the loss functions in existing listwise ranking methods to make them consistent with the top-k true loss. [sent-456, score-1.042]
100 (2) We will also study the consistency of the pointwise and pairwise loss functions with the top-k true loss. [sent-460, score-0.485]
wordName wordTfidf (topN-words)
[('subgroup', 0.532), ('ranking', 0.384), ('surrogate', 0.33), ('loss', 0.312), ('listmle', 0.264), ('gk', 0.236), ('pgk', 0.18), ('listwise', 0.162), ('permutation', 0.124), ('sensitive', 0.105), ('rankcosine', 0.096), ('preserving', 0.094), ('proposition', 0.088), ('xy', 0.083), ('listnet', 0.079), ('ranked', 0.077), ('gi', 0.075), ('positions', 0.072), ('ohsumed', 0.067), ('consistency', 0.061), ('consistent', 0.061), ('objects', 0.058), ('true', 0.053), ('py', 0.051), ('accuracies', 0.05), ('permutations', 0.05), ('rankboost', 0.048), ('ranker', 0.048), ('requirement', 0.047), ('modi', 0.045), ('cations', 0.043), ('rank', 0.04), ('exchanging', 0.039), ('list', 0.039), ('risk', 0.038), ('jk', 0.037), ('luce', 0.036), ('ir', 0.035), ('gj', 0.033), ('liu', 0.032), ('xia', 0.032), ('letor', 0.032), ('statistically', 0.032), ('cosine', 0.031), ('qin', 0.03), ('proved', 0.03), ('performances', 0.03), ('score', 0.029), ('order', 0.028), ('bayes', 0.028), ('tables', 0.028), ('cossock', 0.026), ('object', 0.026), ('functions', 0.026), ('users', 0.026), ('respect', 0.025), ('entropy', 0.025), ('cross', 0.024), ('truth', 0.024), ('modify', 0.024), ('strict', 0.023), ('ground', 0.023), ('conditions', 0.023), ('click', 0.022), ('top', 0.022), ('meanwhile', 0.021), ('benchmark', 0.02), ('boosted', 0.02), ('existing', 0.02), ('tsai', 0.019), ('nition', 0.019), ('equals', 0.019), ('discussions', 0.019), ('engine', 0.019), ('burges', 0.019), ('validates', 0.019), ('framework', 0.019), ('probability', 0.019), ('lk', 0.018), ('propositions', 0.018), ('asia', 0.018), ('dp', 0.018), ('theorem', 0.018), ('space', 0.017), ('real', 0.017), ('microsoft', 0.017), ('ratings', 0.017), ('pointwise', 0.017), ('pairwise', 0.016), ('svm', 0.016), ('becomes', 0.016), ('nice', 0.016), ('sigir', 0.015), ('sorting', 0.015), ('ji', 0.015), ('denotes', 0.015), ('purpose', 0.015), ('mapping', 0.015), ('ranks', 0.015), ('ease', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 230 nips-2009-Statistical Consistency of Top-k Ranking
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects; however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of the top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions.
2 0.40826267 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li
Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learningto-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measurebased ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1
3 0.21557038 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
4 0.16397619 87 nips-2009-Exponential Family Graph Matching and Ranking
Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano
Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1
5 0.13614686 190 nips-2009-Polynomial Semantic Indexing
Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri
Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or documentdocument pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1
6 0.098167583 78 nips-2009-Efficient Moments-based Permutation Tests
7 0.081047468 3 nips-2009-AUC optimization and the two-sample problem
8 0.066441216 98 nips-2009-From PAC-Bayes Bounds to KL Regularization
9 0.065639399 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
10 0.065104693 261 nips-2009-fMRI-Based Inter-Subject Cortical Alignment Using Functional Connectivity
11 0.063504249 176 nips-2009-On Invariance in Hierarchical Models
12 0.06125718 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
13 0.059890412 71 nips-2009-Distribution-Calibrated Hierarchical Classification
14 0.053877342 7 nips-2009-A Data-Driven Approach to Modeling Choice
15 0.050002508 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
16 0.047587179 181 nips-2009-Online Learning of Assignments
17 0.044786215 161 nips-2009-Nash Equilibria of Static Prediction Games
18 0.042576376 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models
19 0.040885445 14 nips-2009-A Parameter-free Hedging Algorithm
20 0.038524199 11 nips-2009-A General Projection Property for Distribution Families
topicId topicWeight
[(0, -0.133), (1, 0.085), (2, -0.061), (3, -0.039), (4, 0.014), (5, -0.067), (6, -0.392), (7, -0.218), (8, 0.091), (9, -0.256), (10, 0.15), (11, 0.07), (12, -0.139), (13, -0.039), (14, -0.017), (15, 0.024), (16, -0.136), (17, 0.046), (18, -0.024), (19, 0.034), (20, -0.102), (21, -0.007), (22, -0.049), (23, -0.048), (24, -0.003), (25, 0.005), (26, -0.015), (27, 0.061), (28, -0.036), (29, 0.014), (30, 0.065), (31, 0.027), (32, -0.002), (33, -0.029), (34, 0.027), (35, 0.012), (36, -0.005), (37, -0.009), (38, -0.012), (39, -0.01), (40, 0.058), (41, -0.033), (42, 0.062), (43, 0.05), (44, -0.047), (45, -0.002), (46, 0.046), (47, -0.089), (48, -0.01), (49, -0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.98061943 230 nips-2009-Statistical Consistency of Top-k Ranking
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects; however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of the top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions.
2 0.93852681 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li
Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learningto-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measurebased ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1
3 0.81724268 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
4 0.62999952 87 nips-2009-Exponential Family Graph Matching and Ranking
Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano
Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1
5 0.54167193 206 nips-2009-Riffled Independence for Ranked Data
Author: Jonathan Huang, Carlos Guestrin
Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. 1
6 0.48980978 78 nips-2009-Efficient Moments-based Permutation Tests
7 0.46455371 190 nips-2009-Polynomial Semantic Indexing
8 0.43445104 7 nips-2009-A Data-Driven Approach to Modeling Choice
9 0.4008747 3 nips-2009-AUC optimization and the two-sample problem
10 0.33043122 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
11 0.26469392 98 nips-2009-From PAC-Bayes Bounds to KL Regularization
12 0.2604956 233 nips-2009-Streaming Pointwise Mutual Information
13 0.25282606 161 nips-2009-Nash Equilibria of Static Prediction Games
14 0.23734139 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
15 0.23126599 94 nips-2009-Fast Learning from Non-i.i.d. Observations
16 0.23075207 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models
17 0.22003706 71 nips-2009-Distribution-Calibrated Hierarchical Classification
18 0.21108934 14 nips-2009-A Parameter-free Hedging Algorithm
19 0.21059638 59 nips-2009-Construction of Nonparametric Bayesian Models from Parametric Bayes Equations
20 0.20962456 176 nips-2009-On Invariance in Hierarchical Models
topicId topicWeight
[(1, 0.022), (12, 0.246), (24, 0.046), (25, 0.089), (35, 0.06), (36, 0.067), (39, 0.05), (58, 0.072), (61, 0.016), (71, 0.057), (77, 0.034), (81, 0.01), (86, 0.114), (91, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.81148386 230 nips-2009-Statistical Consistency of Top-k Ranking
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects; however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of the top-k subgroup of permutations. This framework can include the permutation-level ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions.
2 0.76155514 2 nips-2009-3D Object Recognition with Deep Belief Nets
Author: Vinod Nair, Geoffrey E. Hinton
Abstract: We introduce a new type of top-level model for Deep Belief Nets and evaluate it on a 3D object recognition task. The top-level model is a third-order Boltzmann machine, trained using a hybrid algorithm that combines both generative and discriminative gradients. Performance is evaluated on the NORB database (normalized-uniform version), which contains stereo-pair images of objects under different lighting conditions and viewpoints. Our model achieves 6.5% error on the test set, which is close to the best published result for NORB (5.9%) using a convolutional neural net that has built-in knowledge of translation invariance. It substantially outperforms shallow models such as SVMs (11.6%). DBNs are especially suited for semi-supervised learning, and to demonstrate this we consider a modified version of the NORB recognition task in which additional unlabeled images are created by applying small translations to the images in the database. With the extra unlabeled data (and the same amount of labeled data as before), our model achieves 5.2% error. 1
3 0.6437785 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions
Author: Ruslan Salakhutdinov
Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.
4 0.6347304 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
5 0.62074047 106 nips-2009-Heavy-Tailed Symmetric Stochastic Neighbor Embedding
Author: Zhirong Yang, Irwin King, Zenglin Xu, Erkki Oja
Abstract: Stochastic Neighbor Embedding (SNE) has shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, momentum, etc., in finding its optimum. In this paper, we propose the Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE) method, which is a generalization of the t-SNE to accommodate various heavytailed embedding similarity functions. With this generalization, we are presented with two difficulties. The first is how to select the best embedding similarity among all heavy-tailed functions and the second is how to optimize the objective function once the heavy-tailed function has been selected. Our contributions then are: (1) we point out that various heavy-tailed embedding similarities can be characterized by their negative score functions. Based on this finding, we present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE; (2) we present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and does not require the user to set any parameters; and (3) we present two empirical studies, one for unsupervised visualization showing that our optimization algorithm runs as fast and as good as the best known t-SNE implementation and the other for semi-supervised visualization showing quantitative superiority using the homogeneity measure as well as qualitative advantage in cluster separation over t-SNE.
6 0.61473727 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
7 0.61398739 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections
8 0.61339092 211 nips-2009-Segmenting Scenes by Matching Image Composites
9 0.61135745 137 nips-2009-Learning transport operators for image manifolds
10 0.61118352 112 nips-2009-Human Rademacher Complexity
11 0.61006308 237 nips-2009-Subject independent EEG-based BCI decoding
12 0.6084736 3 nips-2009-AUC optimization and the two-sample problem
13 0.60809267 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
14 0.60637265 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
15 0.60597605 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
16 0.60518605 151 nips-2009-Measuring Invariances in Deep Networks
17 0.60512531 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
18 0.60487723 113 nips-2009-Improving Existing Fault Recovery Policies
19 0.6041826 104 nips-2009-Group Sparse Coding
20 0.60397881 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition