nips nips2009 nips2009-199 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li
Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1
Reference: text
sentIndex sentText sentNum sentScore
1 While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. [sent-10, score-1.735]
2 In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. [sent-11, score-0.837]
3 We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. [sent-12, score-0.829]
4 As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. [sent-13, score-0.723]
5 The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. [sent-14, score-1.41]
6 We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. [sent-15, score-1.272]
7 Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. [sent-16, score-0.824]
8 Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. [sent-17, score-0.5]
9 Then a ranking function is constructed by minimizing a certain loss function on the training data. [sent-23, score-0.701]
10 In testing, given a new set of objects, the ranking function is applied to produce a ranked list of the objects. [sent-24, score-0.616]
11 ListMLE [16] takes the entire ranked list of objects as the learning instance. [sent-31, score-0.248]
12 Almost all these methods learn their ranking functions by minimizing certain loss functions, namely the pointwise, pairwise, and listwise losses. [sent-32, score-0.897]
13 On the other hand, however, it is the ranking measures that are used to evaluate the performance of the learned ranking functions. [sent-33, score-0.992]
14 Taking information retrieval as an example, measures such as Normalized Discounted Cumulative Gain (NDCG) [8] and Mean Average Precision (MAP) [1] are widely used, which obviously differ from the loss functions used in the aforementioned methods. [sent-34, score-0.373]
15 In such a situation, a natural question to ask is whether the minimization of the loss functions can really lead to the optimization of the ranking measures. [sent-35, score-0.723]
16 It has been proved in [5] and [10] that the regression and classification based losses used in the pointwise approach are upper bounds of (1−NDCG). [sent-37, score-0.431]
17 However, for the pairwise and listwise approaches, which are regarded as the state-of-the-art of learning to rank [3, 11], limited results have been obtained. [sent-38, score-0.333]
18 The motivation of this work is to reveal the relationship between ranking measures and the pairwise/listwise losses. [sent-39, score-0.58]
19 Note that ranking measures like NDCG and MAP are defined with the labels of objects (i. [sent-41, score-0.663]
20 Therefore it is relatively easy to establish the connection between the pointwise losses and the ranking measures, since the pointwise losses are also defined with the labels of objects. [sent-44, score-1.056]
21 In contrast, the pairwise and listwise losses are defined with the partial or total order relations among objects, rather than their individual labels. [sent-45, score-0.421]
22 As a result, it is much more difficult to bridge the gap between the pairwise/listwise losses and the ranking measures. [sent-46, score-0.661]
23 To tackle the challenge, we propose making a transformation of the labels on objects to a permutation set. [sent-47, score-0.231]
24 All the permutations in the set are consistent with the labels, in the sense that an object with a higher rating is ranked before another object with a lower rating in the permutation. [sent-48, score-0.362]
25 We then define an essential loss for ranking on the permutation set as follows. [sent-49, score-0.938]
26 First, for each permutation, we construct a sequence of classification tasks, with the goal of each task being to distinguish an object from the objects ranked below it in the permutation. [sent-50, score-0.277]
27 Third, the essential loss is defined as the minimum value of the weighted sum over all the permutations in the set. [sent-52, score-0.451]
28 Our study shows that the essential loss has several nice properties, which help us reveal the relationship between ranking measures and the pairwise/listwise losses. [sent-53, score-0.975]
29 First, it can be proved that the essential loss is an upper bound of measure-based ranking errors such as (1−NDCG) and (1−MAP). [sent-54, score-0.987]
30 Furthermore, the zero value of the essential loss is a sufficient and necessary condition for the zero values of (1−NDCG) and (1−MAP). [sent-55, score-0.416]
31 Second, it can be proved that the pairwise losses in Ranking SVM, RankBoost, and RankNet, and the listwise loss in ListMLE are all upper bounds of the essential loss. [sent-56, score-0.946]
32 As a consequence, we come to the conclusion that the loss functions used in these methods can bound (1−NDCG) and (1−MAP) from above. [sent-57, score-0.285]
33 In other words, the minimization of these loss functions can effectively maximize NDCG and MAP. [sent-58, score-0.257]
34 The proofs of the above results suggest a way to modify existing pairwise/listwise losses so as to make them tighter bounds of (1−NDCG). [sent-59, score-0.276]
35 We hypothesize that tighter bounds will lead to better ranking performances; we tested this hypothesis using benchmark datasets. [sent-60, score-0.584]
36 2 Related work In this section, we review the widely-used loss functions in learning to rank, ranking measures in information retrieval, and previous work on the relationship between loss functions and ranking measures. [sent-63, score-1.54]
37 Note that some recent work tries to directly optimize ranking measures [17, 12, 14, 18]. [sent-64, score-0.526]
38 The relationship between ranking measures and the loss functions in such work is explicitly known. [sent-65, score-0.817]
39 2.1 Loss functions in learning to rank Let $x = \{x_1, \cdots, x_n\}$ be the objects to be ranked. [sent-68, score-0.227]
40 Let F be the function class and f ∈ F be a ranking function. [sent-81, score-0.466]
41 The optimal ranking function is learned from the training data by minimizing a certain loss function defined on the objects, their labels, and the ranking function. [sent-82, score-1.167]
42 Several approaches have been proposed to learn the optimal ranking function. [sent-83, score-0.466]
43 In the pointwise approach, the loss function is defined on the basis of single objects. [sent-84, score-0.305]
44 For example, in subset regression [5], the loss function is as follows, $L_r(f; x, L) = \sum_{i=1}^{n} \big(f(x_i) - l(i)\big)^2$. (1) [sent-85, score-0.234]
45 In the pairwise approach, the loss function is defined on the basis of pairs of objects whose labels are different. [sent-86, score-0.444]
46 For example, the loss functions of Ranking SVM [7], RankBoost [6], and RankNet [2] all have the following form, $L_p(f; x, L) = \sum_{s=1}^{n-1} \sum_{i=1,\, l(i)<l(s)}^{n} \phi\big(f(x_s) - f(x_i)\big)$, (2) where $\phi$ is a decreasing convex surrogate (the hinge, exponential, and logistic function, respectively); here, a permutation y is said to be consistent with the labels if, whenever l(i) > l(j), x_i is ranked before x_j in y. [sent-87, score-0.392]
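To make the shape of this pairwise surrogate concrete, here is a minimal Python sketch. The function names and the particular hinge/exponential/logistic choices of φ are assumptions based on the standard formulations of Ranking SVM, RankBoost, and RankNet; the original algorithms add their own margins, normalization, and regularization.

```python
import math

# Assumed surrogate functions phi(z) applied to the score difference
# z = f(x_s) - f(x_i) for a pair where x_s carries the higher label.
PHI = {
    "ranking_svm": lambda z: max(0.0, 1.0 - z),              # hinge
    "rankboost":   lambda z: math.exp(-z),                    # exponential
    "ranknet":     lambda z: math.log(1.0 + math.exp(-z)),    # logistic
}

def pairwise_loss(scores, labels, phi):
    """Sketch of the pairwise loss L_p(f; x, L) in equation (2): sum
    phi(f(x_s) - f(x_i)) over all pairs with l(s) > l(i)."""
    loss = 0.0
    n = len(scores)
    for s in range(n):
        for i in range(n):
            if labels[s] > labels[i]:
                loss += phi(scores[s] - scores[i])
    return loss
```

Minimizing any of these surrogates pushes higher-rated objects above lower-rated ones, which is what the bounds in Section 3 exploit.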
47 2.2 Ranking measures Several ranking measures have been proposed in the literature to evaluate the performance of a ranking function. [sent-90, score-1.052]
48 When the labels are given in terms of K-level ratings (K > 2), a common practice of using MAP is to fix a level k ∗ , and regard all the objects whose levels are lower than k ∗ as having label 0, and regard the other objects as having label 1 [11]. [sent-96, score-0.449]
49 Therefore, we can consider (1−NDCG) and (1−MAP) as ranking errors. [sent-98, score-0.466]
50 For ease of reference, we call them measure-based ranking errors. [sent-99, score-0.482]
51 The regression based pointwise loss is an upper bound of (1−NDCG): $1 - \mathrm{NDCG}(f; x, L) \le \frac{\sqrt{2}}{N_n}\Big(\sum_{i=1}^{n} D(i)^2\Big)^{1/2} L_r(f; x, L)^{1/2}$. [sent-103, score-0.402]
52 The classification based pointwise loss is also an upper bound of (1−NDCG): $1 - \mathrm{NDCG}(f; x, L) \le \frac{15\sqrt{2}}{N_n}\Big(\sum_{i=1}^{n} D(i)^2 - \frac{1}{n}\big(\sum_{i=1}^{n} D(i)\big)^2\Big)^{1/2}\Big(\sum_{i=1}^{n} I_{\{\hat{l}(i) \ne l(i)\}}\Big)^{1/2}$, where $\hat{l}(i)$ is the label of object $x_i$ predicted by the classifier, in the setting of 5-level ratings. [sent-104, score-0.473]
53 According to the above results, minimizing the regression and classification based pointwise losses will minimize (1−NDCG). [sent-106, score-0.304]
54 That is, when (1−NDCG) is zero, the loss functions may still be very large [10]. [sent-108, score-0.257]
55 Given that the pairwise and listwise approaches are regarded as the state-of-the-art in learning to rank [3, 11], it is very meaningful and important to perform more comprehensive analysis on these two approaches. [sent-111, score-0.333]
56 3 Main results In this section, we present our main results on the relationship between ranking measures and the pairwise/listwise losses. [sent-112, score-0.56]
57 The basic conclusion is that many pairwise and listwise losses are upper bounds of a quantity which we call the essential loss, and the essential loss is an upper bound of both (1−NDCG) and (1−MAP). [sent-113, score-1.146]
58 Furthermore, the zero value of the essential loss is a sufficient and necessary condition for the zero values of (1−NDCG) and (1−MAP). [sent-114, score-0.416]
59 3.1 Essential loss: ranking as a sequence of classifications In this subsection, we describe the essential loss for ranking. [sent-116, score-0.861]
60 According to the definition, it is clear that the NDCG and MAP of a ranking function equal one, if and only if the ranked list (permutation) given by the ranking function is consistent with the labels. [sent-128, score-1.105]
61 Second, given each permutation y ∈ YL , we decompose the ranking of objects x into several sequential steps. [sent-129, score-0.658]
62 For each step s, we distinguish xy(s) , the object ranked at the s-th position in y, from all the other objects ranked below the s-th position in y, using ranking function f . [sent-130, score-0.928]
63 [Figure 1: Modeling ranking as a sequence of classifications.] Suppose there are three objects, A, B, and C, and a permutation y = (A, B, C). [sent-141, score-0.623]
64 Suppose the output of the ranking function for these objects is (2, 3, 1), and accordingly the predicted ranked list is π = (B, A, C). [sent-142, score-0.714]
65 At step one of the decomposition, the ranking function predicts object B to be on the top of the list. [sent-143, score-0.51]
66 This is an error, since y ranks A at the top; A is then removed, and at the next step the ranking function predicts object B to be at the top of the remaining list, which is correct. [sent-147, score-0.51]
67 Overall, the ranking function makes one error in this sequence of classification tasks. [sent-150, score-0.483]
68 We compute the weighted sum of the classification errors of all individual tasks, $L_\beta(f; x, y) \triangleq \sum_{s=1}^{n-1} \beta(s)\Big(1 - \prod_{i=s+1}^{n} I_{\{f(x_{y(s)}) > f(x_{y(i)})\}}\Big)$, (6) and then define the minimum value of the weighted sum over all the permutations in Y_L as the essential loss for ranking, $L_\beta(f; x, L) \triangleq \min_{y \in Y_L} L_\beta(f; x, y)$. [sent-152, score-0.517]
69 In other words, the essential loss is zero if and only if the permutation given by the ranking function is consistent with the labels. [sent-157, score-0.98]
70 Further considering the discussions on the consistent permutation at the beginning of this subsection, we can come to the conclusion that the zero value of the essential loss is a sufficient and necessary condition for the zero values of (1−NDCG) and (1−MAP). [sent-158, score-0.552]
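As a concrete illustration of equation (6) and of the minimization over Y_L, the following brute-force sketch enumerates all label-consistent permutations; it is exponential in n and meant only to make the definition explicit. The function and argument names are illustrative, not taken from the paper.

```python
import itertools

def essential_loss(scores, labels, beta):
    """Brute-force sketch of the essential loss L_beta(f; x, L): the minimum,
    over all label-consistent permutations y, of the beta-weighted per-step
    classification errors of equation (6). `beta` maps a 1-based step index s
    to its weight beta(s)."""
    n = len(scores)
    best = float("inf")
    for y in itertools.permutations(range(n)):
        # y is consistent only if no lower-rated object precedes a higher-rated one
        if any(labels[y[s]] < labels[y[t]]
               for s in range(n) for t in range(s + 1, n)):
            continue
        loss = 0.0
        for s in range(n - 1):
            # step s is correct only if the object ranked s-th in y
            # outscores every object ranked below it in y
            correct = all(scores[y[s]] > scores[y[i]] for i in range(s + 1, n))
            loss += beta(s + 1) * (0.0 if correct else 1.0)
        best = min(best, loss)
    return best
```

For the three-object walkthrough above, with scores (2, 3, 1) for A, B, C and any ratings that make y = (A, B, C) the only consistent permutation, β ≡ 1 gives an essential loss of 1, matching the single classification error in the decomposition.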
71 3.2 Essential loss: upper bound of measure-based ranking errors In this subsection, we show that the essential loss is an upper bound of (1−NDCG) and (1−MAP), when specific weights β(s) are used. [sent-160, score-1.024]
72 Given K-level rating data (x, L) with $n_k$ objects having label $k$ and $\sum_{k=k^*}^{K} n_k > 0$, the following inequalities hold for all $f$: (1) $1 - \mathrm{NDCG}(f; x, L) \le \frac{1}{N_n} L_{\beta_1}(f; x, L)$, where $\beta_1(s) = G(l(y(s)))\, D(s)$, $\forall y \in Y_L$; (2) $1 - \mathrm{MAP}(f; x, L) \le \frac{1}{\sum_{k=k^*}^{K} n_k} L_{\beta_2}(f; x, L)$, where $\beta_2(s) \equiv 1$. [sent-162, score-0.243]
73 This can be done by changing the index of the sum in NDCG from the rank position r in π_f to the rank position s in any y ∈ Y_L. [sent-166, score-0.277]
74 Second, we consider the essential loss case by case. [sent-168, score-0.378]
75 If i_0(π_f) > n_1 (i.e., the first object with label 0 is ranked after position n_1 in π_f), then all the objects with label 1 are ranked before the objects with label 0. [sent-187, score-0.656]
76 If i0 (πf ) ≤ n1 , there are i0 (πf ) − 1 objects with label 1 ranked before all the objects with label 0. [sent-189, score-0.406]
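A small numerical check of inequality (1) can be written with the standard NDCG ingredients. The exponential gain G(l) = 2^l − 1 and logarithmic discount D(r) = 1/log2(1 + r) are assumptions (the usual choices from [8]), and the snippet reuses the essential_loss sketch given after equation (6).

```python
import math

def gain(l):          # assumed exponential gain G(l)
    return 2 ** l - 1

def discount(r):      # assumed log discount D(r) at 1-based rank r
    return 1.0 / math.log2(1.0 + r)

def one_minus_ndcg(scores, labels):
    n = len(scores)
    pi_f = sorted(range(n), key=lambda i: -scores[i])      # predicted ranked list
    N_n = sum(gain(l) * discount(r + 1)
              for r, l in enumerate(sorted(labels, reverse=True)))
    dcg = sum(gain(labels[i]) * discount(r + 1) for r, i in enumerate(pi_f))
    return 1.0 - dcg / N_n, N_n

scores, labels = [2.0, 3.0, 1.0], [2, 1, 0]
err, N_n = one_minus_ndcg(scores, labels)
# beta_1(s) = G(l(y(s))) * D(s); l(y(s)) is the s-th largest label for any y in Y_L
beta1 = lambda s: gain(sorted(labels, reverse=True)[s - 1]) * discount(s)
# essential_loss is the brute-force sketch defined after equation (6)
assert err <= essential_loss(scores, labels, beta1) / N_n   # inequality (1)
```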
77 3.3 Essential loss: lower bound of loss functions In this section, we show that many pairwise/listwise losses are upper bounds of the essential loss. [sent-197, score-0.729]
78 The pairwise losses in Ranking SVM, RankBoost, and RankNet, and the listwise loss in ListMLE are all upper bounds of the essential loss, i. [sent-199, score-0.905]
79 Thus, the value of the pairwise loss is the same for every y ∈ Y_L. [sent-206, score-0.307]
80 Considering inequality (8) and noticing that the pairwise losses are equal for all y ∈ Y_L, we have $L_\beta(f; x, L) \le \max_{y \in Y_L} \sum_{s=1}^{n-1} \beta(s) \sum_{i=s+1}^{n} a\big(y(i), y(s)\big)\, \phi\big(f(x_{y(s)}) - f(x_{y(i)})\big) \le \max_{1 \le s \le n-1} \beta(s)\, L_p(f; x, L)$. [sent-217, score-0.292]
81 (2) We then prove the inequality for the loss function of ListMLE. [sent-218, score-0.262]
82 Therefore, whenever the step-s prediction is incorrect we have $-\ln \frac{e^{f(x_{y(s)})}}{\sum_{i=s}^{n} e^{f(x_{y(i)})}} > \ln 2 = \ln 2 \cdot I_{\{T_f(x(s)) \ne y(s)\}}$. [sent-228, score-0.43]
83 To sum up, we have $\sum_{s=1}^{n-1}\Big(-\ln \frac{e^{f(x_{y(s)})}}{\sum_{i=s}^{n} e^{f(x_{y(i)})}}\Big) > \sum_{s=1}^{n-1} \beta(s)\, \ln 2\; I_{\{T_f(x(s)) \ne y(s)\}} \ge \ln 2 \min_{y \in Y_L} L_\beta(\pi_f, y) = \ln 2\, L_\beta(\pi_f, L)$. [sent-230, score-0.277]
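For reference, the quantity being lower-bounded above is the ListMLE (Plackett-Luce) negative log-likelihood of the target permutation; a minimal sketch follows, ignoring numerical-stability concerns such as the log-sum-exp trick. The function name is illustrative.

```python
import math

def listmle_loss(scores, y):
    """Sketch of the ListMLE loss for a target permutation y (a list of
    object indices, best first): -sum_s ln( exp(f(x_{y(s)})) /
    sum_{i>=s} exp(f(x_{y(i)})) ). The last term is always zero."""
    loss = 0.0
    n = len(y)
    for s in range(n):
        denom = sum(math.exp(scores[y[i]]) for i in range(s, n))
        loss += -math.log(math.exp(scores[y[s]]) / denom)
    return loss
```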
84 (1) The pairwise losses in Ranking SVM, RankBoost, and RankNet are upper bounds of (1−NDCG) and (1−MAP). [sent-234, score-0.373]
85 (2) The listwise loss in ListMLE is an upper bound of (1−NDCG) and (1−MAP). [sent-236, score-0.495]
86 The key idea is to introduce weights related to β1 (s) to the loss functions so as to make them tighter bounds of (1−NDCG). [sent-259, score-0.358]
87 It can be proved (see Proposition 1 in [4]) that the above weighted losses are still upper bounds of (1−NDCG) and they are lower bounds of the original pairwise and listwise losses. [sent-261, score-0.647]
88 In other words, the above weighted loss functions are tighter bounds of (1−NDCG) than existing loss functions. [sent-262, score-0.596]
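As one plausible instantiation of this weighting (the exact weighted losses are given in Proposition 1 of [4] and may differ), a W-ListMLE-style loss could scale the s-th term of the ListMLE loss by β1(s) = G(l(y(s)))·D(s). The sketch below is hypothetical and only illustrates the idea.

```python
import math

def weighted_listmle_loss(scores, y, labels):
    """Hypothetical W-ListMLE sketch: each step term of the ListMLE loss is
    scaled by beta_1(s) = G(l(y(s))) * D(s), with s the 1-based step index.
    This mirrors the idea of making the loss a tighter bound of (1-NDCG);
    the exact weighting used in the paper may differ."""
    loss = 0.0
    n = len(y)
    for s0 in range(n - 1):               # s0 is the 0-based step, s = s0 + 1
        denom = sum(math.exp(scores[y[i]]) for i in range(s0, n))
        term = -math.log(math.exp(scores[y[s0]]) / denom)
        beta1 = (2 ** labels[y[s0]] - 1) / math.log2(s0 + 2)   # G(l(y(s))) * D(s)
        loss += beta1 * term
    return loss
```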
89 We tested the effectiveness of the weighted loss functions on the OHSUMED dataset in LETOR 3.0. [sent-263, score-0.28]
90 The methods that minimize the weighted loss functions are referred to as W-RankNet and W-ListMLE. [sent-266, score-0.28]
91 These experimental results seem to indicate that optimizing tighter bounds of the ranking measures can lead to better ranking performances. [sent-269, score-1.112]
92 5 Conclusion and future work In this work, we have proved that many pairwise/listwise losses in learning to rank are actually upper bounds of measure-based ranking errors. [sent-270, score-0.875]
93 (1) We have modeled ranking as a sequence of classifications, when defining the essential loss. [sent-274, score-0.646]
94 We will study whether the essential loss is an upper bound of other measure-based ranking errors. [sent-277, score-0.922]
95 (3) We have taken the loss functions in Ranking SVM, RankBoost, RankNet and ListMLE as examples in this study. [sent-278, score-0.257]
96 We plan to investigate the loss functions in other pairwise and listwise ranking methods, such as RankCosine [13], ListNet [3], FRank [15] and QBRank [19]. [sent-279, score-0.985]
97 (4) While we have mainly discussed the upper-bound relationship in this work, we will study whether loss functions in existing learning-to-rank methods are statistically consistent with the essential loss and the measure-based ranking errors. [sent-280, score-1.158]
98 Learning to rank: from pairwise approach to listwise approach. [sent-307, score-0.246]
99 Essential loss: Bridge the gap between ranking measures and loss functions in learning to rank. [sent-317, score-0.783]
100 A general boosting method and its application to learning ranking functions for web search. [sent-437, score-0.55]
wordName wordTfidf (topN-words)
[('ranking', 0.466), ('ndcg', 0.416), ('xy', 0.351), ('yl', 0.285), ('loss', 0.215), ('losses', 0.175), ('essential', 0.163), ('ranknet', 0.156), ('listwise', 0.154), ('tf', 0.153), ('rankboost', 0.13), ('ranked', 0.118), ('dcg', 0.114), ('listmle', 0.104), ('objects', 0.098), ('ap', 0.095), ('permutation', 0.094), ('pairwise', 0.092), ('nn', 0.092), ('ratings', 0.09), ('pointwise', 0.09), ('rank', 0.087), ('ln', 0.086), ('measures', 0.06), ('sigir', 0.058), ('map', 0.057), ('lp', 0.057), ('bounds', 0.056), ('rating', 0.051), ('upper', 0.05), ('listnet', 0.049), ('ni', 0.048), ('label', 0.046), ('letor', 0.046), ('tighter', 0.045), ('qin', 0.044), ('object', 0.044), ('svm', 0.043), ('position', 0.042), ('functions', 0.042), ('proved', 0.041), ('liu', 0.04), ('labels', 0.039), ('relationship', 0.034), ('classi', 0.034), ('frank', 0.033), ('retrieval', 0.032), ('tsai', 0.032), ('list', 0.032), ('permutations', 0.031), ('ll', 0.029), ('subsection', 0.029), ('svmmap', 0.028), ('bound', 0.028), ('mcrank', 0.026), ('inequality', 0.025), ('microsoft', 0.025), ('aforementioned', 0.024), ('lan', 0.024), ('lr', 0.024), ('ohsumed', 0.024), ('chen', 0.024), ('errors', 0.024), ('remove', 0.023), ('weighted', 0.023), ('chinese', 0.023), ('consistent', 0.023), ('web', 0.023), ('verify', 0.022), ('prove', 0.022), ('easy', 0.021), ('cations', 0.021), ('burges', 0.02), ('bridge', 0.02), ('reformulate', 0.02), ('minimizing', 0.02), ('reveal', 0.02), ('asia', 0.02), ('zero', 0.019), ('regression', 0.019), ('optimizing', 0.019), ('considering', 0.019), ('usa', 0.019), ('boosting', 0.019), ('academy', 0.019), ('sum', 0.019), ('acm', 0.018), ('satisfying', 0.018), ('named', 0.018), ('tasks', 0.017), ('ny', 0.017), ('nice', 0.017), ('correctness', 0.017), ('xj', 0.017), ('benchmark', 0.017), ('sequence', 0.017), ('regard', 0.016), ('ease', 0.016), ('plan', 0.016), ('margin', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li
Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1
2 0.41007161 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
3 0.40826267 230 nips-2009-Statistical Consistency of Top-k Ranking
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1
4 0.29178095 87 nips-2009-Exponential Family Graph Matching and Ranking
Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano
Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1
5 0.17586398 190 nips-2009-Polynomial Semantic Indexing
Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri
Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1
6 0.091542378 3 nips-2009-AUC optimization and the two-sample problem
7 0.086305335 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
8 0.074880809 78 nips-2009-Efficient Moments-based Permutation Tests
9 0.068646297 14 nips-2009-A Parameter-free Hedging Algorithm
10 0.06783127 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
11 0.067558855 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models
12 0.064247847 98 nips-2009-From PAC-Bayes Bounds to KL Regularization
13 0.063383803 122 nips-2009-Label Selection on Graphs
14 0.062674291 71 nips-2009-Distribution-Calibrated Hierarchical Classification
15 0.058198988 7 nips-2009-A Data-Driven Approach to Modeling Choice
16 0.051922172 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
17 0.048577756 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
18 0.048346143 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
19 0.048168648 201 nips-2009-Region-based Segmentation and Object Detection
20 0.046375547 31 nips-2009-An LP View of the M-best MAP problem
topicId topicWeight
[(0, -0.161), (1, 0.096), (2, -0.097), (3, -0.057), (4, 0.004), (5, -0.082), (6, -0.512), (7, -0.252), (8, 0.115), (9, -0.313), (10, 0.134), (11, 0.113), (12, -0.158), (13, -0.04), (14, -0.055), (15, 0.024), (16, -0.132), (17, 0.068), (18, -0.013), (19, 0.043), (20, -0.1), (21, -0.06), (22, -0.022), (23, -0.019), (24, 0.019), (25, -0.022), (26, -0.009), (27, 0.026), (28, -0.007), (29, -0.028), (30, 0.013), (31, 0.004), (32, -0.009), (33, 0.009), (34, 0.014), (35, -0.005), (36, -0.006), (37, 0.012), (38, -0.006), (39, 0.042), (40, 0.008), (41, -0.049), (42, 0.008), (43, -0.008), (44, -0.037), (45, -0.012), (46, 0.033), (47, -0.038), (48, -0.028), (49, -0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.97775698 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li
Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1
2 0.94626164 230 nips-2009-Statistical Consistency of Top-k Ranking
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1
3 0.89787352 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
4 0.70687622 87 nips-2009-Exponential Family Graph Matching and Ranking
Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano
Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1
5 0.52644992 190 nips-2009-Polynomial Semantic Indexing
Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri
Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1
6 0.50131524 206 nips-2009-Riffled Independence for Ranked Data
7 0.38480902 7 nips-2009-A Data-Driven Approach to Modeling Choice
8 0.37222603 78 nips-2009-Efficient Moments-based Permutation Tests
9 0.34475803 3 nips-2009-AUC optimization and the two-sample problem
10 0.28994712 233 nips-2009-Streaming Pointwise Mutual Information
11 0.28708014 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
12 0.272957 71 nips-2009-Distribution-Calibrated Hierarchical Classification
13 0.26070091 98 nips-2009-From PAC-Bayes Bounds to KL Regularization
14 0.2458677 161 nips-2009-Nash Equilibria of Static Prediction Games
15 0.24545194 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
16 0.2266718 14 nips-2009-A Parameter-free Hedging Algorithm
17 0.21293126 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
19 0.20900218 94 nips-2009-Fast Learning from Non-i.i.d. Observations
20 0.20719177 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
topicId topicWeight
[(1, 0.158), (12, 0.029), (21, 0.013), (24, 0.061), (25, 0.06), (35, 0.04), (36, 0.079), (39, 0.062), (58, 0.109), (71, 0.034), (77, 0.092), (86, 0.151)]
simIndex simValue paperId paperTitle
same-paper 1 0.87311804 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li
Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1
2 0.85802448 50 nips-2009-Canonical Time Warping for Alignment of Human Behavior
Author: Feng Zhou, Fernando Torre
Abstract: Alignment of time series is an important problem to solve in many scientific disciplines. In particular, temporal alignment of two or more subjects performing similar activities is a challenging problem due to the large temporal scale difference between human actions as well as the inter/intra subject variability. In this paper we present canonical time warping (CTW), an extension of canonical correlation analysis (CCA) for spatio-temporal alignment of human motion between two subjects. CTW extends previous work on CCA in two ways: (i) it combines CCA with dynamic time warping (DTW), and (ii) it extends CCA by allowing local spatial deformations. We show CTW’s effectiveness in three experiments: alignment of synthetic data, alignment of motion capture data of two subjects performing similar actions, and alignment of similar facial expressions made by two people. Our results demonstrate that CTW provides both visually and qualitatively better alignment than state-of-the-art techniques based on DTW. 1
3 0.80712926 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
4 0.79893386 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism
Author: Douglas Eck, Yoshua Bengio, Aaron C. Courville
Abstract: The Indian Buffet Process is a Bayesian nonparametric approach that models objects as arising from an infinite number of latent factors. Here we extend the latent factor model framework to two or more unbounded layers of latent factors. From a generative perspective, each layer defines a conditional factorial prior distribution over the binary latent variables of the layer below via a noisy-or mechanism. We explore the properties of the model with two empirical studies, one digit recognition task and one music tag data experiment. 1
5 0.79860973 106 nips-2009-Heavy-Tailed Symmetric Stochastic Neighbor Embedding
Author: Zhirong Yang, Irwin King, Zenglin Xu, Erkki Oja
Abstract: Stochastic Neighbor Embedding (SNE) has shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, momentum, etc., in finding its optimum. In this paper, we propose the Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE) method, which is a generalization of the t-SNE to accommodate various heavytailed embedding similarity functions. With this generalization, we are presented with two difficulties. The first is how to select the best embedding similarity among all heavy-tailed functions and the second is how to optimize the objective function once the heavy-tailed function has been selected. Our contributions then are: (1) we point out that various heavy-tailed embedding similarities can be characterized by their negative score functions. Based on this finding, we present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE; (2) we present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and does not require the user to set any parameters; and (3) we present two empirical studies, one for unsupervised visualization showing that our optimization algorithm runs as fast and as good as the best known t-SNE implementation and the other for semi-supervised visualization showing quantitative superiority using the homogeneity measure as well as qualitative advantage in cluster separation over t-SNE.
6 0.79252815 237 nips-2009-Subject independent EEG-based BCI decoding
7 0.75863719 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
8 0.75513649 104 nips-2009-Group Sparse Coding
9 0.7473526 137 nips-2009-Learning transport operators for image manifolds
10 0.74575216 230 nips-2009-Statistical Consistency of Top-k Ranking
11 0.74411559 87 nips-2009-Exponential Family Graph Matching and Ranking
12 0.74144912 119 nips-2009-Kernel Methods for Deep Learning
13 0.73990762 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels
14 0.73614633 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections
15 0.73144686 151 nips-2009-Measuring Invariances in Deep Networks
16 0.7289356 148 nips-2009-Matrix Completion from Power-Law Distributed Samples
17 0.72833627 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints
18 0.72559798 3 nips-2009-AUC optimization and the two-sample problem
19 0.72368968 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization
20 0.72304249 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA