nips nips2009 nips2009-136 knowledge-graph by maker-knowledge-mining

136 nips-2009-Learning to Rank by Optimizing NDCG Measure


Source: pdf

Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao

Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. [sent-5, score-0.429]

2 The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. [sent-6, score-0.286]

3 Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. [sent-7, score-0.2]

4 The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. [sent-8, score-0.252]

5 We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. [sent-9, score-0.097]

6 Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. [sent-11, score-0.258]

7 Introduction: Learning to rank has attracted the focus of many machine learning researchers in the last decade because of its growing application in areas like information retrieval (IR) and recommender systems. [sent-12, score-0.177]

8 In the simplest form, the so-called pointwise approaches, ranking can be treated as classification or regression by learning the numeric rank value of documents as an absolute quantity [3, 4]. [sent-13, score-0.514]

9 The second group of algorithms, the pairwise approaches, considers the pair of documents as independent variables and learns a classification (regression) model to correctly order the training pairs [5, 6, 7, 8, 9, 10, 11]. [sent-14, score-0.148]

10 The main problem with these approaches is that their loss functions are related to individual documents while most evaluation metrics of information retrieval measure the ranking quality for individual queries, not documents. [sent-15, score-0.524]

11 This mismatch has motivated the so-called listwise approaches to ranking, which treat each ranking list of documents for a query as a training instance [2, 12, 13, 14, 15, 16, 17]. [sent-16, score-0.566]

12 Unlike the pointwise or pairwise approaches, the listwise approaches aim to optimize the evaluation metrics such as NDCG and MAP. [sent-17, score-0.327]

13 The main difficulty in optimizing these evaluation metrics is that they are dependent on the rank position of documents induced by the ranking function, not the numerical values output by the ranking function. [sent-18, score-0.86]

14 In past studies, this problem was addressed either by using a convex surrogate of the IR metrics or by heuristic optimization methods such as genetic algorithms. [sent-19, score-0.089]

15 In this work, we address this challenge by a probabilistic framework that optimizes the expectation of NDCG over all possible permutations of documents. [sent-20, score-0.103]

16 To handle the computational difficulty, we present a relaxation strategy that approximates the expectation of NDCG in the space of permutation, and a bound optimization algorithm [18] for efficient optimization. [sent-21, score-0.099]

17 Our experiment with several benchmark data sets shows that our method performs better than several state-of-the-art ranking techniques. [sent-22, score-0.258]

18 Related Work: We focus on reviewing the listwise approaches that are closely related to the theme of this work. [sent-27, score-0.156]

19 The listwise approaches can be classified into two categories. [sent-28, score-0.156]

20 Most IR evaluation metrics, however, depend on the sorted order of documents, and are non-convex in the target ranking function. [sent-30, score-0.265]

21 In [13], the authors introduced LambdaRank that addresses the difficulty in optimizing IR metrics by defining a virtual gradient on each document after the sorting. [sent-34, score-0.195]

22 This may partially explain why LambdaRank performs very poorly when compared to MCRank [3], a simple adjustment of classification for ranking (a pointwise approach). [sent-36, score-0.273]

23 However, they use Monte Carlo sampling to address the intractable task of computing the expectation over the permutation space, which could be a poor approximation for queries with a large number of documents. [sent-39, score-0.122]

24 However, they deploy heuristics to embed the IR evaluation metrics in computing the weights of queries and the importance of weak rankers. [sent-41, score-0.158]

25 That is, it uses the NDCG value of each query in the current iteration as the weight for that query in constructing the weak ranker (the documents of each query share the same weight). [sent-43, score-0.318]

26 [21] reduced ranking, as measured by NDCG, to pairwise classification and applied an alternating optimization strategy to address the sorting problem by fixing the rank positions when computing the derivative. [sent-47, score-0.249]

27 Since SVM-MAP is designed to optimize MAP, it only considers binary relevancy and cannot be applied to data sets that have more than two levels of relevance judgments. [sent-49, score-0.12]

28 The second group of listwise algorithms defines a listwise loss function as an indirect way to optimize the IR evaluation metrics. [sent-50, score-0.411]

29 RankCosine [12] uses cosine similarity between the ranking list and the ground truth as a query level loss function. [sent-51, score-0.337]

30 ListNet [14] adopts the KL divergence as its loss function by defining a probability distribution over the space of permutations for learning to rank. [sent-52, score-0.098]

31 FRank [9] uses a new loss function called fidelity loss on the probability framework introduced in ListNet. [sent-53, score-0.088]

32 The main problem with this group of approaches is that the connection between the listwise loss function and the targeted IR evaluation metric is unclear, and therefore optimizing the listwise loss function may not necessarily result in the optimization of the IR metrics. [sent-55, score-0.497]

33 For each query $q^k$, we have a collection of $m_k$ documents $D^k = \{d_i^k,\ i = 1, \ldots, m_k\}$. [sent-61, score-0.337]

34 The relevance of these documents to $q^k$ is given by a vector $r^k = (r_1^k, \ldots, r_{m_k}^k)$. [sent-64, score-0.179]

35 We denote by $F(d, q)$ the ranking function that takes a document-query pair $(d, q)$ and outputs a real-valued score, and by $j_i^k$ the rank of document $d_i^k$ within the collection $D^k$ for query $q^k$. [sent-68, score-0.512]

36 The NDCG value for ranking function $F(d, q)$ is then computed as follows: $\mathcal{L}(Q, F) = \frac{1}{n}\sum_{k=1}^{n}\frac{1}{Z_k}\sum_{i=1}^{m_k}\frac{2^{r_i^k}-1}{\log(1+j_i^k)}$ (1), where $Z_k$ is the normalization factor [1]. [sent-69, score-0.262]

37 NDCG is usually truncated at a particular rank level (e.g., the first few retrieved documents) to emphasize the importance of the top-ranked documents. [sent-70, score-0.124]
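To make Equation (1) concrete, here is a minimal sketch (ours, not code from the paper) of truncated NDCG for a single query, using the 2^r - 1 gain and 1/log(1 + rank) discount written above; all function and variable names are our own.

```python
import numpy as np

def dcg_at_k(rel_in_rank_order, k):
    """DCG with the 2^r - 1 gain and 1/log(1 + rank) discount of Eq. (1)."""
    rel = np.asarray(rel_in_rank_order, dtype=float)[:k]
    ranks = np.arange(1, len(rel) + 1)
    return np.sum((2.0 ** rel - 1.0) / np.log(1.0 + ranks))

def ndcg_at_k(scores, relevance, k=10):
    """NDCG@k for one query: DCG of the score-induced ordering divided by Z_k,
    the DCG of the ideal (relevance-sorted) ordering."""
    scores = np.asarray(scores)
    relevance = np.asarray(relevance, dtype=float)
    order = np.argsort(-scores)                 # predicted ranking: highest score first
    z_k = dcg_at_k(np.sort(relevance)[::-1], k) # normalizer Z_k from the ideal ordering
    return dcg_at_k(relevance[order], k) / z_k if z_k > 0 else 0.0

# toy query with graded relevance 0/1/2 (the grades used in OHSUMED, mentioned later)
print(ndcg_at_k(scores=[0.3, 2.1, 1.2, 0.7], relevance=[0, 2, 1, 0], k=3))
```

Note that only the ordering induced by the scores matters here, which is exactly the difficulty pointed out above: NDCG is piecewise constant in the numerical scores.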

38 A Probabilistic Framework: One of the main challenges in optimizing the NDCG metric defined in Equation (1) is that the dependence of the document ranks (i.e., $j_i^k$) on the ranking function $F(d, q)$ is not explicitly expressed. [sent-74, score-0.131]

39 This makes the optimization computationally challenging. [sent-76, score-0.262]

40 To address this problem, we consider the expectation of $\mathcal{L}(Q, F)$ over all the possible rankings induced by the ranking function $F(d, q)$. [sent-77, score-0.263]

41 That is, $\bar{\mathcal{L}}(Q, F) = \frac{1}{n}\sum_{k=1}^{n}\frac{1}{Z_k}\left\langle \sum_{i=1}^{m_k}\frac{2^{r_i^k}-1}{\log(1+j_i^k)}\right\rangle_F = \frac{1}{n}\sum_{k=1}^{n}\frac{1}{Z_k}\sum_{\pi^k \in S_{m_k}}\Pr(\pi^k|F, q^k)\sum_{i=1}^{m_k}\frac{2^{r_i^k}-1}{\log(1+\pi^k(i))}$ (2), where $S_{m_k}$ stands for the group of permutations of $m_k$ documents, and $\pi^k$ is an instance of permutation (or ranking). [sent-79, score-0.641]

42 The notation $\pi^k(i)$ stands for the rank position of the $i$-th document under $\pi^k$. [sent-80, score-0.239]

43 For any distribution $\Pr(\pi|F, q)$, the inequality $\bar{\mathcal{L}}(Q, F) \geq \mathcal{H}(Q, F)$ holds, where $\mathcal{H}(Q, F) = \frac{1}{n}\sum_{k=1}^{n}\frac{1}{Z_k}\sum_{i=1}^{m_k}\frac{2^{r_i^k}-1}{\log\left(1 + \langle\pi^k(i)\rangle_F\right)}$ (3). [sent-83, score-0.179]

44 In the next step of the simplification, we rewrite $\pi^k(i)$ as $\pi^k(i) = 1 + \sum_{j=1}^{m_k} I(\pi^k(i) > \pi^k(j))$ (4), where $I(x)$ outputs 1 when $x$ is true and 0 otherwise. [sent-87, score-0.179]

45 Hence, the expected rank $\langle\pi^k(i)\rangle$ is written as $\langle\pi^k(i)\rangle = 1 + \sum_{j=1}^{m_k}\langle I(\pi^k(i) > \pi^k(j))\rangle = 1 + \sum_{j=1}^{m_k}\Pr(\pi^k(i) > \pi^k(j))$ (5). [sent-88, score-0.381]

46 As a result, to optimize $\mathcal{H}(Q, F)$, we only need to define $\Pr(\pi^k(i) > \pi^k(j))$, i.e., the marginal probability that document $d_j^k$ is ranked before document $d_i^k$. [sent-90, score-0.762]

47 In the next section, we will discuss how to define a probability model for $\Pr(\pi^k|F, q^k)$ and derive the pairwise ranking probability $\Pr(\pi^k(i) > \pi^k(j))$ from the distribution $\Pr(\pi^k|F, q^k)$. [sent-91, score-0.264]
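As a quick numerical check of Equations (4)-(5), the short sketch below (our own, not from the paper) verifies that a document's rank position equals one plus the number of documents ranked ahead of it, and that averaging the indicator over sampled permutations recovers the relation between the expected rank and the pairwise marginals; the permutation distribution here is simply uniform, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6  # number of documents in a toy query

# Eq. (4): pi(i) = 1 + sum_j I(pi(i) > pi(j)) for any permutation pi
for _ in range(5):
    pi = rng.permutation(m) + 1          # pi[i] = 1-based rank position of document i
    for i in range(m):
        assert pi[i] == 1 + np.sum(pi[i] > pi)

# Eq. (5) in expectation: E[pi(i)] = 1 + sum_j Pr(pi(i) > pi(j))
samples = np.stack([rng.permutation(m) + 1 for _ in range(20000)])
i = 2
expected_rank = samples[:, i].mean()
pairwise_marginals = np.mean(samples[:, i][:, None] > samples, axis=0)  # Pr(pi(i) > pi(j))
print(expected_rank, 1 + pairwise_marginals.sum())   # the two values should be close
```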

48 Equation (6) models each pair $(d_i^k, d_j^k)$ of the ranking list $\pi^k$ by the factor $\exp\left(F(d_i^k, q^k) - F(d_j^k, q^k)\right)$ if $d_i^k$ is ranked before $d_j^k$. [sent-94, score-1.14]

49 This modeling choice is consistent with the idea of ranking the documents with the largest scores first; intuitively, the more document pairs in a permutation are in decreasing order of score, the larger the probability of that permutation. [sent-97, score-0.575]

50 Using Equation (6) for $\Pr(\pi^k|F, q^k)$, we have $\mathcal{H}(Q, F)$ expressed in terms of the ranking function $F$. [sent-98, score-0.233]

51 By maximizing $\mathcal{H}(Q, F)$ over $F$, we can find the optimal ranking function $F$. [sent-99, score-0.233]

52 Notice that there is a one-to-one mapping between these two sets; namely, for any ranking $\pi^k \in G_a^k(i, j)$, we can create a corresponding ranking $\pi^k \in G_b^k(i, j)$ by switching the ranks of documents $d_i^k$ and $d_j^k$, and vice versa. [sent-102, score-1.143]

53 If $F(d_i^k, q^k) > F(d_j^k, q^k)$, we have $\Pr(\pi^k(i) > \pi^k(j)) \leq \frac{1}{1 + \exp\left(2\left(F(d_i^k, q^k) - F(d_j^k, q^k)\right)\right)}$ (7). [sent-105, score-0.105]

54 The idea of using a logistic model for $\Pr(\pi^k(i) > \pi^k(j))$ is not new in learning to rank [7, 9]; however, it has been taken for granted, and no justification has been provided for its use in learning to rank. [sent-109, score-0.146]
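To see how such a pairwise bound behaves, here is a brute-force sketch of our own construction: we take the permutation probability to be proportional to the product of the pairwise factors exp(F_i - F_j) over all pairs ordered i before j, which is one reading of the description of Equation (6) in sentence 48, compute the exact marginal Pr(pi(i) > pi(j)) on a tiny query, and compare it with the logistic upper bound of Equation (7). All names and the scores are hypothetical.

```python
import numpy as np
from itertools import permutations

def permutation_probs(F):
    """All permutations of len(F) documents, with probability proportional to the
    product of exp(F[i] - F[j]) over ordered pairs (document i ranked before j)."""
    m = len(F)
    perms = list(permutations(range(m)))          # perm[r] = document at rank r + 1
    weights = []
    for perm in perms:
        logw = sum(F[perm[a]] - F[perm[b]] for a in range(m) for b in range(a + 1, m))
        weights.append(np.exp(logw))
    weights = np.array(weights)
    return perms, weights / weights.sum()

def prob_i_ranked_after_j(F, i, j):
    """Exact marginal Pr(pi(i) > pi(j)) under the brute-forced distribution."""
    perms, probs = permutation_probs(F)
    return sum(p for perm, p in zip(perms, probs) if perm.index(i) > perm.index(j))

F = np.array([1.2, 0.4, -0.3])   # hypothetical scores for three documents, F[0] > F[1]
exact = prob_i_ranked_after_j(F, i=0, j=1)
bound = 1.0 / (1.0 + np.exp(2.0 * (F[0] - F[1])))   # right-hand side of Eq. (7)
print(exact, bound, exact <= bound)                  # the bound should hold
```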

55 By plugging the result of this proposition into the objective function in Equation (9), the new objective is to minimize the following quantity: $\bar{\mathcal{M}}(Q, F) \approx \frac{1}{n}\sum_{k=1}^{n}\frac{1}{Z_k}\sum_{i=1}^{m_k}\left(2^{r_i^k} - 1\right)A_i^k$ (11). The objective function in Equation (11) is explicitly related to $F$ via the term $A_i^k$. [sent-112, score-0.115]

56 In the next section, we aim to derive an algorithm that learns an effective ranking function by efficiently minimizing $\bar{\mathcal{M}}$. [sent-113, score-0.233]

57 It is also important to note that although $\bar{\mathcal{M}}$ is no longer a rigorous lower bound for the original objective function $\bar{\mathcal{L}}$, our empirical study shows that this approximation is very effective in identifying the appropriate ranking function from the training data. [sent-114, score-0.276]

58 Let $F_i^k$ denote the ranking value obtained so far for document $d_i^k$. [sent-117, score-0.381]

59 To improve NDCG, following the idea of AdaBoost, we restrict the new ranking value for document $d_i^k$, denoted $\tilde{F}_i^k$, to the following form: $\tilde{F}_i^k = F_i^k + \alpha f_i^k$ (12), where $\alpha > 0$ is the combination weight and $f_i^k = f(d_i^k, q^k) \in \{0, 1\}$ is a binary value. [sent-118, score-1.178]

60 Note that in the above, we assume the ranking function $F(d, q)$ is updated iteratively by adding a binary classification function $f(d, q)$, which leads to efficient computation as well as effective exploitation of existing algorithms for data classification. [sent-119, score-0.258]

61 $\frac{1}{1 + \exp(\tilde{F}_i^k - \tilde{F}_j^k)} \leq \frac{1}{1 + \exp(F_i^k - F_j^k)} + \gamma_{i,j}^k\left(\exp\left(\alpha(f_j^k - f_i^k)\right) - 1\right)$ (13), where $\gamma_{i,j}^k = \frac{\exp(F_i^k - F_j^k)}{\left(1 + \exp(F_i^k - F_j^k)\right)^2}$ (14). The proof of this proposition can be found in Appendix A. [sent-123, score-0.595]

62 This proposition separates the term related to $F_i^k$ from the term related to $\alpha f_i^k$ in Equation (11), and shows how the new weak ranker will affect the current ranking function $F(d, q)$. [sent-124, score-0.113]

63 Here, the weak ranker is the binary classification function $f(d, q)$. [sent-126, score-0.258]

64 Given the solution for the binary classifier $f(d, q)$, the optimal $\alpha$ that minimizes the objective function in Equation (11) is $\alpha = \frac{1}{2}\log\left(\frac{\sum_{k=1}^{n}\sum_{i,j=1}^{m_k}\frac{2^{r_i^k}-1}{Z_k}\,\theta_{i,j}^k\, I(f_j^k < f_i^k)}{\sum_{k=1}^{n}\sum_{i,j=1}^{m_k}\frac{2^{r_i^k}-1}{Z_k}\,\theta_{i,j}^k\, I(f_j^k > f_i^k)}\right)$ (15), where $\theta_{i,j}^k = \gamma_{i,j}^k I(j \neq i)$. [sent-129, score-0.943]
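A small sketch of Equation (15) restricted to a single query is given below (ours, not from the paper); the sum over queries is dropped for brevity, the theta matrix is built from randomly generated scores purely for illustration, and the normalizer Z is an arbitrary placeholder (it cancels within one query anyway).

```python
import numpy as np

def optimal_alpha(theta, rel, Z, f):
    """Eq. (15) for one query: theta is the (m, m) matrix theta_{i,j}, rel the graded
    relevance r_i, Z the normalizer Z_k, and f the {0,1} weak-ranker outputs."""
    gain = (2.0 ** np.asarray(rel, dtype=float) - 1.0) / Z      # (2^{r_i} - 1) / Z_k
    fi, fj = np.asarray(f)[:, None], np.asarray(f)[None, :]
    num = np.sum(gain[:, None] * theta * (fj < fi))             # pairs with f_j < f_i
    den = np.sum(gain[:, None] * theta * (fj > fi))             # pairs with f_j > f_i
    return 0.5 * np.log(num / den)

rng = np.random.default_rng(1)
m = 5
F = rng.normal(size=m)                                          # current ranking scores
diff = F[:, None] - F[None, :]
gamma = np.exp(diff) / (1.0 + np.exp(diff)) ** 2                # Eq. (14)
theta = gamma * (1.0 - np.eye(m))                               # theta_{i,j} = gamma_{i,j}, j != i
print(optimal_alpha(theta, rel=[2, 1, 0, 0, 1], Z=4.0, f=[1, 0, 1, 0, 0]))
```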

65 $\bar{\mathcal{M}}(Q, \tilde{F}) \leq \bar{\mathcal{M}}(Q, F) + \gamma(\alpha) + \frac{\exp(3\alpha) - 1}{3}\sum_{k=1}^{n}\sum_{i=1}^{m_k} f_i^k \sum_{j=1}^{m_k}\frac{2^{r_i^k} - 2^{r_j^k}}{Z_k}\theta_{i,j}^k$, where $\gamma(\alpha)$ is only a function of $\alpha$ with $\gamma(0) = 0$. [sent-131, score-0.877]

66 The importance of this theorem is that the optimal solution for $f_i^k$ can be found without knowing the solution for $\alpha$. [sent-134, score-0.519]

67 First, it computes $\theta_{i,j}^k$ for every pair of documents of query $k$. [sent-136, score-0.158]

68 Then, it computes $w_i^k$, a weight for each document that can be positive or negative. [sent-137, score-0.157]

69 A positive weight $w_i^k$ indicates that the rank position of $d_i^k$ induced by the current ranking function $F$ is less than its true rank position, while a negative weight $w_i^k$ shows that the rank position of $d_i^k$ induced by the current $F$ is greater than its true rank position. [sent-138, score-1.743]

70 Therefore, the sign of the weight $w_i^k$ provides clear guidance for constructing the next weak ranker (the binary classifier in our case); that is, the documents with a positive $w_i^k$ should be labeled +1 by the binary classifier and those with a negative $w_i^k$ should be labeled -1. [sent-139, score-0.366]

71 The magnitude of $w_i^k$ shows how much the corresponding document is misplaced in the ranking. [sent-140, score-0.137]

72 In other words, it shows the importance of correcting the rank position of document $d_i^k$ in terms of improving the value of NDCG. [sent-141, score-0.644]

73 We use a sampling strategy in order to maximize $\eta$ because most binary classifiers do not support weighted training sets; that is, we first sample the documents according to $|w_i^k|$ and then construct a binary classifier with the sampled documents. [sent-143, score-0.194]
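A minimal sketch of this resampling trick follows (ours, with hypothetical names): documents are drawn with probability proportional to |w_i| and labeled sign(w_i), so that any off-the-shelf unweighted binary classifier can then be trained on the resampled set.

```python
import numpy as np

def resample_by_weight(X, w, n_samples=None, rng=None):
    """Build an unweighted training set: draw documents with probability
    proportional to |w_i| and label each drawn document with sign(w_i)."""
    if rng is None:
        rng = np.random.default_rng()
    w = np.asarray(w, dtype=float)
    n = n_samples if n_samples is not None else len(w)
    p = np.abs(w) / np.abs(w).sum()
    idx = rng.choice(len(w), size=n, replace=True, p=p)
    return X[idx], np.sign(w[idx])

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 3))      # hypothetical query-document features
w = rng.normal(size=8)           # per-document weights as in Eq. (16)
X_boot, y_boot = resample_by_weight(X, w, rng=rng)
print(X_boot.shape, y_boot)
```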

74 Algorithm 1, NDCG Boost (A Boosting Algorithm for Maximizing NDCG): initialize $F(d_i^k) = 0$ for all documents; then, in each boosting round, compute $\theta_{i,j}^k = \gamma_{i,j}^k I(j \neq i)$ for all document pairs of each query. [sent-146, score-0.202]

75 Compute the weight for each document as $w_i^k = \sum_{j=1}^{m_k}\frac{2^{r_i^k} - 2^{r_j^k}}{Z_k}\theta_{i,j}^k$ (16), and assign each document the class label $y_i^k = \mathrm{sign}(w_i^k)$. [sent-149, score-0.421]

76 Train a classifier $f(x): \mathbb{R}^d \to \{0, 1\}$ that maximizes the quantity $\eta = \sum_{k=1}^{n}\sum_{i=1}^{m_k}|w_i^k|\, f(d_i^k)\, y_i^k$ (17), predict $f_i^k$ for all documents in $\{D^k\}$, and compute the combination weight $\alpha$ as in Equation (15). [sent-150, score-0.321]

77 Update the ranking function as $F_i^k \leftarrow F_i^k + \alpha f_i^k$. [sent-154, score-0.233]
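Putting the pieces together, below is a hedged, single-query sketch of how we read Algorithm 1; it is not the authors' code. The weak learner is a toy decision stump trained to maximize the quantity of Eq. (17) directly, the data are random placeholders, and a real implementation would sum Eq. (15) over all queries and handle the weighted training set as described in sentence 73.

```python
import numpy as np

def dcg_normalizer(rel):
    """Z_k: DCG of the ideal ordering, with the gain and discount of Eq. (1)."""
    ideal = np.sort(np.asarray(rel, dtype=float))[::-1]
    return np.sum((2.0 ** ideal - 1.0) / np.log(2.0 + np.arange(len(ideal))))

def train_stump(X, y, weight):
    """Toy weak learner f(x) in {0, 1}: pick the (feature, threshold, polarity)
    maximizing eta = sum_i weight_i * f(x_i) * y_i, i.e. Eq. (17)."""
    best, best_eta = None, -np.inf
    for d in range(X.shape[1]):
        for t in np.unique(X[:, d]):
            for pol in (1.0, -1.0):
                f = (pol * (X[:, d] - t) > 0).astype(float)
                eta = np.sum(weight * f * y)
                if eta > best_eta:
                    best, best_eta = (d, t, pol), eta
    d, t, pol = best
    return lambda Z: (pol * (Z[:, d] - t) > 0).astype(float)

def ndcg_boost_single_query(X, rel, n_rounds=10):
    rel = np.asarray(rel, dtype=float)
    m = len(rel)
    Z = dcg_normalizer(rel)
    F = np.zeros(m)                                              # initialize F(d_i) = 0
    for _ in range(n_rounds):
        diff = F[:, None] - F[None, :]
        gamma = np.exp(diff) / (1.0 + np.exp(diff)) ** 2         # Eq. (14)
        theta = gamma * (1.0 - np.eye(m))                        # theta_{i,j}, j != i
        gain_diff = (2.0 ** rel)[:, None] - (2.0 ** rel)[None, :]
        w = np.sum(gain_diff / Z * theta, axis=1)                # Eq. (16)
        y = np.sign(w)                                           # class labels
        weak = train_stump(X, y, weight=np.abs(w))               # maximize Eq. (17)
        f = weak(X)
        gain = (2.0 ** rel - 1.0) / Z
        num = np.sum(gain[:, None] * theta * (f[None, :] < f[:, None]))
        den = np.sum(gain[:, None] * theta * (f[None, :] > f[:, None]))
        if num <= 0.0 or den <= 0.0:
            break                                                # degenerate weak ranker
        alpha = 0.5 * np.log(num / den)                          # Eq. (15), one query
        F = F + alpha * f                                        # Eq. (12)
    return F

rng = np.random.default_rng(3)
X = rng.normal(size=(12, 4))            # hypothetical query-document features
rel = rng.integers(0, 3, size=12)       # graded relevance in {0, 1, 2}
print(ndcg_boost_single_query(X, rel))
```

On real data such as the LETOR feature vectors described below, the stump would be replaced by a stronger binary classifier and the loop run over all queries jointly.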

78 There are 106 queries in the OHSUMED data set, each with a number of associated documents. [sent-160, score-0.155]

79 The relevancy of each document in the OHSUMED data set is scored 0 (irrelevant), 1 (possibly relevant), or 2 (definitely relevant). [sent-161, score-0.157]

80 The total number of query-document relevancy judgments provided in the OHSUMED data set is 16,140, and there are 45 features for each query-document pair. [sent-162, score-0.094]

81 For TD2003, TD2004, HP2003, HP2004 and NP2003, there are 50, 75, 75, 75 and 150 queries, respectively, with about 1000 retrieved documents for each query. [sent-163, score-0.147]

82 For these data sets, there are 63 features extracted for each query-document pair and a binary relevancy judgment for each pair is provided. [sent-165, score-0.097]

83 The results of a number of state-of-the-art learning to rank algorithms are also provided in the LETOR package. [sent-167, score-0.146]

84 Since these baselines include some of the most well-known learning to rank algorithms from each category (pointwise, pairwise and listwise), we use them to study the performance of NDCG Boost. [sent-168, score-0.186]

85 Here is the list of these baselines (details can be found on the LETOR web page): Regression: a simple linear regression, which is a basic pointwise approach and can be considered a reference point. [sent-169, score-0.09]

86 FRank: it uses a probability model similar to RankNet [7] for the relative rank position of two documents, with a novel loss function called the fidelity loss [9]. [sent-172, score-0.242]

87 ListNet: ListNet is a listwise learning to rank algorithm [14]. [sent-174, score-0.28]

88 It uses cross-entropy loss as its listwise loss function. [sent-175, score-0.244]

89 AdaRank NDCG: This is a listwise boosting algorithm that incorporates NDCG in computing the samples and combination weights [20]. [sent-176, score-0.189]

90 Figure 1 provides the average results over five folds for different learning to rank algorithms in terms of NDCG at each of the first 10 truncation levels on the LETOR data sets. [sent-231, score-0.124]

91 Similarly, AdaRank NDCG achieves decent performance on the OHSUMED data set, but fails to deliver accurate ranking results on TD2003, HP2003 and NP2003. [sent-243, score-0.233]

92 We present a relaxation strategy to effectively approximate the expectation of NDCG, and a bound optimization strategy for efficient optimization. [sent-250, score-0.126]

93 Our experiments on benchmark data sets show that our method is superior to the state-of-the-art learning to rank algorithms in terms of performance and stability. [sent-251, score-0.149]

94 NDCG is commonly measured at the first few retrieved documents to emphasize their importance. [sent-252, score-0.147]

95 This leads to minimizing $\sum_{k=1}^{n}\sum_{i,j=1}^{m_k}\frac{2^{r_i^k}-1}{Z_k}\,\theta_{i,j}^k \exp\left(\alpha(f_j^k - f_i^k)\right)$, the term related to $\alpha$. [sent-259, score-0.519]

96 Proof of Theorem 2 (Appendix C): first, we provide the following proposition to handle $\exp\left(\alpha(f_j^k - f_i^k)\right)$. [sent-261, score-0.574]

97 Mcrank: Learning to rank using multiple classification and gradient boosting. [sent-274, score-0.124]

98 Learning to rank: From pairwise approach to listwise approach. [sent-318, score-0.187]

99 Learning to rank for information retrieval using genetic programming. [sent-339, score-0.202]

100 Letor: Benchmark dataset for research on learning to rank for information retrieval. [sent-349, score-0.124]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('fik', 0.519), ('ndcg', 0.449), ('dk', 0.296), ('ranking', 0.233), ('fjk', 0.214), ('mk', 0.179), ('listwise', 0.156), ('zk', 0.139), ('ir', 0.126), ('rank', 0.124), ('pr', 0.119), ('documents', 0.117), ('letor', 0.112), ('sigir', 0.111), ('fj', 0.106), ('exp', 0.105), ('ohsumed', 0.097), ('boost', 0.087), ('document', 0.085), ('adarank', 0.081), ('gk', 0.076), ('relevancy', 0.072), ('frank', 0.065), ('ranksvm', 0.063), ('hang', 0.058), ('proposition', 0.055), ('listnet', 0.054), ('permutation', 0.054), ('retrieval', 0.053), ('wi', 0.052), ('tao', 0.047), ('optimizing', 0.046), ('metrics', 0.045), ('loss', 0.044), ('query', 0.041), ('ak', 0.04), ('pointwise', 0.04), ('qin', 0.039), ('equation', 0.038), ('queries', 0.038), ('ranker', 0.035), ('mcrank', 0.035), ('boosting', 0.033), ('evaluation', 0.032), ('yahoo', 0.032), ('pairwise', 0.031), ('baselines', 0.031), ('retrieved', 0.03), ('expectation', 0.03), ('position', 0.03), ('development', 0.03), ('ji', 0.029), ('rong', 0.029), ('homepage', 0.029), ('jue', 0.029), ('lambdarank', 0.029), ('smk', 0.029), ('valizadegan', 0.029), ('volkovs', 0.029), ('annual', 0.028), ('tsai', 0.028), ('strategy', 0.027), ('culty', 0.027), ('burges', 0.027), ('tie', 0.027), ('jun', 0.026), ('yan', 0.026), ('liu', 0.026), ('hamed', 0.025), ('dcg', 0.025), ('fi', 0.025), ('benchmark', 0.025), ('binary', 0.025), ('genetic', 0.025), ('package', 0.024), ('classi', 0.024), ('optimize', 0.023), ('lemma', 0.023), ('distillation', 0.023), ('weak', 0.023), ('bound', 0.023), ('acm', 0.023), ('provided', 0.022), ('finding', 0.021), ('xu', 0.021), ('proof', 0.021), ('permutations', 0.021), ('log', 0.021), ('ming', 0.02), ('nonsmooth', 0.02), ('deploy', 0.02), ('weight', 0.02), ('objective', 0.02), ('feng', 0.019), ('optimization', 0.019), ('optimizes', 0.019), ('virtual', 0.019), ('map', 0.019), ('list', 0.019), ('getting', 0.018)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao

Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1

2 0.41007161 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li

Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1

3 0.30233437 87 nips-2009-Exponential Family Graph Matching and Ranking

Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano

Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1

4 0.23288937 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

Author: Alan S. Willsky, Erik B. Sudderth, Michael I. Jordan, Emily B. Fox

Abstract: We propose a Bayesian nonparametric approach to the problem of modeling related time series. Using a beta process prior, our approach is based on the discovery of a set of latent dynamical behaviors that are shared among multiple time series. The size of the set and the sharing pattern are both inferred from data. We develop an efficient Markov chain Monte Carlo inference method that is based on the Indian buffet process representation of the predictive distribution of the beta process. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth/death proposals. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.

5 0.21557038 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

6 0.18358038 190 nips-2009-Polynomial Semantic Indexing

7 0.070765726 54 nips-2009-Compositionality of optimal control laws

8 0.069351748 210 nips-2009-STDP enables spiking neurons to detect hidden causes of their inputs

9 0.062091786 129 nips-2009-Learning a Small Mixture of Trees

10 0.056674082 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

11 0.055477887 208 nips-2009-Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization

12 0.053638406 3 nips-2009-AUC optimization and the two-sample problem

13 0.050438564 204 nips-2009-Replicated Softmax: an Undirected Topic Model

14 0.049872879 71 nips-2009-Distribution-Calibrated Hierarchical Classification

15 0.047017224 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

16 0.045883153 78 nips-2009-Efficient Moments-based Permutation Tests

17 0.045105845 161 nips-2009-Nash Equilibria of Static Prediction Games

18 0.044165649 114 nips-2009-Indian Buffet Processes with Power-law Behavior

19 0.042081304 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

20 0.042062089 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.16), (1, 0.065), (2, -0.059), (3, -0.088), (4, 0.054), (5, -0.147), (6, -0.433), (7, -0.212), (8, 0.09), (9, -0.269), (10, 0.124), (11, 0.11), (12, -0.175), (13, -0.004), (14, -0.115), (15, -0.055), (16, -0.075), (17, -0.002), (18, -0.036), (19, 0.029), (20, -0.068), (21, -0.053), (22, -0.025), (23, -0.001), (24, 0.059), (25, -0.067), (26, -0.023), (27, 0.037), (28, -0.024), (29, 0.008), (30, -0.053), (31, -0.023), (32, -0.058), (33, 0.032), (34, -0.005), (35, -0.089), (36, -0.04), (37, 0.028), (38, 0.03), (39, 0.031), (40, -0.045), (41, -0.028), (42, -0.026), (43, -0.019), (44, 0.054), (45, -0.009), (46, 0.009), (47, 0.087), (48, 0.019), (49, -0.076)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96171373 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao

Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1

2 0.87942213 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li

Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1

3 0.81813884 230 nips-2009-Statistical Consistency of Top-k Ranking

Author: Fen Xia, Tie-yan Liu, Hang Li

Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1

4 0.68208009 87 nips-2009-Exponential Family Graph Matching and Ranking

Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano

Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1

5 0.56984031 190 nips-2009-Polynomial Semantic Indexing

Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri

Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or documentdocument pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1

6 0.4190613 206 nips-2009-Riffled Independence for Ranked Data

7 0.34574348 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

8 0.31609136 7 nips-2009-A Data-Driven Approach to Modeling Choice

9 0.28479323 114 nips-2009-Indian Buffet Processes with Power-law Behavior

10 0.28282219 233 nips-2009-Streaming Pointwise Mutual Information

11 0.27202857 3 nips-2009-AUC optimization and the two-sample problem

12 0.26022872 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

13 0.25232431 54 nips-2009-Compositionality of optimal control laws

14 0.25022087 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information

15 0.24080291 78 nips-2009-Efficient Moments-based Permutation Tests

16 0.23412336 161 nips-2009-Nash Equilibria of Static Prediction Games

17 0.22414716 71 nips-2009-Distribution-Calibrated Hierarchical Classification

18 0.21932112 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

19 0.20957027 189 nips-2009-Periodic Step Size Adaptation for Single Pass On-line Learning

20 0.20202982 72 nips-2009-Distribution Matching for Transduction


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.013), (12, 0.022), (24, 0.064), (25, 0.036), (35, 0.047), (36, 0.089), (39, 0.059), (44, 0.014), (58, 0.073), (61, 0.02), (71, 0.05), (77, 0.261), (81, 0.013), (86, 0.138)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82586402 106 nips-2009-Heavy-Tailed Symmetric Stochastic Neighbor Embedding

Author: Zhirong Yang, Irwin King, Zenglin Xu, Erkki Oja

Abstract: Stochastic Neighbor Embedding (SNE) has shown to be quite promising for data visualization. Currently, the most popular implementation, t-SNE, is restricted to a particular Student t-distribution as its embedding distribution. Moreover, it uses a gradient descent algorithm that may require users to tune parameters such as the learning step size, momentum, etc., in finding its optimum. In this paper, we propose the Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE) method, which is a generalization of the t-SNE to accommodate various heavytailed embedding similarity functions. With this generalization, we are presented with two difficulties. The first is how to select the best embedding similarity among all heavy-tailed functions and the second is how to optimize the objective function once the heavy-tailed function has been selected. Our contributions then are: (1) we point out that various heavy-tailed embedding similarities can be characterized by their negative score functions. Based on this finding, we present a parameterized subset of similarity functions for choosing the best tail-heaviness for HSSNE; (2) we present a fixed-point optimization algorithm that can be applied to all heavy-tailed functions and does not require the user to set any parameters; and (3) we present two empirical studies, one for unsupervised visualization showing that our optimization algorithm runs as fast and as good as the best known t-SNE implementation and the other for semi-supervised visualization showing quantitative superiority using the homogeneity measure as well as qualitative advantage in cluster separation over t-SNE.

same-paper 2 0.81144738 136 nips-2009-Learning to Rank by Optimizing NDCG Measure

Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao

Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1

3 0.79235888 237 nips-2009-Subject independent EEG-based BCI decoding

Author: Siamac Fazli, Cristian Grozea, Marton Danoczy, Benjamin Blankertz, Florin Popescu, Klaus-Robert Müller

Abstract: In the quest to make Brain Computer Interfacing (BCI) more usable, dry electrodes have emerged that get rid of the initial 30 minutes required for placing an electrode cap. Another time-consuming step is the required individualized adaptation to the BCI user, which involves another 30 minutes calibration for assessing a subject's brain signature. In this paper we aim to also remove this calibration procedure from BCI setup time by means of machine learning. In particular, we harvest a large database of EEG BCI motor imagination recordings (83 subjects) for constructing a library of subject-specific spatio-temporal filters and derive a subject independent BCI classifier. Our offline results indicate that BCI-naïve users could start real-time BCI use with no prior calibration at only a very moderate performance loss.

4 0.73000234 185 nips-2009-Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis

Author: Sundeep Rangan, Alyson K. Fletcher

Abstract: A well-known analysis of Tropp and Gilbert shows that orthogonal matching pursuit (OMP) can recover a k-sparse n-dimensional real vector from m = 4k log(n) noise-free linear measurements obtained through a random Gaussian measurement matrix with a probability that approaches one as n → ∞. This work strengthens this result by showing that a lower number of measurements, m = 2k log(n − k), is in fact sufficient for asymptotic recovery. More generally, when the sparsity level satisfies kmin ≤ k ≤ kmax but is unknown, m = 2kmax log(n − kmin ) measurements is sufficient. Furthermore, this number of measurements is also sufficient for detection of the sparsity pattern (support) of the vector with measurement errors provided the signal-to-noise ratio (SNR) scales to infinity. The scaling m = 2k log(n − k) exactly matches the number of measurements required by the more complex lasso method for signal recovery in a similar SNR scaling.

5 0.68040413 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank

Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li

Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1

6 0.63458854 87 nips-2009-Exponential Family Graph Matching and Ranking

7 0.62309933 230 nips-2009-Statistical Consistency of Top-k Ranking

8 0.60444731 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning

9 0.60338879 104 nips-2009-Group Sparse Coding

10 0.60140187 119 nips-2009-Kernel Methods for Deep Learning

11 0.59850818 3 nips-2009-AUC optimization and the two-sample problem

12 0.59806269 137 nips-2009-Learning transport operators for image manifolds

13 0.59451687 2 nips-2009-3D Object Recognition with Deep Belief Nets

14 0.5944826 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability

15 0.59202713 151 nips-2009-Measuring Invariances in Deep Networks

16 0.59067076 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels

17 0.58911955 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

18 0.58878821 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

19 0.58802694 112 nips-2009-Human Rademacher Complexity

20 0.58581269 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems