nips nips2007 nips2007-126 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ping Li, Qiang Wu, Christopher J. Burges
Abstract: We cast the ranking problem as (1) multiple classification (“Mc”) and (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regression-based ranker [6], in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We cast the ranking problem as (1) multiple classification (“Mc”) and (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. [sent-7, score-1.048]
2 Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. [sent-9, score-0.284]
3 We propose using the Expected Relevance to convert class probabilities into ranking scores. [sent-10, score-0.38]
4 The class probabilities are learned using a gradient boosting tree algorithm. [sent-11, score-0.319]
5 Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regression-based ranker [6], in terms of the (normalized) DCG scores. [sent-12, score-0.163]
6 An efficient implementation of the boosting tree algorithm is also presented. [sent-13, score-0.29]
7 1 Introduction The general ranking problem has widespread applications including commercial search engines and recommender systems. [sent-14, score-0.388]
8 We develop McRank, a computationally tractable learning algorithm for the general ranking problem; and we present our approach in the context of ranking in Web search. [sent-15, score-0.566]
9 For a given user input query, a commercial search engine returns many pages of URLs, in an order determined by the underlying proprietary ranking algorithm. [sent-16, score-0.392]
10 The type of ranking problem in this study is sometimes referred to as dynamic ranking (or simply ranking), because the URLs are dynamically ranked (in real-time) according to the specific user input query. [sent-18, score-0.603]
11 This is different from the query-independent static ranking based on, for example, “page rank” [3] or “authorities and hubs” [12], which may, at least conceptually, serve as an important “feature” for dynamic ranking or to guide the generation of a list of URLs fed to the dynamic ranker. [sent-19, score-0.566]
12 [6] considered the DCG measure (discounted cumulative gain) [10] and showed that the DCG errors are bounded by regression errors. [sent-25, score-0.141]
13 From the definition of DCG, it appears more direct to cast the ranking problem as multiple classification (“Mc”) as opposed to regression. [sent-27, score-0.367]
14 In order to convert classification results into ranking scores, we propose a simple and stable mechanism by using the Expected Relevance. [sent-28, score-0.323]
15 Our evaluations on large-scale datasets demonstrate the superiority of the classification-based ranker (McRank) over both the regression-based and pair-based schemes. [sent-29, score-0.186]
16 2 Discounted Cumulative Gain (DCG) For an input query, the ranker returns n ordered URLs. [sent-30, score-0.121]
17 Suppose the URLs fed to the ranker are originally ordered {1, 2, 3, ..., n}. [sent-31, score-0.121]
18 The ranker will output a permutation mapping π : {1, 2, 3, ..., n} → {1, 2, 3, ..., n}. [sent-35, score-0.15]
19 The DCG score is computed from the relevance levels of the n URLs as
$\mathrm{DCG} = \sum_{i=1}^{n} c_{[i]} \left(2^{y_{\sigma_i}} - 1\right) = \sum_{i=1}^{n} c_{[\pi_i]} \left(2^{y_i} - 1\right)$.  (1)
(Much of the work was conducted while Ping Li was an intern at Microsoft in 2006.) [sent-43, score-0.192]
20 where [i] is the rank order, and y_i ∈ {0, 1, 2, 3, 4} is the relevance level of the ith URL in the original (pre-ranked) order. [sent-45, score-0.346]
21 y_i = 4 corresponds to a “perfect” relevance and y_i = 0 corresponds to a “poor” relevance. [sent-46, score-0.336]
22 It is a common practice to normalize the DCG score for each query and report the normalized DCG (“NDCG”) score averaged over all queries. [sent-51, score-0.142]
23 In other words, the NDCG for the jth query (NDCG_j) and the final NDCG of the dataset (NDCG_F) are
$\mathrm{NDCG}_j = \frac{\mathrm{DCG}_j}{\mathrm{DCG}_{j,g}}, \qquad \mathrm{NDCG}_F = \frac{1}{N_Q} \sum_{j=1}^{N_Q} \mathrm{NDCG}_j$,  (3)
where DCG_{j,g} is the maximum possible (or “gold standard”) DCG score of the jth query. [sent-52, score-0.21]
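To make the definitions concrete, here is a minimal sketch (ours, not from the paper) computing DCG (1) and NDCG (3) for a single query; the discount c[i] = 1/log2(1+i) is an assumed common choice, since the original equation does not pin it down here:

```python
import numpy as np

def dcg(relevances_in_ranked_order):
    """DCG = sum_i c[i] * (2^{y_i} - 1), where c[i] = 1/log2(1+i)
    is an assumed common discount for rank position i."""
    y = np.asarray(relevances_in_ranked_order, dtype=float)
    c = 1.0 / np.log2(np.arange(2, len(y) + 2))  # i = 1..n -> 1/log2(1+i)
    return float(np.sum(c * (2.0 ** y - 1.0)))

def ndcg(relevances_in_ranked_order):
    """NDCG_j = DCG_j / DCG_{j,g}; the gold-standard DCG ranks the
    URLs in descending order of their true relevance labels."""
    gold = dcg(np.sort(relevances_in_ranked_order)[::-1])
    return dcg(relevances_in_ranked_order) / gold if gold > 0 else 0.0

# e.g., a query whose ranked URLs carry labels 3, 4, 0, 2:
print(ndcg([3, 4, 0, 2]))
```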
24 3 Learning to Rank Using Classification The definition of DCG suggests that we can cast the ranking problem naturally as multiple classification (i.e., K = 5 classes), because perfect classifications obviously lead to perfect DCG scores. [sent-53, score-0.367] [sent-55, score-0.154]
26 We should mention that one does not really need perfect classifications in order to produce perfect DCG scores. [sent-57, score-0.154]
27 For example, if every URL is classified exactly one level lower than its true label (e.g., URLs labeled level 4 are classified as level 3, and so on), we still have the perfect DCG score but the classification “error” is 100%. [sent-61, score-0.207]
28 This phenomenon, to an extent, may provide some “safety cushion” for casting ranking as classification. [sent-62, score-0.283]
29 [6] cast ranking as regression and showed that the DCG errors are bounded by regression errors. [sent-63, score-0.514]
30 For example, it is well-known that, although one can use regression for classification, it is often better to use logistic regression especially for multiple classification [8]. [sent-65, score-0.21]
31 One simple way to obtain the perfect DCG_g is to rank the URLs directly according to the gold-standard relevance levels. [sent-69, score-0.245]
32 That is, all URLs with relevance level k + 1 are ranked higher than those with relevance level ≤ k; and the URLs with the same relevance levels are arbitrarily ranked without affecting DCG_g. [sent-70, score-0.616]
33 Suppose a classifier assigns a relevance level ŷ_i ∈ {0, 1, 2, 3, 4} to the ith URL, for all n URLs. [sent-76, score-0.304]
34 A permutation mapping π̂ ranks the URLs according to ŷ_i, i.e., π̂(i) < π̂(j) if ŷ_i > ŷ_j, and URL i and URL j are arbitrarily ranked if ŷ_i = ŷ_j. [sent-77, score-0.134] [sent-79, score-0.247]
36 The jth query corresponds to n_j URLs; each URL is manually labeled with one of the K = 5 relevance levels. [sent-88, score-0.233]
37 Both pair-based rankers and regression-based rankers implicitly made this assumption, as they tried to learn a single rank function for all queries using the same set of features. [sent-92, score-0.244]
38 Here x_i ∈ R^P is the ith feature vector in P dimensions; and y_i ∈ {0, 1, 2, 3, 4 = K − 1} is the class (relevance) label of the ith data point. [sent-97, score-0.202]
39 3 From Classification to Ranking Although perfect classifications lead to perfect DCG scores, in reality, we will need a mechanism to convert (imperfect) classification results into ranking scores. [sent-99, score-0.477]
40 This suggestion, however, will lead to highly unstable ranking results. [sent-102, score-0.283]
41 Recall we assume a training dataset {y_i, x_i}_{i=1}^N, where the class label y_i ∈ {0, 1, 2, 3, 4 = K − 1}. [sent-105, score-0.178]
42 We learn the class probabilities Pr(y_i = k), denoted by p̂_{i,k}, and define a scoring function
$S_i = \sum_{k=0}^{K-1} \hat{p}_{i,k}\, T(k)$,  (5)
where T(k) is some monotone (increasing) function of the relevance level k. [sent-106, score-0.264]
43 Once we have computed the scores S_i for all data points, we can then sort the data points within each query in descending order of S_i. [sent-107, score-0.207]
44 Consequently, the ranking results are not affected by any affine transformation aT(k) + b (a > 0) of T(k), because
$\sum_{k=0}^{K-1} \hat{p}_{i,k} \left(a\, T(k) + b\right) = a \sum_{k=0}^{K-1} \hat{p}_{i,k}\, T(k) + b$, since $\sum_{k=0}^{K-1} \hat{p}_{i,k} = 1$.  (7) [sent-115, score-0.305]
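As a sketch (ours, not the paper's code), the scoring function (5) and the within-query sort take only a few lines; T(k) = k gives the Expected Relevance, while T(k) = 2^k − 1 (the Expected Gain compared in Appendix II) is one alternative:

```python
import numpy as np

def ranking_scores(class_probs, T=lambda k: k):
    """S_i = sum_{k=0}^{K-1} p_hat[i,k] * T(k) (Eq. 5).
    The default T(k) = k is the Expected Relevance."""
    P = np.asarray(class_probs)               # shape (n_urls, K)
    t = np.array([T(k) for k in range(P.shape[1])], dtype=float)
    return P @ t

# Hypothetical probabilities for 3 URLs of one query (K = 5 classes):
P = np.array([[0.1, 0.1, 0.2, 0.4, 0.2],
              [0.5, 0.3, 0.1, 0.1, 0.0],
              [0.0, 0.2, 0.5, 0.2, 0.1]])
order = np.argsort(-ranking_scores(P))        # sort by descending S_i
# Affine invariance (Eq. 7): a*T(k)+b changes S_i but not the order.
assert (order == np.argsort(-ranking_scores(P, T=lambda k: 3 * k + 7))).all()
```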
45 Algorithm 1 implements a boosting tree algorithm for learning the class probabilities p̂_{i,k}; and we use basically the same implementation later for regression as well as multiple ordinal classification. [sent-120, score-0.723]
46 Algorithm 1 The boosting tree algorithm for multiple classification, taken from [9, Algorithm 6], although the presentation is slightly different. [sent-121, score-0.308]
47 M is the total number of boosting iterations, J is the tree size (number of terminal nodes), and ν is the shrinkage coefficient. [sent-124, score-0.327]
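Algorithm 1 itself is not reproduced here; as a stand-in sketch (an assumption on our part, not the authors' implementation), scikit-learn's gradient boosting trees expose the same three knobs M, J, and ν, with hypothetical synthetic data in place of the Web search features:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical stand-in data: 1000 points, P = 20 features, K = 5 labels.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 20)), rng.integers(0, 5, size=1000)

# Stand-in for Algorithm 1 (scikit-learn, not the paper's code):
# n_estimators ~ M, max_leaf_nodes ~ J, learning_rate ~ nu (shrinkage).
clf = GradientBoostingClassifier(n_estimators=100, max_leaf_nodes=20,
                                 learning_rate=0.05)
clf.fit(X, y)
p_hat = clf.predict_proba(X)   # class probabilities p_hat[i, k] for Eq. (5)
```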
48 4 Multiple Ordinal Classification to Further Improve Ranking It may be possible to (slightly) further improve our classification-based ranking scheme by taking into account the natural order among the class labels (i.e., 0 < 1 < 2 < 3 < 4). [sent-130, score-0.34]
49 A common approach for multiple ordinal classification is to learn the cumulative probabilities Pr(y_i ≤ k) instead of the class probabilities Pr(y_i = k) = p̂_{i,k}. [sent-133, score-0.391]
50 Now we have a binary classification problem and hence we can use exactly the same boosting tree algorithm for multiple classification. [sent-138, score-0.308]
51 We then infer the class probabilities
$\hat{p}_{i,k} = \Pr(y_i = k) = \Pr(y_i \le k) - \Pr(y_i \le k - 1)$,  (8)
and again we use the Expected Relevance to compute the ranking scores and sort the URLs. [sent-141, score-0.465]
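A small sketch (ours) of the conversion in (8); because the K − 1 binary classifiers are trained separately, the estimated cumulative probabilities need not be monotone, so the clipping and renormalization below are our own assumed safeguard rather than something stated in the paper:

```python
import numpy as np

def class_probs_from_cumulative(cum):
    """cum has shape (n, K-1); column k estimates Pr(y_i <= k).
    Returns p_hat[i,k] = Pr(y_i <= k) - Pr(y_i <= k-1) (Eq. 8),
    using Pr(y_i <= K-1) = 1 and Pr(y_i <= -1) = 0."""
    n = cum.shape[0]
    full = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    p = np.diff(full, axis=1)
    p = np.clip(p, 0.0, None)                # guard against non-monotone inputs
    return p / p.sum(axis=1, keepdims=True)  # renormalize to sum to 1

# e.g., one URL with Pr(y<=0)=0.1, Pr(y<=1)=0.3, Pr(y<=2)=0.6, Pr(y<=3)=0.8:
print(class_probs_from_cumulative(np.array([[0.1, 0.3, 0.6, 0.8]])))
# -> [[0.1 0.2 0.3 0.2 0.2]]
```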
52 We refer to both rankers, the one based on multiple classification and the one based on multiple ordinal classification, as McRank. [sent-142, score-0.391]
53 5 Regression-based Ranking Using Boosting Tree Algorithm With slight modifications, the boosting tree algorithm can be used for regressions. [sent-143, score-0.262]
54 In fact, we also implemented the LAD (least absolute deviation) boosting tree algorithm, but we found the performance was considerably worse than the least-squares tree boost. [sent-148, score-0.378]
55 Algorithm 2 The boosting tree algorithm for regressions. [sent-149, score-0.262]
56 After we have learned the values of S_i, we use them directly as the ranking scores to order the data points within each query. [sent-150, score-0.384]
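A matching sketch of the regression-based baseline (again scikit-learn as an assumed stand-in, not the paper's code); the explicit squared-error loss corresponds to the least-squares tree boost used here, while loss="absolute_error" would give the LAD variant in recent scikit-learn versions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical stand-in data, as in the classification sketch above.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 20)), rng.integers(0, 5, size=1000)

# Least-squares tree boost; predictions serve directly as ranking scores.
reg = GradientBoostingRegressor(loss="squared_error", n_estimators=100,
                                max_leaf_nodes=20, learning_rate=0.05)
reg.fit(X, y.astype(float))
S = reg.predict(X)   # sort descending within each query to rank URLs
```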
57 1 The Datasets The artificial dataset [5] was meant to remove any variance caused by the quality of features and/or relevance labels. [sent-161, score-0.222]
58 The Web search dataset Web-1 [5] has 367 features and 10,000/5,000/10,000 queries for train/validation/test, with 652,500 URLs in total. [sent-163, score-0.179]
59 2 The Parameters: M , J, ν There are three main parameters in the boosting tree algorithm. [sent-170, score-0.262]
60 The number of terminal nodes, J, should be reasonably large (but not too large) when the dataset is large with a large number of features, because the tree has to be deep enough to consider higher-order interactions [9]. [sent-179, score-0.205]
61 With these values of J and ν, we did not observe obvious over-fitting even for a very large number of boosting iterations M . [sent-181, score-0.146]
62 3 The Test NDCG Results at Truncation Level L = 10 Table 1 lists the NDCG results (both the mean and standard deviation, in percentages (%)) for all 4 datasets and all 4 ranking algorithms, evaluated at the truncation level L = 10. [sent-184, score-0.56]
63 The NDCG scores indicate that McRank (ordinal classification and classification) considerably improves the regression-based ranker and LambdaRank. [sent-185, score-0.222]
64 If we conduct a one-sided t-test, the improvements are statistically significant. Table 1: The test NDCG scores produced by 4 rankers on 4 datasets. [sent-186, score-0.174]
65 The average NDCG scores are presented in percentages (%) with the standard deviations in the parentheses. [sent-187, score-0.124]
66 For the artificial data, Web-1, and Web-3, we use the ordinal classification results to compute the p-values. [sent-190, score-0.226]
67 However, for Web-2, because our implementation for testing ordinal classification required too much memory for M = 2000, we did not obtain the final test NDCG scores; the partial results indicated that ordinal classification did not improve classification for this dataset. [sent-191, score-0.48]
68 However, multiple ordinal classification did not show significant improvement over multiple classification, except for the artificial dataset. [sent-230, score-0.318]
69 This is probably due to the fact that the artificial data are generated noise-free, and hence the flexible, high-capacity rankers using boosting tree algorithms tend to fit the data very well. [sent-232, score-0.335]
70 4 The NDCG Results at Various Truncation Levels (L = 1 to 10) For the artificial dataset and Web-1, [5] also reported the NDCG scores at various truncation levels, L = 1 to 10. [sent-234, score-0.334]
71 Figure 1 verifies that the improvements shown in Table 1 hold not only for L = 10 but also (essentially) for smaller truncation levels. [sent-237, score-0.162]
72 7 Conclusion The ranking problem has become an important topic in machine learning, partly due to its widespread applications in many decision-making processes especially in commercial search engines. [sent-241, score-0.361]
73 In one aspect, the ranking problem is difficult because the measures of rank quality are usually based on sorting, which is not directly optimizable (at least not efficiently). [sent-242, score-0.349]
74 On the other hand, one can cast ranking as various classical learning tasks such as regression and classification. [sent-243, score-0.403]
75 The proposed classification-based ranking scheme is motivated by the fact that perfect classifications lead to perfect DCG scores and the DCG errors are bounded by the classification errors. [sent-244, score-0.599]
76 It appears natural that the classification-based ranker is more direct and should work better than the regression-based ranker suggested in [6]. [sent-245, score-0.242]
77 To learn the class probabilities, we implement a boosting tree algorithm for multiple classification and we use the same implementation for multiple ordinal classification and regression. [sent-247, score-0.633]
78 Since commercial proprietary datasets are usually very large, we implement the boosting tree algorithm efficiently using an adaptive quantization-based approach, which avoids sorting and has a lower memory cost. [sent-248, score-0.439]
79 Our experimental results have demonstrated that McRank (including multiple classification and multiple ordinal classification) outperforms both the regression-based ranker and the pair-based LambdaRank. [sent-249, score-0.439]
80 However, except for the artificial dataset, we did not observe significant improvement of ordinal classification over classification. [sent-250, score-0.226]
81 In summary, we regard the McRank algorithm as (retrospectively) simple, robust, and capable of producing quality ranking results. [sent-251, score-0.307]
82 Appendix I An Efficient Implementation for Building Boosting Trees We use the standard regression tree algorithm [2], which recursively splits the training data points into two groups on the current “best” feature that will reduce the mean squared error (MSE) the most. [sent-252, score-0.253]
83 We suggest a simpler and more efficient approach, by taking advantage of some properties of the boosting tree algorithm. [sent-255, score-0.262]
84 While the boosting tree algorithm is well-known to be robust and also accurate, an individual tree has limited predictive power and usually can be built quite crudely. [sent-256, score-0.378]
85 In Figure 2(b), we bin (quantize) the data points into two (0/1) levels on the horizontal (i.e., x) axis. [sent-260, score-0.152]
86 Panel (b) suggests that the reduced MSE will not be affected either if the data on the x axis are binned to be binary in the way shown in (b). [sent-266, score-0.161]
87 Of course, we would not know ahead of time how to bin the data to avoid losing accuracy. [sent-268, score-0.116]
88 In the pre-processing stage, for each feature, the training data points are sorted according to the feature value; and we bin the feature values in the sorted order. [sent-270, score-0.218]
89 We start with a very small initial bin length. [sent-271, score-0.116]
90 As shown in Figure 2(c), we only bin the data where there are indeed data, because the boosting tree algorithm will not consider the area where there are no data anyway. [sent-274, score-0.378]
91 If the bin length is so small that we need more than B bins, we simply increase the bin length and redo the quantization. [sent-276, score-0.232]
92 After the quantization, we replace the original feature values by the bin labels (0, 1, 2, ...). [sent-277, score-0.142]
93 Note that since we start with a small bin length, the ordinal categorical features are naturally taken care of. [sent-281, score-0.366]
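Below is a sketch of how such an adaptive binning step might look (our reconstruction; the initial bin length, the doubling schedule for increasing it, and the consecutive relabeling are assumed details not fully specified above):

```python
import numpy as np

def adaptive_bin(values, B=256):
    """Quantize one feature into at most B bins, binning only where
    data exist, then replace values by bin labels 0, 1, 2, ....
    The doubling schedule for the bin length is an assumption."""
    v = np.asarray(values, dtype=float)
    lo, span = v.min(), float(v.max() - v.min())
    width = span / 1e6 if span > 0 else 1.0   # very small initial bin length
    while True:
        idx = np.floor((v - lo) / width)
        occupied = np.unique(idx)             # only bins that contain data
        if len(occupied) <= B:
            break
        width *= 2.0                          # bin length too small: enlarge, redo
    # Map occupied bins to consecutive labels 0, 1, 2, ...
    _, labels = np.unique(idx, return_inverse=True)
    return labels

labels = adaptive_bin(np.random.default_rng(0).normal(size=10_000), B=2**8)
```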
94 This simple binning scheme is very effective, particularly for the boosting tree algorithm: • It simplifies the implementation. [sent-282, score-0.294]
95 Appendix II Some More Experiments on Web-1 Figures 3(a) and 3(b) present the experiment with our adaptive quantization scheme on the Web-1 dataset. [sent-290, score-0.12]
96 We binned the data with the maximum bin number B = 2^3, 2^4, 2^5, 2^6, 2^7, 2^8, and 2^16. [sent-291, score-0.139]
97 Panel (a) plots the relative number of total bins in Web-1 as a function of the exponent, normalized by the total number of bins at B = 2^16. [sent-293, score-0.144]
98 When B = 2^8, the total number of bins is only about 6% of that at B = 2^16; however, both quantization levels achieved the same test NDCG scores. [sent-295, score-0.196]
99 Figure 3(c) compares two scoring functions for converting learned class probabilities into ranking scores: the Expected Relevance $S_i = \sum_{k=0}^{K-1} \hat{p}_{i,k}\, k$ and the Expected Gain $S_i = \sum_{k=0}^{K-1} \hat{p}_{i,k} \left(2^k - 1\right)$. [sent-300, score-0.128]
100 A regression framework for learning ranking functions using relative relevance judgments. [sent-383, score-0.491]
wordName wordTfidf (topN-words)
[('ndcg', 0.442), ('dcg', 0.365), ('urls', 0.323), ('ranking', 0.283), ('lambdarank', 0.277), ('ordinal', 0.226), ('truncation', 0.162), ('boosting', 0.146), ('classi', 0.145), ('relevance', 0.126), ('ranker', 0.121), ('bin', 0.116), ('tree', 0.116), ('mcrank', 0.108), ('yi', 0.105), ('scores', 0.101), ('dcgg', 0.092), ('quantization', 0.088), ('regression', 0.082), ('query', 0.082), ('ranknet', 0.08), ('url', 0.08), ('classification', 0.078), ('perfect', 0.077), ('rankers', 0.073), ('gi', 0.07), ('nq', 0.067), ('si', 0.067), ('cation', 0.063), ('web', 0.06), ('mse', 0.057), ('pr', 0.056), ('queries', 0.056), ('commercial', 0.051), ('level', 0.05), ('dataset', 0.048), ('bins', 0.048), ('ndcgj', 0.046), ('multiple', 0.046), ('std', 0.043), ('datasets', 0.042), ('rank', 0.042), ('frank', 0.041), ('terminal', 0.041), ('convert', 0.04), ('cations', 0.038), ('cast', 0.038), ('arti', 0.037), ('microsoft', 0.037), ('ranked', 0.037), ('exponent', 0.036), ('cial', 0.036), ('levels', 0.036), ('panel', 0.036), ('burges', 0.034), ('probabilities', 0.032), ('scheme', 0.032), ('scoring', 0.031), ('lad', 0.031), ('ndcgf', 0.031), ('proprietary', 0.031), ('sl', 0.031), ('sorting', 0.031), ('appendix', 0.03), ('score', 0.03), ('cumulative', 0.03), ('surrogate', 0.029), ('errors', 0.029), ('permutation', 0.029), ('affecting', 0.028), ('gain', 0.028), ('loss', 0.028), ('implementation', 0.028), ('search', 0.027), ('engines', 0.027), ('sigir', 0.027), ('ranksvm', 0.027), ('feature', 0.026), ('class', 0.025), ('split', 0.025), ('jth', 0.025), ('sorted', 0.025), ('splitting', 0.025), ('ping', 0.024), ('convincing', 0.024), ('corporation', 0.024), ('sr', 0.024), ('total', 0.024), ('features', 0.024), ('sort', 0.024), ('quality', 0.024), ('reported', 0.023), ('evaluations', 0.023), ('rankboost', 0.023), ('binned', 0.023), ('percentages', 0.023), ('ith', 0.023), ('discounted', 0.022), ('affected', 0.022), ('implements', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
Author: Ping Li, Qiang Wu, Christopher J. Burges
Abstract: We cast the ranking problem as (1) multiple classification (“Mc”) and (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regression-based ranker [6], in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.
2 0.31244475 83 nips-2007-Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks
Author: Ben Carterette, Rosie Jones
Abstract: We propose a model that leverages the millions of clicks received by web search engines to predict document relevance. This allows the comparison of ranking functions when clicks are available but complete relevance judgments are not. After an initial training phase using a set of relevance judgments paired with click data, we show that our model can predict the relevance score of documents that have not been judged. These predictions can be used to evaluate the performance of a search engine, using our novel formalization of the confidence of the standard evaluation metric discounted cumulative gain (DCG), so comparisons can be made across time and datasets. This contrasts with previous methods which can provide only pair-wise relevance judgments between results shown for the same query. When no relevance judgments are available, we can identify the better of two ranked lists up to 82% of the time, and with only two relevance judgments for each query, we can identify the better ranking up to 94% of the time. While our experiments are on sponsored search results, which is the financial backbone of web search, our method is general enough to be applicable to algorithmic web search results as well. Furthermore, we give an algorithm to guide the selection of additional documents to judge to improve confidence. 1
3 0.26395085 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola
Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1
4 0.23016717 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search
Author: Zhaohui Zheng, Hongyuan Zha, Tong Zhang, Olivier Chapelle, Keke Chen, Gordon Sun
Abstract: We present a general boosting method extending functional gradient boosting to optimize complex loss functions that are encountered in many machine learning problems. Our approach is based on optimization of quadratic upper bounds of the loss functions which allows us to present a rigorous convergence analysis of the algorithm. More importantly, this general framework enables us to use a standard regression base learner such as single regression tree for fitting any loss function. We illustrate an application of the proposed method in learning ranking functions for Web search by combining both preference data and labeled data for training. We present experimental results for Web search using data from a commercial search engine that show significant improvements of our proposed methods over some existing methods. 1
5 0.17670812 39 nips-2007-Boosting the Area under the ROC Curve
Author: Phil Long, Rocco Servedio
Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
6 0.11619104 134 nips-2007-Multi-Task Learning via Conic Programming
7 0.10475378 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
8 0.10181615 166 nips-2007-Regularized Boost for Semi-Supervised Learning
9 0.083449259 116 nips-2007-Learning the structure of manifolds using random projections
10 0.075973287 147 nips-2007-One-Pass Boosting
11 0.071576677 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
12 0.07009659 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index
13 0.064268082 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
14 0.064073212 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
15 0.062105056 32 nips-2007-Bayesian Co-Training
16 0.059228744 160 nips-2007-Random Features for Large-Scale Kernel Machines
17 0.057125341 97 nips-2007-Hidden Common Cause Relations in Relational Learning
18 0.055314951 186 nips-2007-Statistical Analysis of Semi-Supervised Regression
19 0.055144332 16 nips-2007-A learning framework for nearest neighbor search
20 0.052388046 69 nips-2007-Discriminative Batch Mode Active Learning
topicId topicWeight
[(0, -0.175), (1, 0.05), (2, -0.131), (3, 0.097), (4, 0.073), (5, 0.122), (6, 0.228), (7, -0.17), (8, 0.21), (9, -0.133), (10, -0.102), (11, 0.072), (12, 0.44), (13, 0.14), (14, 0.044), (15, -0.011), (16, -0.072), (17, 0.019), (18, -0.059), (19, 0.079), (20, -0.052), (21, -0.034), (22, -0.006), (23, 0.023), (24, -0.01), (25, -0.046), (26, 0.032), (27, 0.043), (28, 0.031), (29, -0.0), (30, -0.038), (31, 0.053), (32, 0.071), (33, -0.009), (34, 0.033), (35, -0.028), (36, -0.056), (37, -0.054), (38, -0.028), (39, -0.041), (40, 0.074), (41, 0.091), (42, -0.047), (43, -0.111), (44, 0.099), (45, 0.006), (46, -0.015), (47, 0.042), (48, 0.022), (49, -0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.93569446 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
Author: Ping Li, Qiang Wu, Christopher J. Burges
Abstract: We cast the ranking problem as (1) multiple classification (“Mc”) and (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regression-based ranker [6], in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.
2 0.88285553 83 nips-2007-Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks
Author: Ben Carterette, Rosie Jones
Abstract: We propose a model that leverages the millions of clicks received by web search engines to predict document relevance. This allows the comparison of ranking functions when clicks are available but complete relevance judgments are not. After an initial training phase using a set of relevance judgments paired with click data, we show that our model can predict the relevance score of documents that have not been judged. These predictions can be used to evaluate the performance of a search engine, using our novel formalization of the confidence of the standard evaluation metric discounted cumulative gain (DCG), so comparisons can be made across time and datasets. This contrasts with previous methods which can provide only pair-wise relevance judgments between results shown for the same query. When no relevance judgments are available, we can identify the better of two ranked lists up to 82% of the time, and with only two relevance judgments for each query, we can identify the better ranking up to 94% of the time. While our experiments are on sponsored search results, which is the financial backbone of web search, our method is general enough to be applicable to algorithmic web search results as well. Furthermore, we give an algorithm to guide the selection of additional documents to judge to improve confidence. 1
3 0.79529279 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search
Author: Zhaohui Zheng, Hongyuan Zha, Tong Zhang, Olivier Chapelle, Keke Chen, Gordon Sun
Abstract: We present a general boosting method extending functional gradient boosting to optimize complex loss functions that are encountered in many machine learning problems. Our approach is based on optimization of quadratic upper bounds of the loss functions which allows us to present a rigorous convergence analysis of the algorithm. More importantly, this general framework enables us to use a standard regression base learner such as single regression tree for fitting any loss function. We illustrate an application of the proposed method in learning ranking functions for Web search by combining both preference data and labeled data for training. We present experimental results for Web search using data from a commercial search engine that show significant improvements of our proposed methods over some existing methods. 1
4 0.6730895 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
Author: Markus Weimer, Alexandros Karatzoglou, Quoc V. Le, Alex J. Smola
Abstract: In this paper, we consider collaborative filtering as a ranking problem. We present a method which uses Maximum Margin Matrix Factorization and optimizes ranking instead of rating. We employ structured output prediction to optimize directly for ranking scores. Experimental results show that our method gives very good ranking scores and scales well on collaborative filtering tasks. 1
5 0.51572108 39 nips-2007-Boosting the Area under the ROC Curve
Author: Phil Long, Rocco Servedio
Abstract: We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
6 0.42624572 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
7 0.40841809 144 nips-2007-On Ranking in Survival Analysis: Bounds on the Concordance Index
8 0.33388177 27 nips-2007-Anytime Induction of Cost-sensitive Trees
9 0.32113761 19 nips-2007-Active Preference Learning with Discrete Choice Data
10 0.30196628 166 nips-2007-Regularized Boost for Semi-Supervised Learning
11 0.29655445 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
12 0.27950224 134 nips-2007-Multi-Task Learning via Conic Programming
13 0.27140981 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition
14 0.25787032 147 nips-2007-One-Pass Boosting
15 0.25722834 16 nips-2007-A learning framework for nearest neighbor search
16 0.25634938 116 nips-2007-Learning the structure of manifolds using random projections
17 0.25058237 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
18 0.24402887 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
19 0.24187149 175 nips-2007-Semi-Supervised Multitask Learning
20 0.23045579 32 nips-2007-Bayesian Co-Training
topicId topicWeight
[(5, 0.048), (13, 0.024), (16, 0.022), (19, 0.016), (21, 0.081), (34, 0.444), (35, 0.034), (47, 0.082), (83, 0.088), (85, 0.018), (90, 0.029)]
simIndex simValue paperId paperTitle
1 0.83511204 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
Author: Omer Bobrowski, Ron Meir, Shy Shoham, Yonina Eldar
Abstract: It is becoming increasingly evident that organisms acting in uncertain dynamical environments often employ exact or approximate Bayesian statistical calculations in order to continuously estimate the environmental state, integrate information from multiple sensory modalities, form predictions and choose actions. What is less clear is how these putative computations are implemented by cortical neural networks. An additional level of complexity is introduced because these networks observe the world through spike trains received from primary sensory afferents, rather than directly. A recent line of research has described mechanisms by which such computations can be implemented using a network of neurons whose activity directly represents a probability distribution across the possible “world states”. Much of this work, however, uses various approximations, which severely restrict the domain of applicability of these implementations. Here we make use of rigorous mathematical results from the theory of continuous time point process filtering, and show how optimal real-time state estimation and prediction may be implemented in a general setting using linear neural networks. We demonstrate the applicability of the approach with several examples, and relate the required network properties to the statistical nature of the environment, thereby quantifying the compatibility of a given network with its environment. 1
same-paper 2 0.80645007 126 nips-2007-McRank: Learning to Rank Using Multiple Classification and Gradient Boosting
Author: Ping Li, Qiang Wu, Christopher J. Burges
Abstract: We cast the ranking problem as (1) multiple classification (“Mc”) and (2) multiple ordinal classification, which lead to computationally tractable learning algorithms for relevance ranking in Web search. We consider the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval. Our approach is motivated by the fact that perfect classifications result in perfect DCG scores and the DCG errors are bounded by classification errors. We propose using the Expected Relevance to convert class probabilities into ranking scores. The class probabilities are learned using a gradient boosting tree algorithm. Evaluations on large-scale datasets show that our approach can improve LambdaRank [5] and the regression-based ranker [6], in terms of the (normalized) DCG scores. An efficient implementation of the boosting tree algorithm is also presented.
3 0.78811491 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
Author: Kyomin Jung, Devavrat Shah
Abstract: We present a new local approximation algorithm for computing MAP and logpartition function for arbitrary exponential family distribution represented by a finite-valued pair-wise Markov random field (MRF), say G. Our algorithm is based on decomposing G into appropriately chosen small components; computing estimates locally in each of these components and then producing a good global solution. We prove that the algorithm can provide approximate solution within arbitrary accuracy when G excludes some finite sized graph as its minor and G has bounded degree: all Planar graphs with bounded degree are examples of such graphs. The running time of the algorithm is Θ(n) (n is the number of nodes in G), with constant dependent on accuracy, degree of graph and size of the graph that is excluded as a minor (constant for Planar graphs). Our algorithm for minor-excluded graphs uses the decomposition scheme of Klein, Plotkin and Rao (1993). In general, our algorithm works with any decomposition scheme and provides quantifiable approximation guarantee that depends on the decomposition scheme.
4 0.75809705 44 nips-2007-Catching Up Faster in Bayesian Model Selection and Model Averaging
Author: Tim V. Erven, Steven D. Rooij, Peter Grünwald
Abstract: Bayesian model averaging, model selection and their approximations such as BIC are generally statistically consistent, but sometimes achieve slower rates of convergence than other methods such as AIC and leave-one-out cross-validation. On the other hand, these other methods can be inconsistent. We identify the catch-up phenomenon as a novel explanation for the slow convergence of Bayesian methods. Based on this analysis we define the switch-distribution, a modification of the Bayesian model averaging distribution. We prove that in many situations model selection and prediction based on the switch-distribution is both consistent and achieves optimal convergence rates, thereby resolving the AIC-BIC dilemma. The method is practical; we give an efficient algorithm. 1
5 0.48923174 83 nips-2007-Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks
Author: Ben Carterette, Rosie Jones
Abstract: We propose a model that leverages the millions of clicks received by web search engines to predict document relevance. This allows the comparison of ranking functions when clicks are available but complete relevance judgments are not. After an initial training phase using a set of relevance judgments paired with click data, we show that our model can predict the relevance score of documents that have not been judged. These predictions can be used to evaluate the performance of a search engine, using our novel formalization of the confidence of the standard evaluation metric discounted cumulative gain (DCG), so comparisons can be made across time and datasets. This contrasts with previous methods which can provide only pair-wise relevance judgments between results shown for the same query. When no relevance judgments are available, we can identify the better of two ranked lists up to 82% of the time, and with only two relevance judgments for each query, we can identify the better ranking up to 94% of the time. While our experiments are on sponsored search results, which is the financial backbone of web search, our method is general enough to be applicable to algorithmic web search results as well. Furthermore, we give an algorithm to guide the selection of additional documents to judge to improve confidence. 1
6 0.47599518 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
7 0.45970514 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models
8 0.44585979 41 nips-2007-COFI RANK - Maximum Margin Matrix Factorization for Collaborative Ranking
9 0.43891585 171 nips-2007-Scan Strategies for Meteorological Radars
10 0.43664074 6 nips-2007-A General Boosting Method and its Application to Learning Ranking Functions for Web Search
11 0.43390104 128 nips-2007-Message Passing for Max-weight Independent Set
12 0.43179205 146 nips-2007-On higher-order perceptron algorithms
13 0.42801595 141 nips-2007-New Outer Bounds on the Marginal Polytope
14 0.4212096 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers
15 0.41447589 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
17 0.4132354 39 nips-2007-Boosting the Area under the ROC Curve
18 0.41247544 187 nips-2007-Structured Learning with Approximate Inference
19 0.40987167 208 nips-2007-TrueSkill Through Time: Revisiting the History of Chess
20 0.40232012 97 nips-2007-Hidden Common Cause Relations in Relational Learning