nips nips2008 nips2008-93 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies the global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over the ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, which is necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: This paper studies the global ranking problem by learning to rank methods. [sent-4, score-0.827]
2 Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. [sent-5, score-0.813]
3 Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). [sent-7, score-0.834]
4 This paper refers to the problem as global ranking and proposes employing Continuous Conditional Random Fields (CRF) for conducting the learning task. [sent-10, score-0.78]
5 The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. [sent-11, score-0.846]
6 It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. [sent-12, score-0.339]
7 Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines. [sent-13, score-0.842]
8 1 Introduction Learning to rank is aimed at constructing a model for ordering objects by means of machine learning. [sent-14, score-0.153]
9 Traditionally learning to rank is restricted to ‘local ranking’, in which the ranking model is defined on a single object. [sent-17, score-0.723]
10 In other words, the relations between the objects are not directly represented in the model. [sent-18, score-0.134]
11 For example, in Pseudo Relevance Feedback [17, 8], documents are ranked on the basis of not only their relevance to the query, but also the similarity between documents. [sent-20, score-0.862]
12 Therefore, the use of a model solely based on individual documents would not be sufficient. [sent-21, score-0.3]
13 Ideally, in information retrieval we would exploit a ranking model defined as a function on all the documents with respect to the query. [sent-24, score-0.986]
14 In other words, ranking should be conducted on the basis of the contents of objects as well as the relations between objects. [sent-25, score-0.804]
15 The Conditional Random Fields (CRF) technique is a powerful tool for relational learning, because it allows the use of both relations between objects and the contents of objects [16]. [sent-27, score-0.307]
16 However, conventional CRF cannot be directly applied to global ranking because it is a discrete model in the sense that the output variables are discrete [16]. [sent-28, score-0.799]
17 The C-CRF model is defined as a conditional probability distribution over ranking scores of objects (documents) conditioned on the objects (documents). [sent-30, score-0.921]
18 We apply C-CRF to two global ranking tasks: Pseudo Relevance Feedback and Topic Distillation. [sent-33, score-0.749]
19 2 Global Ranking Problem. Document ranking in information retrieval can be described as follows. [sent-35, score-0.703]
20 When the user submits a query, the system retrieves all the documents containing at least one query term, calculates a ranking score for each of the documents using the ranking model, and sorts the documents according to the ranking scores. [sent-36, score-2.965]
21 Let $x^{(q)} = \{x_1^{(q)}, x_2^{(q)}, \ldots, x_{n(q)}^{(q)}\}$ denote the documents retrieved with $q$, and [sent-42, score-0.307]
22 $y^{(q)} = \{y_1^{(q)}, y_2^{(q)}, \ldots, y_{n(q)}^{(q)}\}$ denote the ranking scores assigned to the documents. [sent-45, score-0.74]
23 Here n(q) stands for the number of documents retrieved with q. [sent-46, score-0.307]
24 We assume that $y^{(q)}$ is determined by a ranking model. [sent-48, score-0.645]
25 We call the ranking ‘local ranking’ if the ranking model is defined as $y_i^{(q)} = f(x_i^{(q)}),\; i = 1, \ldots, n(q)$ (1). [sent-49, score-1.428]
26 Furthermore, we call the ranking ‘global ranking’ if the ranking model is defined as $y^{(q)} = F(x^{(q)})$ (2). The major difference between the two is that $F$ takes all the documents together as its input, while $f$ takes an individual document as its input. [sent-52, score-1.663]
27 In other words, in global ranking, we use not only the content information of documents but also the relation information between documents. [sent-53, score-0.547]
28 There are many specific application tasks that can be viewed as examples of global ranking. [sent-54, score-0.139]
29 Given a set of documents $x^{(q)}$ for a query, we select the ranking score vector $y^{(q)}$ with the maximum conditional probability $\Pr(y^{(q)}|x^{(q)})$ as the output of our proposed global ranking model: $F(x^{(q)}) = \arg\max_{y^{(q)}} \Pr(y^{(q)}|x^{(q)})$. [sent-59, score-1.766]
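The defining equation of the C-CRF is not preserved by the extraction; based on the vertex and edge feature functions described below, it presumably takes the standard form

$$\Pr(y|x) = \frac{1}{Z(x)} \exp\Big( \sum_i \sum_{k=1}^{K_1} \alpha_k\, h_k(y_i, x) + \sum_{i,j} \sum_{k=1}^{K_2} \beta_k\, g_k(y_i, y_j, x) \Big),$$

where $Z(x)$ is the normalization constant, $\alpha$ and $\beta$ are the parameter vectors, and $h_k$ and $g_k$ are the vertex and edge feature functions introduced below.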
30 (In principle a ranking score can depend on all the documents of the query; here, for ease of presentation, we only consider the simple case in which it depends only on the corresponding document.) [sent-62, score-0.986]
31 In C-CRF, the feature function $h_k$ represents the dependency between the ranking score of a document and its content, and the feature function $g_k$ represents a relation between the ranking scores of two documents. [sent-63, score-1.91]
32 Different retrieval tasks may have different relations (e.g., a similarity relation or a parent-child relation). [sent-64, score-0.152]
33 For ease of reference, we call the feature functions $h_k$ vertex features, and the feature functions $g_k$ edge features. [sent-67, score-0.311]
34 Suppose we are given training data $\{(x^{(q)}, y^{(q)})\}_{q=1}^{N}$, where each $x^{(q)} = \{x_1^{(q)}, x_2^{(q)}, \ldots, x_{n(q)}^{(q)}\}$ is a set of documents of query $q$, and each [sent-75, score-0.406]
35 $y^{(q)} = \{y_1^{(q)}, y_2^{(q)}, \ldots, y_{n(q)}^{(q)}\}$ is a set of ranking scores associated with the documents of query $q$; we employ Maximum Likelihood Estimation to estimate the parameters $\{\alpha, \beta\}$ of C-CRF. [sent-78, score-1.169]
36 Specifically, we calculate the conditional log likelihood of the training data with respect to the C-CRF model, $L(\alpha, \beta) = \sum_{q=1}^{N} \log \Pr(y^{(q)}|x^{(q)}; \alpha, \beta)$ (6). [sent-79, score-0.135]
37 We then use Gradient Ascent to maximize the log likelihood, and use the optimal parameters $\hat{\alpha}, \hat{\beta}$ to rank the documents of a new query. [sent-80, score-0.413]
38 1 Pseudo Relevance Feedback (PRF). Pseudo Relevance Feedback (PRF) [17, 8] is an example of global ranking, in which the similarity between documents is considered in the ranking process. [sent-82, score-1.083]
39 The underlying assumption is that similar documents are likely to have similar ranking scores. [sent-84, score-0.928]
40 The relevance of a document to the query depends on many factors, such as term frequency, page importance, and so on. [sent-89, score-0.447]
41 For each factor we define a vertex feature function. [sent-90, score-0.122]
42 Suppose that $x_{i,k}$ is the $k$-th relevance factor of document $x_i$ with respect to query $q$, extracted by an operator $t_k$: $x_{i,k} = t_k(x_i, q)$. [sent-91, score-0.468]
43 We define the $k$-th vertex feature function $h_k(y_i, x)$ as $h_k(y_i, x) = -(y_i - x_{i,k})^2$. [sent-92, score-0.176]
44 Recall that there are two rounds in PRF: the first round scores each document, and the second round re-ranks the documents considering similarity between documents. [sent-94, score-0.545]
45 Here the similarities between any two documents are assumed to be given. [sent-95, score-0.308]
46 We define the edge feature function as $g(y_i, y_j, x) = -\frac{1}{2} S_{i,j}(y_i - y_j)^2$ (8), where $S_{i,j}$ is the similarity between documents $x_i$ and $x_j$, which can be extracted by some operator $s$ from the raw content of documents $x_i$ and $x_j$: $S_{i,j} = s(x_i, x_j)$. [sent-97, score-0.844]
47 The larger Si,j is, the more similar the two documents are. [sent-98, score-0.283]
48 Since only the similarity relation is considered in this task, we have only one edge feature function ($K_2 = 1$). [sent-99, score-0.208]
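Equation (9), referenced in the next two sentences, is also not preserved by the extraction; substituting the vertex and edge features above into the general C-CRF form presumably gives

$$\Pr(y|x) = \frac{1}{Z(x)} \exp\Big( \sum_i \sum_{k=1}^{K_1} -\alpha_k (y_i - x_{i,k})^2 - \sum_{i,j} \frac{\beta}{2} S_{i,j} (y_i - y_j)^2 \Big).$$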
49 The first term of Eq. (9) plays a role similar to the first round of PRF: the ranking score $y_i$ is determined solely by the relevance factors of document $x_i$. [sent-103, score-1.211]
50 The second term of Eq. (9) plays a role similar to the second round of PRF: it makes sure that similar documents have similar ranking scores. [sent-105, score-0.976]
51 We can see that C-CRF combines the two rounds of PRF ranking into one. [sent-106, score-0.665]
52 To rank the documents of a query, we calculate the ranking scores of documents with respect to this query in the following way. [sent-107, score-1.507]
53 $F(x) = \arg\max_y \Pr(y|x) = (\alpha^T e\, I + \beta D - \beta S)^{-1} X\alpha$ (11), where $e$ is a $K_1$-dimensional all-ones vector, $I$ is an $n \times n$ identity matrix, $S$ is a similarity matrix with $S_{i,j} = s(x_i, x_j)$, $D$ is an $n \times n$ diagonal matrix with $D_{i,i} = \sum_j S_{i,j}$, and $X$ is a factor matrix with $X_{i,k} = x_{i,k}$. [sent-109, score-0.165]
54 If we ignore the relation between documents and set β = 0, then the ranking model degenerates to F (x) = Xα, which is equivalent to a linear model used in conventional local ranking. [sent-110, score-1.144]
55 For $n$ documents, the time complexity of straightforwardly computing the ranking model (11) is of order $O(n^3)$, and thus the computation is expensive. [sent-111, score-0.645]
56 We can reduce this cost by considering only the similarity between each document and its $K$ nearest neighbors. [sent-115, score-0.141]
57 Finally, we solve the following system of linear equations and take the solution as the ranking scores. [sent-117, score-0.645]
58 $(\alpha^T e\, I + \beta D - \beta S)\, F(x) = X\alpha$ (12). Since $S$ is a banded matrix, the scores $F(x)$ in Eq. (12) can be computed efficiently. [sent-118, score-0.131]
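A minimal sketch of this inference step in Python, assuming dense numpy arrays; the function name, the toy data, and the parameter values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def crf_prf_scores(X, S, alpha, beta):
    """C-CRF ranking scores for Pseudo Relevance Feedback.

    Solves (alpha^T e I + beta D - beta S) y = X alpha, i.e. Eq. (12),
    instead of forming the matrix inverse in Eq. (11) explicitly.

    X     : (n, K1) relevance factors, X[i, k] = x_{i,k}
    S     : (n, n) symmetric document-similarity matrix, S[i, j] = s(x_i, x_j)
    alpha : (K1,) non-negative vertex-feature weights
    beta  : scalar edge-feature weight
    """
    n = X.shape[0]
    D = np.diag(S.sum(axis=1))                    # D_{i,i} = sum_j S_{i,j}
    A = alpha.sum() * np.eye(n) + beta * (D - S)  # alpha^T e I + beta (D - S)
    return np.linalg.solve(A, X @ alpha)          # ranking scores y

# Toy usage: 4 documents, 2 relevance factors, hypothetical similarities.
X = np.array([[0.9, 0.2], [0.8, 0.1], [0.2, 0.7], [0.1, 0.6]])
S = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.1, 0.0],
              [0.1, 0.1, 0.0, 0.9],
              [0.0, 0.0, 0.9, 0.0]])
scores = crf_prf_scores(X, S, alpha=np.array([1.0, 0.5]), beta=0.3)
ranking = np.argsort(-scores)  # documents sorted by descending score
```

With the K-nearest-neighbor sparsification described above, S becomes banded/sparse, and np.linalg.solve could be replaced by a banded or sparse solver (e.g., scipy.linalg.solve_banded or scipy.sparse.linalg.spsolve).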
59 That is to say, the time complexity of testing a new query is comparable with that of existing local ranking methods. [sent-120, score-0.787]
60 Note that Si,j is not computed from the ranking factors of documents xi and xj but from their raw terms. [sent-122, score-1.01]
61 $\alpha_k > 0$ means that the factor $x_{i,k}$ is positively correlated with the ranking score $y_i$. [sent-124, score-0.879]
62 Considering that some factors may be negatively correlated with $y_i$, we double each factor $x_{i,k}$ into two factors $x_{i,k}$ and $x'_{i,k} = -x_{i,k}$ in the experiments. [sent-125, score-0.239]
63 Then, if $\alpha'_k > \alpha_k$, one can conclude that the factor $x_{i,k}$ is negatively correlated with the ranking score $y_i$. [sent-126, score-0.902]
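A one-line illustration of this factor-doubling trick (array names are hypothetical):

```python
import numpy as np

X = np.random.rand(100, 5)      # original relevance factors x_{i,k}
X_doubled = np.hstack([X, -X])  # append x'_{i,k} = -x_{i,k}; with all weights kept
                                # non-negative, a larger weight on -x_{i,k} than on
                                # x_{i,k} expresses a negatively correlated factor
```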
64 In the inner loop of Algorithm 1, we compute the gradients $\nabla \log \alpha_k$ and $\nabla \log \beta$ using Eqs. (13) and (14) for a single query $(x^{(i)}, y^{(i)}, S^{(i)})$. [sent-128, score-0.123]
65 We then update $\log \alpha_k \leftarrow \log \alpha_k + \eta \times \nabla \log \alpha_k$ and $\log \beta \leftarrow \log \beta + \eta \times \nabla \log \beta$; after all passes, the output is the parameters of the C-CRF model, $\alpha_k$ and $\beta$. [sent-129, score-0.312]
66 Algorithm 1 shows the learning algorithm, which is based on Stochastic Gradient Ascent; the gradients $\nabla \log \alpha_k$ and $\nabla \log \beta$ can be computed in closed form (Eqs. (13) and (14)). [sent-139, score-0.14]
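Equations (13) and (14) are not reproduced in this extraction. The sketch below only mirrors the structure of Algorithm 1: as a stand-in for the paper's closed-form gradients, it differentiates the exact Gaussian log-likelihood of the PRF C-CRF numerically; all names, the learning rate, and the number of passes are illustrative assumptions.

```python
import numpy as np

def log_likelihood(y, X, S, alpha, beta):
    """Exact log Pr(y|x) for the PRF C-CRF, which is Gaussian with
    precision 2A and mean A^{-1} X alpha, where A = alpha^T e I + beta (D - S)."""
    n = X.shape[0]
    D = np.diag(S.sum(axis=1))
    A = alpha.sum() * np.eye(n) + beta * (D - S)
    mu = np.linalg.solve(A, X @ alpha)
    _, logdet = np.linalg.slogdet(2.0 * A)
    r = y - mu
    return 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi) - r @ A @ r

def train_ccrf(queries, K1, eta=0.01, epochs=20, eps=1e-4):
    """Stochastic gradient ascent in log-parameter space (structure of Algorithm 1).
    `queries` is a list of (X, y, S) triples, one per training query.
    Finite-difference gradients stand in for the paper's Eqs. (13)-(14)."""
    log_alpha = np.zeros(K1)  # alpha = exp(log_alpha) > 0
    log_beta = 0.0            # beta  = exp(log_beta)  > 0
    for _ in range(epochs):
        for X, y, S in queries:
            def ll(la, lb):
                return log_likelihood(y, X, S, np.exp(la), np.exp(lb))
            base = ll(log_alpha, log_beta)
            grad_a = np.zeros(K1)
            for k in range(K1):
                la = log_alpha.copy()
                la[k] += eps
                grad_a[k] = (ll(la, log_beta) - base) / eps
            grad_b = (ll(log_alpha, log_beta + eps) - base) / eps
            log_alpha += eta * grad_a  # log-space updates keep alpha, beta positive
            log_beta += eta * grad_b
    return np.exp(log_alpha), np.exp(log_beta)
```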
67 In this task (Topic Distillation), one selects the page that best represents the topic of the query from a web site, by using the structure (relation) information of the site. [sent-143, score-0.337]
68 If both a page and its parent page are concerned with the topic, then the parent page is preferred (to be ranked higher) [12, 11]. [sent-144, score-0.286]
69 1 Continuous CRF for Topic Distillation. We define the vertex feature function $h_k(y_i, x)$ in the same way as in the Pseudo Relevance Feedback case. [sent-148, score-0.173]
70 Recall that in Topic Distillation, a page is preferred over its child page if both of them are relevant to a query. [sent-150, score-0.134]
71 Here the parent-child relation between two pages is assumed to be given. [sent-151, score-0.136]
72 Specifically, we define the (and only) edge feature function as $g(y_i, y_j, x) = R_{i,j}(y_i - y_j)$ (15), where $R_{i,j} = r(x_i, x_j)$ denotes the parent-child relation: $r(x_i, x_j) = 1$ if document $x_i$ is the parent of $x_j$, and $r(x_i, x_j) = 0$ otherwise. [sent-153, score-0.619]
73 The C-CRF for Topic Distillation then becomes $\Pr(y|x) = \frac{1}{Z(x)} \exp\big( \sum_i \sum_{k=1}^{K_1} -\alpha_k (y_i - x_{i,k})^2 + \sum_{i,j} \beta R_{i,j}(y_i - y_j) \big)$. (Footnote: Stochastic Gradient here means conducting gradient ascent from one query to another.) [sent-154, score-0.393]
74 To guarantee that this exponential is integrable, we must have $\sum_k \alpha_k > 0$. The C-CRF can naturally model Topic Distillation: if the value of $R_{i,j}$ is one, then the value of $y_i$ is larger than that of $y_j$ with high probability. [sent-158, score-0.4]
75 To rank the documents of a query, we calculate the ranking scores in the following way. [sent-159, score-1.101]
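The explicit formula for this step is not preserved in the extraction. Setting the gradient of the exponent above to zero suggests the closed form $y_i = \big(\sum_k \alpha_k x_{i,k} + \frac{\beta}{2}(\sum_j R_{i,j} - \sum_j R_{j,i})\big) / \sum_k \alpha_k$: a page is boosted for each child it has and penalized for having a parent. A minimal sketch under that derivation (names are illustrative):

```python
import numpy as np

def crf_td_scores(X, R, alpha, beta):
    """C-CRF ranking scores for Topic Distillation (mode of the density above).

    X     : (n, K1) relevance factors
    R     : (n, n) parent-child matrix, R[i, j] = 1 iff x_i is the parent of x_j
    alpha : (K1,) non-negative weights; beta : scalar weight
    """
    boost = 0.5 * beta * (R.sum(axis=1) - R.sum(axis=0))  # +children, -parents
    return (X @ alpha + boost) / alpha.sum()
```

Note that with $\beta = 0$ this reduces to $X\alpha / (\alpha^T e)$, which ranks documents identically to the linear model mentioned in the next sentence.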
76 Similarly to Pseudo Relevance Feedback, if we ignore the relation between documents and set β = 0, the ranking model degenerates to a linear ranking model in conventional local ranking. [sent-161, score-1.789]
77 3 Continuous CRF for Multiple Relations. We considered only one type of relation in each of the previous two cases. [sent-171, score-0.111]
78 We can also conduct global ranking by utilizing multiple types of relation. [sent-172, score-0.749]
79 It can easily incorporate various types of relation as edge feature functions. [sent-174, score-0.191]
80 For example, we can combine the similarity relation and the parent-child relation by using the following C-CRF model: $\Pr(y|x) = \frac{1}{Z(x)} \exp\big( \sum_i \sum_{k=1}^{K_1} -\alpha_k (y_i - x_{i,k})^2 + \sum_{i,j} \beta_1 R_{i,j}(y_i - y_j) - \sum_{i,j} \frac{\beta_2}{2} S_{i,j}(y_i - y_j)^2 \big)$. [sent-175, score-0.558]
81 In this case, the ranking scores of the documents for a new query are calculated as follows. [sent-176, score-1.146]
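The combined closed form is likewise not preserved; setting the gradient of the exponent above to zero yields, as a sketch, a linear system that mixes the two previous cases (names and usage are illustrative):

```python
import numpy as np

def crf_combined_scores(X, S, R, alpha, beta1, beta2):
    """C-CRF scores when the similarity and parent-child relations are combined:
    solves (alpha^T e I + beta2 (D - S)) y = X alpha + (beta1 / 2) (R 1 - R^T 1)."""
    n = X.shape[0]
    D = np.diag(S.sum(axis=1))
    A = alpha.sum() * np.eye(n) + beta2 * (D - S)
    b = X @ alpha + 0.5 * beta1 * (R.sum(axis=1) - R.sum(axis=0))
    return np.linalg.solve(A, b)
```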
82 As baseline methods for the two tasks, we used several local ranking algorithms such as BM25, RankSVM [7] and ListNet [2]. [sent-216, score-0.685]
83 For Pseudo Relevance Feedback, we also compared with a traditional feedback method based on BM25 (BM25-PRF for short). [sent-219, score-0.142]
84 For Topic Distillation, we also compared with two traditional methods, sitemap based term propagation (ST) and sitemap based score propagation (SS) [11], which propagate the relevance along the sitemap structure. [sent-220, score-0.446]
85 These algorithms can be regarded as a kind of global ranking method, but they are not based on supervised learning. [sent-221, score-0.767]
86 The left part of Table 1 shows the ranking accuracies of BM25, BM25-PRF, RankSVM, ListNet, and C-CRF, in terms of NDCG averaged over five trials on OHSUMED data. [sent-223, score-0.645]
87 The results indicate that C-CRF based global ranking can indeed improve search relevance. [sent-226, score-0.749]
88 The result indicates that C-CRF based global ranking can achieve better results than local ranking for this task. [sent-233, score-1.413]
89 6 Related Work. Most existing work on using relation information in learning is for classification. [sent-236, score-0.111]
90 In contrast, C-CRF can easily handle multiple relations by incorporating them as different edge feature functions. [sent-244, score-0.139]
91 There is a hyperparameter β in RRSVM representing the trade-off between content and relation information. [sent-245, score-0.16]
92 Besides, C-CRF achieves better ranking accuracy than that reported for RRSVM [14] on the same benchmark dataset. [sent-252, score-0.672]
93 7 Conclusions. We studied learning to rank methods for the global ranking problem, in which we use both the content information of objects and the relation information between objects for ranking. [sent-253, score-1.137]
94 Experimental results on benchmark data show that C-CRF improves upon the baseline methods in the global ranking tasks. [sent-256, score-0.797]
95 (2) We have assumed that absolute ranking scores are given in the training data. [sent-260, score-0.74]
96 (3) We have studied two global ranking tasks: Pseudo Relevance Feedback and Topic Distillation. [sent-262, score-0.749]
97 Co-clustering documents and words using bipartite spectral graph partitioning. [sent-292, score-0.283]
98 A document-document similarity measure based on cited titles and probability theory, and its application to relevance feedback retrieval. [sent-318, score-0.339]
99 Global ranking of documents using continuous conditional random fields. [sent-371, score-1.007]
100 Learning to rank relational objects and its application to web search. [sent-384, score-0.227]
wordName wordTfidf (topN-words)
[('ranking', 0.645), ('documents', 0.283), ('distillation', 0.214), ('crf', 0.187), ('ranksvm', 0.179), ('relevance', 0.167), ('pseudo', 0.156), ('prf', 0.143), ('yi', 0.138), ('yj', 0.131), ('topic', 0.125), ('listnet', 0.125), ('query', 0.123), ('feedback', 0.121), ('relation', 0.111), ('rrsvm', 0.107), ('global', 0.104), ('scores', 0.095), ('qin', 0.094), ('document', 0.09), ('ndcg', 0.086), ('rank', 0.078), ('objects', 0.075), ('hk', 0.071), ('vertex', 0.068), ('page', 0.067), ('pr', 0.065), ('relations', 0.059), ('score', 0.058), ('gk', 0.058), ('retrieval', 0.058), ('letor', 0.057), ('liu', 0.057), ('sitemap', 0.054), ('subtopic', 0.054), ('log', 0.052), ('relational', 0.052), ('similarity', 0.051), ('dr', 0.051), ('conventional', 0.05), ('ascent', 0.049), ('content', 0.049), ('continuous', 0.048), ('round', 0.048), ('edge', 0.046), ('dc', 0.045), ('sigir', 0.045), ('ss', 0.045), ('ohsumed', 0.04), ('ranked', 0.039), ('bt', 0.038), ('fields', 0.036), ('banded', 0.036), ('degenerates', 0.036), ('gradient', 0.036), ('tasks', 0.035), ('feature', 0.034), ('xj', 0.034), ('ei', 0.031), ('conducting', 0.031), ('conducts', 0.031), ('please', 0.031), ('zhang', 0.031), ('conditional', 0.031), ('listwise', 0.029), ('wang', 0.028), ('xi', 0.028), ('benchmark', 0.027), ('tao', 0.027), ('xiong', 0.027), ('supposed', 0.025), ('contents', 0.025), ('st', 0.025), ('retrieved', 0.024), ('parent', 0.023), ('negatively', 0.023), ('integrable', 0.023), ('exp', 0.023), ('employ', 0.023), ('guarantee', 0.022), ('web', 0.022), ('kdd', 0.021), ('white', 0.021), ('td', 0.021), ('baseline', 0.021), ('traditional', 0.021), ('technique', 0.021), ('matrix', 0.02), ('tk', 0.02), ('rounds', 0.02), ('factor', 0.02), ('factors', 0.02), ('dependency', 0.02), ('propagation', 0.019), ('item', 0.019), ('local', 0.019), ('performances', 0.018), ('correlated', 0.018), ('kind', 0.018), ('solely', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
2 0.30670369 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
3 0.26559818 224 nips-2008-Structured ranking learning using cumulative distribution networks
Author: Jim C. Huang, Brendan J. Frey
Abstract: Ranking is at the heart of many information retrieval applications. Unlike standard regression or classification in which we predict outputs independently, in ranking we are interested in predicting structured outputs so that misranking one object can significantly affect whether we correctly rank the other objects. In practice, the problem of ranking involves a large number of objects to be ranked and either approximate structured prediction methods are required, or assumptions of independence between object scores must be made in order to make the problem tractable. We present a probabilistic method for learning to rank using the graphical modelling framework of cumulative distribution networks (CDNs), where we can take into account the structure inherent to the problem of ranking by modelling the joint cumulative distribution functions (CDFs) over multiple pairwise preferences. We apply our framework to the problem of document retrieval in the case of the OHSUMED benchmark dataset. We will show that the RankNet, ListNet and ListMLE probabilistic models can be viewed as particular instances of CDNs and that our proposed framework allows for the exploration of a broad class of flexible structured loss functionals for learning to rank. 1
4 0.16951099 72 nips-2008-Empirical performance maximization for linear rank statistics
Author: Stéphan J. Clémençcon, Nicolas Vayatis
Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process. 1
5 0.16844137 73 nips-2008-Estimating Robust Query Models with Convex Optimization
Author: Kevyn Collins-thompson
Abstract: Query expansion is a long-studied approach for improving retrieval effectiveness by enhancing the user’s original query with additional related words. Current algorithms for automatic query expansion can often improve retrieval accuracy on average, but are not robust: that is, they are highly unstable and have poor worst-case performance for individual queries. To address this problem, we introduce a novel formulation of query expansion as a convex optimization problem over a word graph. The model combines initial weights from a baseline feedback algorithm with edge weights based on word similarity, and integrates simple constraints to enforce set-based criteria such as aspect balance, aspect coverage, and term centrality. Results across multiple standard test collections show consistent and significant reductions in the number and magnitude of expansion failures, while retaining the strong positive gains of the baseline algorithm. Our approach does not assume a particular retrieval model, making it applicable to a broad class of existing expansion algorithms. 1
6 0.1645274 239 nips-2008-Tighter Bounds for Structured Estimation
7 0.13630131 184 nips-2008-Predictive Indexing for Fast Search
8 0.12684599 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
9 0.1057051 229 nips-2008-Syntactic Topic Models
10 0.098567344 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
11 0.094241194 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking
12 0.088609137 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
13 0.081104234 106 nips-2008-Inferring rankings under constrained sensing
14 0.073039256 113 nips-2008-Kernelized Sorting
15 0.071330331 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
16 0.071149834 194 nips-2008-Regularized Learning with Networks of Features
17 0.070362054 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
18 0.061709858 248 nips-2008-Using matrices to model symbolic relationship
19 0.058468133 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
20 0.056139443 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
topicId topicWeight
[(0, -0.183), (1, -0.101), (2, -0.045), (3, -0.07), (4, -0.076), (5, -0.077), (6, 0.356), (7, -0.007), (8, 0.008), (9, 0.098), (10, -0.176), (11, -0.072), (12, -0.1), (13, 0.165), (14, 0.146), (15, -0.026), (16, 0.087), (17, -0.031), (18, 0.207), (19, 0.184), (20, 0.032), (21, -0.072), (22, -0.062), (23, 0.005), (24, 0.116), (25, -0.032), (26, -0.163), (27, 0.083), (28, 0.147), (29, -0.115), (30, 0.068), (31, -0.092), (32, -0.034), (33, 0.104), (34, -0.028), (35, 0.007), (36, -0.13), (37, -0.068), (38, 0.095), (39, -0.095), (40, -0.035), (41, 0.055), (42, 0.008), (43, 0.01), (44, -0.06), (45, -0.026), (46, 0.045), (47, 0.022), (48, 0.037), (49, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.97333324 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
2 0.87718594 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
3 0.76667446 224 nips-2008-Structured ranking learning using cumulative distribution networks
Author: Jim C. Huang, Brendan J. Frey
Abstract: Ranking is at the heart of many information retrieval applications. Unlike standard regression or classification in which we predict outputs independently, in ranking we are interested in predicting structured outputs so that misranking one object can significantly affect whether we correctly rank the other objects. In practice, the problem of ranking involves a large number of objects to be ranked and either approximate structured prediction methods are required, or assumptions of independence between object scores must be made in order to make the problem tractable. We present a probabilistic method for learning to rank using the graphical modelling framework of cumulative distribution networks (CDNs), where we can take into account the structure inherent to the problem of ranking by modelling the joint cumulative distribution functions (CDFs) over multiple pairwise preferences. We apply our framework to the problem of document retrieval in the case of the OHSUMED benchmark dataset. We will show that the RankNet, ListNet and ListMLE probabilistic models can be viewed as particular instances of CDNs and that our proposed framework allows for the exploration of a broad class of flexible structured loss functionals for learning to rank. 1
4 0.46840349 72 nips-2008-Empirical performance maximization for linear rank statistics
Author: Stéphan J. Clémençcon, Nicolas Vayatis
Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process. 1
5 0.43745768 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
Author: Kilian Q. Weinberger, Olivier Chapelle
Abstract: Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond “flat” classification through incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal data base yield state-of-the-art results on topic categorization. 1
6 0.42081836 73 nips-2008-Estimating Robust Query Models with Convex Optimization
7 0.41889939 184 nips-2008-Predictive Indexing for Fast Search
8 0.40270814 239 nips-2008-Tighter Bounds for Structured Estimation
9 0.3435958 169 nips-2008-Online Models for Content Optimization
10 0.33060229 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
11 0.33051383 106 nips-2008-Inferring rankings under constrained sensing
12 0.31309083 229 nips-2008-Syntactic Topic Models
13 0.29116267 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
14 0.27819386 134 nips-2008-Mixed Membership Stochastic Blockmodels
15 0.26830876 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
16 0.25871399 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
17 0.24413201 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
18 0.23359159 78 nips-2008-Exact Convex Confidence-Weighted Learning
19 0.23283824 113 nips-2008-Kernelized Sorting
20 0.21957998 28 nips-2008-Asynchronous Distributed Learning of Topic Models
topicId topicWeight
[(4, 0.014), (6, 0.043), (7, 0.084), (12, 0.04), (15, 0.042), (28, 0.254), (35, 0.015), (57, 0.071), (59, 0.022), (63, 0.011), (77, 0.046), (78, 0.017), (83, 0.069), (99, 0.184)]
simIndex simValue paperId paperTitle
same-paper 1 0.89878237 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
2 0.89862627 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
3 0.8364296 231 nips-2008-Temporal Dynamics of Cognitive Control
Author: Jeremy Reynolds, Michael C. Mozer
Abstract: Cognitive control refers to the flexible deployment of memory and attention in response to task demands and current goals. Control is often studied experimentally by presenting sequences of stimuli, some demanding a response, and others modulating the stimulus-response mapping. In these tasks, participants must maintain information about the current stimulus-response mapping in working memory. Prominent theories of cognitive control use recurrent neural nets to implement working memory, and optimize memory utilization via reinforcement learning. We present a novel perspective on cognitive control in which working memory representations are intrinsically probabilistic, and control operations that maintain and update working memory are dynamically determined via probabilistic inference. We show that our model provides a parsimonious account of behavioral and neuroimaging data, and suggest that it offers an elegant conceptualization of control in which behavior can be cast as optimal, subject to limitations on learning and the rate of information processing. Moreover, our model provides insight into how task instructions can be directly translated into appropriate behavior and then efficiently refined with subsequent task experience. 1
4 0.83053857 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation
Author: Laurent Jacob, Jean-philippe Vert, Francis R. Bach
Abstract: In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning. We show in simulations on synthetic examples and on the IEDB MHC-I binding dataset, that our approach outperforms well-known convex methods for multi-task learning, as well as related non-convex methods dedicated to the same problem. 1
5 0.83039236 138 nips-2008-Modeling human function learning with Gaussian processes
Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish
Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1
6 0.82832134 211 nips-2008-Simple Local Models for Complex Dynamical Systems
7 0.82828122 4 nips-2008-A Scalable Hierarchical Distributed Language Model
8 0.82730043 224 nips-2008-Structured ranking learning using cumulative distribution networks
10 0.82445949 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
11 0.82426679 101 nips-2008-Human Active Learning
12 0.82401782 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
13 0.82398576 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
14 0.82215685 113 nips-2008-Kernelized Sorting
15 0.82170087 112 nips-2008-Kernel Measures of Independence for non-iid Data
16 0.81984228 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
17 0.819511 218 nips-2008-Spectral Clustering with Perturbed Data
18 0.81899077 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
19 0.8188014 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
20 0.81830299 195 nips-2008-Regularized Policy Iteration