nips nips2008 nips2008-93 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies the global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over the ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, which is necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: This paper studies the global ranking problem by learning to rank methods. [sent-4, score-0.827]
2 Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. [sent-5, score-0.813]
3 Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). [sent-7, score-0.834]
4 This paper refers to the problem as global ranking and proposes employing Continuous Conditional Random Fields (CRF) for conducting the learning task. [sent-10, score-0.78]
5 The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. [sent-11, score-0.846]
6 It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. [sent-12, score-0.339]
7 Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines. [sent-13, score-0.842]
8 1 Introduction Learning to rank is aimed at constructing a model for ordering objects by means of machine learning. [sent-14, score-0.153]
9 Traditionally learning to rank is restricted to ‘local ranking’, in which the ranking model is defined on a single object. [sent-17, score-0.723]
10 In other words, the relations between the objects are not directly represented in the model. [sent-18, score-0.134]
11 For example, in Pseudo Relevance Feedback [17, 8], documents are ranked on the basis of not only their relevance to the query, but also the similarity between documents. [sent-20, score-0.862]
12 Therefore, the use of a model solely based on individual documents would not be sufficient. [sent-21, score-0.3]
13 Ideally, in information retrieval we would exploit a ranking model defined as a function on all the documents with respect to the query. [sent-24, score-0.986]
14 In other words, ranking should be conducted on the basis of the contents of objects as well as the relations between objects. [sent-25, score-0.804]
15 The Conditional Random Fields (CRF) technique is a powerful tool for relational learning, because it allows the use of both relations between objects and the contents of objects [16]. [sent-27, score-0.307]
16 However, conventional CRF cannot be directly applied to global ranking because it is a discrete model in the sense that the output variables are discrete [16]. [sent-28, score-0.799]
17 The C-CRF model is defined as a conditional probability distribution over ranking scores of objects (documents) conditioned on the objects (documents). [sent-30, score-0.921]
18 We apply C-CRF to two global ranking tasks: Pseudo Relevance Feedback and Topic Distillation. [sent-33, score-0.749]
19 2 Global Ranking Problem. Document ranking in information retrieval can be described as follows. [sent-35, score-0.703]
20 When the user submits a query, the system retrieves all the documents containing at least one query term, calculates a ranking score for each of the documents using the ranking model, and sorts the documents according to the ranking scores. [sent-36, score-2.965]
21 Let $x^{(q)} = \{x_1^{(q)}, x_2^{(q)}, \ldots, x_{n(q)}^{(q)}\}$ denote the documents retrieved with $q$, and [sent-42, score-0.307]
22 $y^{(q)} = \{y_1^{(q)}, y_2^{(q)}, \ldots, y_{n(q)}^{(q)}\}$ denote the ranking scores assigned to the documents. [sent-45, score-0.74]
23 Here n(q) stands for the number of documents retrieved with q. [sent-46, score-0.307]
24 We assume that $y^{(q)}$ is determined by a ranking model. [sent-48, score-0.645]
25 We call the ranking ‘local ranking’ if the ranking model is defined as $y_i^{(q)} = f(x_i^{(q)}),\; i = 1, \ldots, n(q)$ (1). [sent-49, score-1.428]
26 Furthermore, we call the ranking ‘global ranking’ if the ranking model is defined as $y^{(q)} = F(x^{(q)})$ (2). The major difference between the two is that $F$ takes all the documents together as its input, while $f$ takes an individual document as its input. [sent-52, score-1.663]
27 In other words, in global ranking, we use not only the content information of documents but also the relation information between documents. [sent-53, score-0.547]
28 There are many specific application tasks that can be viewed as examples of global ranking. [sent-54, score-0.139]
29 Given a set of documents $x^{(q)}$ for a query, we select the ranking score vector $y^{(q)}$ with the maximum conditional probability $\Pr(y^{(q)}|x^{(q)})$ as the output of our proposed global ranking model: $F(x^{(q)}) = \arg\max_{y^{(q)}} \Pr(y^{(q)}|x^{(q)})$. [sent-59, score-1.766]
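The defining equation of the C-CRF is not preserved by the extraction; based on the vertex and edge feature functions described below, it presumably takes the standard form

$$\Pr(y|x) = \frac{1}{Z(x)} \exp\Big( \sum_i \sum_{k=1}^{K_1} \alpha_k\, h_k(y_i, x) + \sum_{i,j} \sum_{k=1}^{K_2} \beta_k\, g_k(y_i, y_j, x) \Big),$$

where $Z(x)$ is the normalization constant, $\alpha$ and $\beta$ are the parameter vectors, and $h_k$ and $g_k$ are the vertex and edge feature functions introduced below.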
30 (In principle a ranking score can depend on all the documents of the query; here, for ease of presentation, we only consider the simple case in which it depends only on the corresponding document.) [sent-62, score-0.986]
31 In C-CRF, the feature function $h_k$ represents the dependency between the ranking score of a document and its content, and the feature function $g_k$ represents a relation between the ranking scores of two documents. [sent-63, score-1.91]
32 Different retrieval tasks may have different relations (e.g., a similarity relation or a parent-child relation). [sent-64, score-0.152]
33 For ease of reference, we call the feature functions $h_k$ vertex features, and the feature functions $g_k$ edge features. [sent-67, score-0.311]
34 Suppose we are given training data $\{(x^{(q)}, y^{(q)})\}_{q=1}^{N}$, where each $x^{(q)} = \{x_1^{(q)}, x_2^{(q)}, \ldots, x_{n(q)}^{(q)}\}$ is a set of documents of query $q$, and each [sent-75, score-0.406]
35 $y^{(q)} = \{y_1^{(q)}, y_2^{(q)}, \ldots, y_{n(q)}^{(q)}\}$ is a set of ranking scores associated with the documents of query $q$; we employ Maximum Likelihood Estimation to estimate the parameters $\{\alpha, \beta\}$ of C-CRF. [sent-78, score-1.169]
36 Specifically, we calculate the conditional log likelihood of the training data with respect to the C-CRF model, $L(\alpha, \beta) = \sum_{q=1}^{N} \log \Pr(y^{(q)}|x^{(q)}; \alpha, \beta)$ (6). [sent-79, score-0.135]
37 We then use Gradient Ascent to maximize the log likelihood, and use the optimal parameters $\hat{\alpha}, \hat{\beta}$ to rank the documents of a new query. [sent-80, score-0.413]
38 1 Pseudo Relevance Feedback (PRF). Pseudo Relevance Feedback (PRF) [17, 8] is an example of global ranking, in which the similarity between documents is considered in the ranking process. [sent-82, score-1.083]
39 The underlying assumption is that similar documents are likely to have similar ranking scores. [sent-84, score-0.928]
40 The relevance of a document to the query depends on many factors, such as term frequency, page importance, and so on. [sent-89, score-0.447]
41 For each factor we define a vertex feature function. [sent-90, score-0.122]
42 Suppose that $x_{i,k}$ is the $k$-th relevance factor of document $x_i$ with respect to query $q$, extracted by an operator $t_k$: $x_{i,k} = t_k(x_i, q)$. [sent-91, score-0.468]
43 We define the $k$-th vertex feature function $h_k(y_i, x)$ as $h_k(y_i, x) = -(y_i - x_{i,k})^2$. [sent-92, score-0.176]
44 Recall that there are two rounds in PRF: the first round scores each document, and the second round re-ranks the documents considering similarity between documents. [sent-94, score-0.545]
45 Here the similarities between any two documents are assumed to be given. [sent-95, score-0.308]
46 We define the edge feature function as $g(y_i, y_j, x) = -\frac{1}{2} S_{i,j}(y_i - y_j)^2$ (8), where $S_{i,j}$ is the similarity between documents $x_i$ and $x_j$, which can be extracted by some operator $s$ from the raw content of documents $x_i$ and $x_j$: $S_{i,j} = s(x_i, x_j)$. [sent-97, score-0.844]
47 The larger Si,j is, the more similar the two documents are. [sent-98, score-0.283]
48 Since only the similarity relation is considered in this task, we have only one edge feature function ($K_2 = 1$). [sent-99, score-0.208]
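Equation (9), referenced in the next two sentences, is also not preserved by the extraction; substituting the vertex and edge features above into the general C-CRF form presumably gives

$$\Pr(y|x) = \frac{1}{Z(x)} \exp\Big( \sum_i \sum_{k=1}^{K_1} -\alpha_k (y_i - x_{i,k})^2 - \sum_{i,j} \frac{\beta}{2} S_{i,j} (y_i - y_j)^2 \Big).$$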
49 The first term of Eq. (9) plays a role similar to the first round of PRF: the ranking score $y_i$ is determined solely by the relevance factors of document $x_i$. [sent-103, score-1.211]
50 The second term of Eq. (9) plays a role similar to the second round of PRF: it makes sure that similar documents have similar ranking scores. [sent-105, score-0.976]
51 We can see that C-CRF combines the two rounds of PRF ranking into one. [sent-106, score-0.665]
52 To rank the documents of a query, we calculate the ranking scores of documents with respect to this query in the following way. [sent-107, score-1.507]
53 $F(x) = \arg\max_y \Pr(y|x) = (\alpha^T e\, I + \beta D - \beta S)^{-1} X\alpha$ (11), where $e$ is a $K_1$-dimensional all-ones vector, $I$ is an $n \times n$ identity matrix, $S$ is a similarity matrix with $S_{i,j} = s(x_i, x_j)$, $D$ is an $n \times n$ diagonal matrix with $D_{i,i} = \sum_j S_{i,j}$, and $X$ is a factor matrix with $X_{i,k} = x_{i,k}$. [sent-109, score-0.165]
54 If we ignore the relation between documents and set β = 0, then the ranking model degenerates to F (x) = Xα, which is equivalent to a linear model used in conventional local ranking. [sent-110, score-1.144]
55 For $n$ documents, the time complexity of straightforwardly computing the ranking model (11) is of order $O(n^3)$, and thus the computation is expensive. [sent-111, score-0.645]
56 We can reduce this cost by considering only the similarity between each document and its $K$ nearest neighbors. [sent-115, score-0.141]
57 Finally, we solve the following system of linear equations and take the solution as the ranking scores. [sent-117, score-0.645]
58 $(\alpha^T e\, I + \beta D - \beta S)\, F(x) = X\alpha$ (12). Since $S$ is a banded matrix, the scores $F(x)$ in Eq. (12) can be computed efficiently. [sent-118, score-0.131]
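A minimal sketch of this inference step in Python, assuming dense numpy arrays; the function name, the toy data, and the parameter values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def crf_prf_scores(X, S, alpha, beta):
    """C-CRF ranking scores for Pseudo Relevance Feedback.

    Solves (alpha^T e I + beta D - beta S) y = X alpha, i.e. Eq. (12),
    instead of forming the matrix inverse in Eq. (11) explicitly.

    X     : (n, K1) relevance factors, X[i, k] = x_{i,k}
    S     : (n, n) symmetric document-similarity matrix, S[i, j] = s(x_i, x_j)
    alpha : (K1,) non-negative vertex-feature weights
    beta  : scalar edge-feature weight
    """
    n = X.shape[0]
    D = np.diag(S.sum(axis=1))                    # D_{i,i} = sum_j S_{i,j}
    A = alpha.sum() * np.eye(n) + beta * (D - S)  # alpha^T e I + beta (D - S)
    return np.linalg.solve(A, X @ alpha)          # ranking scores y

# Toy usage: 4 documents, 2 relevance factors, hypothetical similarities.
X = np.array([[0.9, 0.2], [0.8, 0.1], [0.2, 0.7], [0.1, 0.6]])
S = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.1, 0.0],
              [0.1, 0.1, 0.0, 0.9],
              [0.0, 0.0, 0.9, 0.0]])
scores = crf_prf_scores(X, S, alpha=np.array([1.0, 0.5]), beta=0.3)
ranking = np.argsort(-scores)  # documents sorted by descending score
```

With the K-nearest-neighbor sparsification described above, S becomes banded/sparse, and np.linalg.solve could be replaced by a banded or sparse solver (e.g., scipy.linalg.solve_banded or scipy.sparse.linalg.spsolve).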
59 That is to say, the time complexity of testing a new query is comparable with that of existing local ranking methods. [sent-120, score-0.787]
60 Note that Si,j is not computed from the ranking factors of documents xi and xj but from their raw terms. [sent-122, score-1.01]
61 $\alpha_k > 0$ means that the factor $x_{i,k}$ is positively correlated with the ranking score $y_i$. [sent-124, score-0.879]
62 Considering that some factors may be negatively correlated with $y_i$, we double each factor $x_{i,k}$ into two factors $x_{i,k}$ and $x'_{i,k} = -x_{i,k}$ in the experiments. [sent-125, score-0.239]
63 Then, if $\alpha'_k > \alpha_k$, one can conclude that the factor $x_{i,k}$ is negatively correlated with the ranking score $y_i$. [sent-126, score-0.902]
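A one-line illustration of this factor-doubling trick (array names are hypothetical):

```python
import numpy as np

X = np.random.rand(100, 5)      # original relevance factors x_{i,k}
X_doubled = np.hstack([X, -X])  # append x'_{i,k} = -x_{i,k}; with all weights kept
                                # non-negative, a larger weight on -x_{i,k} than on
                                # x_{i,k} expresses a negatively correlated factor
```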
64 In the inner loop of Algorithm 1, we compute the gradients $\nabla \log \alpha_k$ and $\nabla \log \beta$ using Eqs. (13) and (14) for a single query $(x^{(i)}, y^{(i)}, S^{(i)})$. [sent-128, score-0.123]
65 We then update $\log \alpha_k \leftarrow \log \alpha_k + \eta \times \nabla \log \alpha_k$ and $\log \beta \leftarrow \log \beta + \eta \times \nabla \log \beta$; after all passes, the output is the parameters of the C-CRF model, $\alpha_k$ and $\beta$. [sent-129, score-0.312]
66 Algorithm 1 shows the learning algorithm, which is based on Stochastic Gradient Ascent; the gradients $\nabla \log \alpha_k$ and $\nabla \log \beta$ can be computed in closed form (Eqs. (13) and (14)). [sent-139, score-0.14]
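Equations (13) and (14) are not reproduced in this extraction. The sketch below only mirrors the structure of Algorithm 1: as a stand-in for the paper's closed-form gradients, it differentiates the exact Gaussian log-likelihood of the PRF C-CRF numerically; all names, the learning rate, and the number of passes are illustrative assumptions.

```python
import numpy as np

def log_likelihood(y, X, S, alpha, beta):
    """Exact log Pr(y|x) for the PRF C-CRF, which is Gaussian with
    precision 2A and mean A^{-1} X alpha, where A = alpha^T e I + beta (D - S)."""
    n = X.shape[0]
    D = np.diag(S.sum(axis=1))
    A = alpha.sum() * np.eye(n) + beta * (D - S)
    mu = np.linalg.solve(A, X @ alpha)
    _, logdet = np.linalg.slogdet(2.0 * A)
    r = y - mu
    return 0.5 * logdet - 0.5 * n * np.log(2.0 * np.pi) - r @ A @ r

def train_ccrf(queries, K1, eta=0.01, epochs=20, eps=1e-4):
    """Stochastic gradient ascent in log-parameter space (structure of Algorithm 1).
    `queries` is a list of (X, y, S) triples, one per training query.
    Finite-difference gradients stand in for the paper's Eqs. (13)-(14)."""
    log_alpha = np.zeros(K1)  # alpha = exp(log_alpha) > 0
    log_beta = 0.0            # beta  = exp(log_beta)  > 0
    for _ in range(epochs):
        for X, y, S in queries:
            def ll(la, lb):
                return log_likelihood(y, X, S, np.exp(la), np.exp(lb))
            base = ll(log_alpha, log_beta)
            grad_a = np.zeros(K1)
            for k in range(K1):
                la = log_alpha.copy()
                la[k] += eps
                grad_a[k] = (ll(la, log_beta) - base) / eps
            grad_b = (ll(log_alpha, log_beta + eps) - base) / eps
            log_alpha += eta * grad_a  # log-space updates keep alpha, beta positive
            log_beta += eta * grad_b
    return np.exp(log_alpha), np.exp(log_beta)
```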
67 In this task (Topic Distillation), one selects the page that best represents the topic of the query from a web site, by using the structure (relation) information of the site. [sent-143, score-0.337]
68 If both a page and its parent page are concerned with the topic, then the parent page is preferred (to be ranked higher) [12, 11]. [sent-144, score-0.286]
69 1 Continuous CRF for Topic Distillation. We define the vertex feature function $h_k(y_i, x)$ in the same way as in the Pseudo Relevance Feedback case. [sent-148, score-0.173]
70 Recall that in Topic Distillation, a page is preferred over its child page if both of them are relevant to a query. [sent-150, score-0.134]
71 Here the parent-child relation between two pages is assumed to be given. [sent-151, score-0.136]
72 Specifically, we define the (and only) edge feature function as $g(y_i, y_j, x) = R_{i,j}(y_i - y_j)$ (15), where $R_{i,j} = r(x_i, x_j)$ denotes the parent-child relation: $r(x_i, x_j) = 1$ if document $x_i$ is the parent of $x_j$, and $r(x_i, x_j) = 0$ otherwise. [sent-153, score-0.619]
73 The C-CRF for Topic Distillation then becomes $\Pr(y|x) = \frac{1}{Z(x)} \exp\big( \sum_i \sum_{k=1}^{K_1} -\alpha_k (y_i - x_{i,k})^2 + \sum_{i,j} \beta R_{i,j}(y_i - y_j) \big)$. (Footnote: Stochastic Gradient here means conducting gradient ascent from one query to another.) [sent-154, score-0.393]
74 To guarantee that this exponential is integrable, we must have $\sum_k \alpha_k > 0$. The C-CRF can naturally model Topic Distillation: if the value of $R_{i,j}$ is one, then the value of $y_i$ is larger than that of $y_j$ with high probability. [sent-158, score-0.4]
75 To rank the documents of a query, we calculate the ranking scores in the following way. [sent-159, score-1.101]
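The explicit formula for this step is not preserved in the extraction. Setting the gradient of the exponent above to zero suggests the closed form $y_i = \big(\sum_k \alpha_k x_{i,k} + \frac{\beta}{2}(\sum_j R_{i,j} - \sum_j R_{j,i})\big) / \sum_k \alpha_k$: a page is boosted for each child it has and penalized for having a parent. A minimal sketch under that derivation (names are illustrative):

```python
import numpy as np

def crf_td_scores(X, R, alpha, beta):
    """C-CRF ranking scores for Topic Distillation (mode of the density above).

    X     : (n, K1) relevance factors
    R     : (n, n) parent-child matrix, R[i, j] = 1 iff x_i is the parent of x_j
    alpha : (K1,) non-negative weights; beta : scalar weight
    """
    boost = 0.5 * beta * (R.sum(axis=1) - R.sum(axis=0))  # +children, -parents
    return (X @ alpha + boost) / alpha.sum()
```

Note that with $\beta = 0$ this reduces to $X\alpha / (\alpha^T e)$, which ranks documents identically to the linear model mentioned in the next sentence.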
76 Similarly to Pseudo Relevance Feedback, if we ignore the relation between documents and set β = 0, the ranking model degenerates to a linear ranking model in conventional local ranking. [sent-161, score-1.789]
77 3 Continuous CRF for Multiple Relations. We considered only one type of relation in each of the previous two cases. [sent-171, score-0.111]
78 We can also conduct global ranking by utilizing multiple types of relation. [sent-172, score-0.749]
79 It can easily incorporate various types of relation as edge feature functions. [sent-174, score-0.191]
80 For example, we can combine the similarity relation and the parent-child relation by using the following C-CRF model: $\Pr(y|x) = \frac{1}{Z(x)} \exp\big( \sum_i \sum_{k=1}^{K_1} -\alpha_k (y_i - x_{i,k})^2 + \sum_{i,j} \beta_1 R_{i,j}(y_i - y_j) - \sum_{i,j} \frac{\beta_2}{2} S_{i,j}(y_i - y_j)^2 \big)$. [sent-175, score-0.558]
81 In this case, the ranking scores of the documents for a new query are calculated as follows. [sent-176, score-1.146]
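The combined closed form is likewise not preserved; setting the gradient of the exponent above to zero yields, as a sketch, a linear system that mixes the two previous cases (names and usage are illustrative):

```python
import numpy as np

def crf_combined_scores(X, S, R, alpha, beta1, beta2):
    """C-CRF scores when the similarity and parent-child relations are combined:
    solves (alpha^T e I + beta2 (D - S)) y = X alpha + (beta1 / 2) (R 1 - R^T 1)."""
    n = X.shape[0]
    D = np.diag(S.sum(axis=1))
    A = alpha.sum() * np.eye(n) + beta2 * (D - S)
    b = X @ alpha + 0.5 * beta1 * (R.sum(axis=1) - R.sum(axis=0))
    return np.linalg.solve(A, b)
```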
82 As baseline methods for the two tasks, we used several local ranking algorithms such as BM25, RankSVM [7] and ListNet [2]. [sent-216, score-0.685]
83 For Pseudo Relevance Feedback, we also compared with a traditional feedback method based on BM25 (BM25-PRF for short). [sent-219, score-0.142]
84 For Topic Distillation, we also compared with two traditional methods, sitemap based term propagation (ST) and sitemap based score propagation (SS) [11], which propagate the relevance along the sitemap structure. [sent-220, score-0.446]
85 These algorithms can be regarded as a kind of global ranking method, but they are not based on supervised learning. [sent-221, score-0.767]
86 The left part of Table 1 shows the ranking accuracies of BM25, BM25-PRF, RankSVM, ListNet, and C-CRF, in terms of NDCG averaged over five trials on OHSUMED data. [sent-223, score-0.645]
87 The results indicate that C-CRF based global ranking can indeed improve search relevance. [sent-226, score-0.749]
88 The result indicates that C-CRF based global ranking can achieve better results than local ranking for this task. [sent-233, score-1.413]
89 6 Related Work. Most existing work on using relation information in learning is for classification. [sent-236, score-0.111]
90 In contrast, C-CRF can easily handle multiple relations by incorporating them as different edge feature functions. [sent-244, score-0.139]
91 There is a hyperparameter β in RRSVM representing the trade-off between content and relation information. [sent-245, score-0.16]
92 Besides, C-CRF achieves better ranking accuracy than that reported for RRSVM [14] on the same benchmark dataset. [sent-252, score-0.672]
93 7 Conclusions. We studied learning to rank methods for the global ranking problem, in which we use both the content information of objects and the relation information between objects for ranking. [sent-253, score-1.137]
94 Experimental results on benchmark data show that C-CRF improves upon the baseline methods in the global ranking tasks. [sent-256, score-0.797]
95 (2) We have assumed that absolute ranking scores are given in the training data. [sent-260, score-0.74]
96 (3) We have studied two global ranking tasks: Pseudo Relevance Feedback and Topic Distillation. [sent-262, score-0.749]
97 Co-clustering documents and words using bipartite spectral graph partitioning. [sent-292, score-0.283]
98 A document-document similarity measure based on cited titles and probability theory, and its application to relevance feedback retrieval. [sent-318, score-0.339]
99 Global ranking of documents using continuous conditional random fields. [sent-371, score-1.007]
100 Learning to rank relational objects and its application to web search. [sent-384, score-0.227]
wordName wordTfidf (topN-words)
[('ranking', 0.645), ('documents', 0.283), ('distillation', 0.214), ('crf', 0.187), ('ranksvm', 0.179), ('relevance', 0.167), ('pseudo', 0.156), ('prf', 0.143), ('yi', 0.138), ('yj', 0.131), ('topic', 0.125), ('listnet', 0.125), ('query', 0.123), ('feedback', 0.121), ('relation', 0.111), ('rrsvm', 0.107), ('global', 0.104), ('scores', 0.095), ('qin', 0.094), ('document', 0.09), ('ndcg', 0.086), ('rank', 0.078), ('objects', 0.075), ('hk', 0.071), ('vertex', 0.068), ('page', 0.067), ('pr', 0.065), ('relations', 0.059), ('score', 0.058), ('gk', 0.058), ('retrieval', 0.058), ('letor', 0.057), ('liu', 0.057), ('sitemap', 0.054), ('subtopic', 0.054), ('log', 0.052), ('relational', 0.052), ('similarity', 0.051), ('dr', 0.051), ('conventional', 0.05), ('ascent', 0.049), ('content', 0.049), ('continuous', 0.048), ('round', 0.048), ('edge', 0.046), ('dc', 0.045), ('sigir', 0.045), ('ss', 0.045), ('ohsumed', 0.04), ('ranked', 0.039), ('bt', 0.038), ('fields', 0.036), ('banded', 0.036), ('degenerates', 0.036), ('gradient', 0.036), ('tasks', 0.035), ('feature', 0.034), ('xj', 0.034), ('ei', 0.031), ('conducting', 0.031), ('conducts', 0.031), ('please', 0.031), ('zhang', 0.031), ('conditional', 0.031), ('listwise', 0.029), ('wang', 0.028), ('xi', 0.028), ('benchmark', 0.027), ('tao', 0.027), ('xiong', 0.027), ('supposed', 0.025), ('contents', 0.025), ('st', 0.025), ('retrieved', 0.024), ('parent', 0.023), ('negatively', 0.023), ('integrable', 0.023), ('exp', 0.023), ('employ', 0.023), ('guarantee', 0.022), ('web', 0.022), ('kdd', 0.021), ('white', 0.021), ('td', 0.021), ('baseline', 0.021), ('traditional', 0.021), ('technique', 0.021), ('matrix', 0.02), ('tk', 0.02), ('rounds', 0.02), ('factor', 0.02), ('factors', 0.02), ('dependency', 0.02), ('propagation', 0.019), ('item', 0.019), ('local', 0.019), ('performances', 0.018), ('correlated', 0.018), ('kind', 0.018), ('solely', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
2 0.30670369 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
3 0.26559818 224 nips-2008-Structured ranking learning using cumulative distribution networks
Author: Jim C. Huang, Brendan J. Frey
Abstract: Ranking is at the heart of many information retrieval applications. Unlike standard regression or classification in which we predict outputs independently, in ranking we are interested in predicting structured outputs so that misranking one object can significantly affect whether we correctly rank the other objects. In practice, the problem of ranking involves a large number of objects to be ranked and either approximate structured prediction methods are required, or assumptions of independence between object scores must be made in order to make the problem tractable. We present a probabilistic method for learning to rank using the graphical modelling framework of cumulative distribution networks (CDNs), where we can take into account the structure inherent to the problem of ranking by modelling the joint cumulative distribution functions (CDFs) over multiple pairwise preferences. We apply our framework to the problem of document retrieval in the case of the OHSUMED benchmark dataset. We will show that the RankNet, ListNet and ListMLE probabilistic models can be viewed as particular instances of CDNs and that our proposed framework allows for the exploration of a broad class of flexible structured loss functionals for learning to rank. 1
4 0.16951099 72 nips-2008-Empirical performance maximization for linear rank statistics
Author: Stéphan J. Clémençcon, Nicolas Vayatis
Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process. 1
5 0.16844137 73 nips-2008-Estimating Robust Query Models with Convex Optimization
Author: Kevyn Collins-thompson
Abstract: Query expansion is a long-studied approach for improving retrieval effectiveness by enhancing the user’s original query with additional related words. Current algorithms for automatic query expansion can often improve retrieval accuracy on average, but are not robust: that is, they are highly unstable and have poor worst-case performance for individual queries. To address this problem, we introduce a novel formulation of query expansion as a convex optimization problem over a word graph. The model combines initial weights from a baseline feedback algorithm with edge weights based on word similarity, and integrates simple constraints to enforce set-based criteria such as aspect balance, aspect coverage, and term centrality. Results across multiple standard test collections show consistent and significant reductions in the number and magnitude of expansion failures, while retaining the strong positive gains of the baseline algorithm. Our approach does not assume a particular retrieval model, making it applicable to a broad class of existing expansion algorithms. 1
6 0.1645274 239 nips-2008-Tighter Bounds for Structured Estimation
7 0.13630131 184 nips-2008-Predictive Indexing for Fast Search
8 0.12684599 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
9 0.1057051 229 nips-2008-Syntactic Topic Models
10 0.098567344 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
11 0.094241194 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking
12 0.088609137 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
13 0.081104234 106 nips-2008-Inferring rankings under constrained sensing
14 0.073039256 113 nips-2008-Kernelized Sorting
15 0.071330331 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
16 0.071149834 194 nips-2008-Regularized Learning with Networks of Features
17 0.070362054 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
18 0.061709858 248 nips-2008-Using matrices to model symbolic relationship
19 0.058468133 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
20 0.056139443 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
topicId topicWeight
[(0, -0.183), (1, -0.101), (2, -0.045), (3, -0.07), (4, -0.076), (5, -0.077), (6, 0.356), (7, -0.007), (8, 0.008), (9, 0.098), (10, -0.176), (11, -0.072), (12, -0.1), (13, 0.165), (14, 0.146), (15, -0.026), (16, 0.087), (17, -0.031), (18, 0.207), (19, 0.184), (20, 0.032), (21, -0.072), (22, -0.062), (23, 0.005), (24, 0.116), (25, -0.032), (26, -0.163), (27, 0.083), (28, 0.147), (29, -0.115), (30, 0.068), (31, -0.092), (32, -0.034), (33, 0.104), (34, -0.028), (35, 0.007), (36, -0.13), (37, -0.068), (38, 0.095), (39, -0.095), (40, -0.035), (41, 0.055), (42, 0.008), (43, 0.01), (44, -0.06), (45, -0.026), (46, 0.045), (47, 0.022), (48, 0.037), (49, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.97333324 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
2 0.87718594 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
3 0.76667446 224 nips-2008-Structured ranking learning using cumulative distribution networks
Author: Jim C. Huang, Brendan J. Frey
Abstract: Ranking is at the heart of many information retrieval applications. Unlike standard regression or classification in which we predict outputs independently, in ranking we are interested in predicting structured outputs so that misranking one object can significantly affect whether we correctly rank the other objects. In practice, the problem of ranking involves a large number of objects to be ranked and either approximate structured prediction methods are required, or assumptions of independence between object scores must be made in order to make the problem tractable. We present a probabilistic method for learning to rank using the graphical modelling framework of cumulative distribution networks (CDNs), where we can take into account the structure inherent to the problem of ranking by modelling the joint cumulative distribution functions (CDFs) over multiple pairwise preferences. We apply our framework to the problem of document retrieval in the case of the OHSUMED benchmark dataset. We will show that the RankNet, ListNet and ListMLE probabilistic models can be viewed as particular instances of CDNs and that our proposed framework allows for the exploration of a broad class of flexible structured loss functionals for learning to rank. 1
4 0.46840349 72 nips-2008-Empirical performance maximization for linear rank statistics
Author: Stéphan J. Clémençcon, Nicolas Vayatis
Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process. 1
5 0.43745768 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
Author: Kilian Q. Weinberger, Olivier Chapelle
Abstract: Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond “flat” classification through incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal data base yield state-of-the-art results on topic categorization. 1
6 0.42081836 73 nips-2008-Estimating Robust Query Models with Convex Optimization
7 0.41889939 184 nips-2008-Predictive Indexing for Fast Search
8 0.40270814 239 nips-2008-Tighter Bounds for Structured Estimation
9 0.3435958 169 nips-2008-Online Models for Content Optimization
10 0.33060229 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
11 0.33051383 106 nips-2008-Inferring rankings under constrained sensing
12 0.31309083 229 nips-2008-Syntactic Topic Models
13 0.29116267 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
14 0.27819386 134 nips-2008-Mixed Membership Stochastic Blockmodels
15 0.26830876 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
16 0.25871399 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
17 0.24413201 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
18 0.23359159 78 nips-2008-Exact Convex Confidence-Weighted Learning
19 0.23283824 113 nips-2008-Kernelized Sorting
20 0.21957998 28 nips-2008-Asynchronous Distributed Learning of Topic Models
topicId topicWeight
[(4, 0.014), (6, 0.043), (7, 0.084), (12, 0.04), (15, 0.042), (28, 0.254), (35, 0.015), (57, 0.071), (59, 0.022), (63, 0.011), (77, 0.046), (78, 0.017), (83, 0.069), (99, 0.184)]
simIndex simValue paperId paperTitle
same-paper 1 0.89878237 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
Author: Tao Qin, Tie-yan Liu, Xu-dong Zhang, De-sheng Wang, Hang Li
Abstract: This paper studies global ranking problem by learning to rank methods. Conventional learning to rank methods are usually designed for ‘local ranking’, in the sense that the ranking model is defined on a single object, for example, a document in information retrieval. For many applications, this is a very loose approximation. Relations always exist between objects and it is better to define the ranking model as a function on all the objects to be ranked (i.e., the relations are also included). This paper refers to the problem as global ranking and proposes employing a Continuous Conditional Random Fields (CRF) for conducting the learning task. The Continuous CRF model is defined as a conditional probability distribution over ranking scores of objects conditioned on the objects. It can naturally represent the content information of objects as well as the relation information between objects, necessary for global ranking. Taking two specific information retrieval tasks as examples, the paper shows how the Continuous CRF method can perform global ranking better than baselines.
2 0.89862627 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
3 0.8364296 231 nips-2008-Temporal Dynamics of Cognitive Control
Author: Jeremy Reynolds, Michael C. Mozer
Abstract: Cognitive control refers to the flexible deployment of memory and attention in response to task demands and current goals. Control is often studied experimentally by presenting sequences of stimuli, some demanding a response, and others modulating the stimulus-response mapping. In these tasks, participants must maintain information about the current stimulus-response mapping in working memory. Prominent theories of cognitive control use recurrent neural nets to implement working memory, and optimize memory utilization via reinforcement learning. We present a novel perspective on cognitive control in which working memory representations are intrinsically probabilistic, and control operations that maintain and update working memory are dynamically determined via probabilistic inference. We show that our model provides a parsimonious account of behavioral and neuroimaging data, and suggest that it offers an elegant conceptualization of control in which behavior can be cast as optimal, subject to limitations on learning and the rate of information processing. Moreover, our model provides insight into how task instructions can be directly translated into appropriate behavior and then efficiently refined with subsequent task experience. 1
4 0.83053857 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation
Author: Laurent Jacob, Jean-philippe Vert, Francis R. Bach
Abstract: In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning. We show in simulations on synthetic examples and on the IEDB MHC-I binding dataset, that our approach outperforms well-known convex methods for multi-task learning, as well as related non-convex methods dedicated to the same problem. 1
5 0.83039236 138 nips-2008-Modeling human function learning with Gaussian processes
Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish
Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1
6 0.82832134 211 nips-2008-Simple Local Models for Complex Dynamical Systems
7 0.82828122 4 nips-2008-A Scalable Hierarchical Distributed Language Model
8 0.82730043 224 nips-2008-Structured ranking learning using cumulative distribution networks
10 0.82445949 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
11 0.82426679 101 nips-2008-Human Active Learning
12 0.82401782 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
13 0.82398576 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
14 0.82215685 113 nips-2008-Kernelized Sorting
15 0.82170087 112 nips-2008-Kernel Measures of Independence for non-iid Data
16 0.81984228 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
17 0.819511 218 nips-2008-Spectral Clustering with Perturbed Data
18 0.81899077 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
19 0.8188014 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
20 0.81830299 195 nips-2008-Regularized Policy Iteration