nips nips2009 nips2009-87 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano
Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1
Reference: text
sentIndex sentText sentNum sentScore
1 James Petterson, Jin Yu, Julian J. McAuley and Tibério S. Caetano, NICTA and Australian National University, Canberra, Australia. Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. [sent-3, score-0.402]
2 The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. [sent-4, score-0.19]
3 Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. [sent-7, score-0.252]
4 We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. [sent-8, score-0.437]
5 This is the problem of finding the ‘heaviest’ perfect match in a weighted bipartite graph. [sent-11, score-0.264]
6 For example, in computer vision the crucial problem of finding a correspondence between sets of image features is often modeled as a matching problem [2, 4]. [sent-14, score-0.225]
7 Ranking algorithms can be based on a matching framework [13], as can clustering algorithms [8]. [sent-15, score-0.183]
8 The problem is that in real applications we typically observe edge feature vectors, not edge weights. [sent-17, score-0.227]
9 In this setting, it is natural to ask whether we could parameterize the features, and use labeled matches in order to estimate the parameters such that, given graphs with ‘similar’ features, their resulting max-weight matches are also ‘similar’. [sent-21, score-0.286]
10 This idea of ‘parameterizing algorithms’ and then optimizing for agreement with data is called structured estimation [27, 29]. [sent-22, score-0.186]
11 [27] and [4] describe max-margin structured estimation formalisms for this problem. [sent-23, score-0.186]
12 Max-margin structured estimators are appealing in that they try to minimize the loss that one really cares about (‘structured losses’, of which the Hamming loss is an example). [sent-24, score-0.353]
13 However structured losses are typically piecewise constant in the parameters, which eliminates any hope of using smooth optimization directly. [sent-25, score-0.153]
14 Max-margin estimators instead minimize a surrogate loss which is easier to optimize, namely a convex upper bound on the structured loss [29]. [sent-26, score-0.403]
15 In practice the results are often good, but known convex relaxations produce estimators which are statistically inconsistent [18], i.e. they need not recover the best model in the model class even in the limit of infinite training data. [sent-27, score-0.203]
16 Motivated by the inconsistency issues of max-margin structured estimators as well as by the well-known benefits of having a full probabilistic model, in this paper we present a maximum a posteriori (MAP) estimator for the matching problem. [sent-31, score-0.543]
17 The observed data are the edge feature vectors and the labeled matches provided for training. [sent-32, score-0.203]
18 We build an exponential family model where the sufficient statistics are such that the mode of the distribution (the prediction) is the solution of a max-weight matching problem. [sent-34, score-0.329]
19 We then compare the performance of our model instance against a large number of state-of-the-art ranking methods, including DORM [13], an approach that only differs from our model instance by using max-margin instead of a MAP formulation. [sent-37, score-0.223]
20 We show very competitive results on standard document ranking datasets, and in particular we show that our model performs better than or on par with DORM. [sent-38, score-0.398]
21 However the fastest suitable sampler is still quite slow for large models, in which case max-margin matching estimators like those of [4] and [27] are likely to be preferable despite their potentially inferior accuracy. [sent-40, score-0.303]
22 1 Background Structured Prediction In recent years, great attention has been devoted in Machine Learning to so-called structured predictors, which are predictors of the kind g_θ : X → Y. [sent-42, score-0.189]
23 This structured nature of Y is what structured prediction refers to. [sent-45, score-0.347]
24 In the setting of this paper, X is the set of vector-weighted bipartite graphs (i. [sent-46, score-0.264]
25 , each edge has a feature vector associated with it), and Y is the set of perfect matches induced by X. [sent-48, score-0.27]
26 If N graphs are available, along with corresponding annotated matches (i. [sent-49, score-0.153]
27 , a set {(x^n, y^n)}_{n=1}^N), our task will be to estimate θ such that when we apply the predictor g_θ to a new graph it produces a match that is similar to matches of similar graphs from the annotated set. [sent-51, score-0.282]
28 Structured learning or structured estimation refers to the process of estimating a vector θ for predictor g_θ when data {(x^1, y^1), …, (x^N, y^N)} are available. [sent-52, score-0.186]
29 Two generic estimation strategies have been popular in producing structured predictors. [sent-57, score-0.186]
30 One is based on max-margin estimators [29, 27], and the other on maximum-likelihood (ML) or MAP estimators in exponential family models [12]. [sent-58, score-0.386]
31 However the resulting estimators are known to be inconsistent in general: in the limit of infinite training data the algorithm fails to recover the best model in the model class [18, 16, 15]. [sent-60, score-0.167]
32 The other approach uses ML or MAP estimation in conditional exponential families with ‘structured’ sufficient statistics, such as in probabilistic graphical models, where they are decomposed over the cliques of the graph (in which case they are called Conditional Random Fields, or CRFs [12]). [sent-62, score-0.272]
33 ML and MAP estimators in exponential families not only amount to solving an unconstrained and convex optimization problem; in addition they are statistically consistent. [sent-64, score-0.36]
34 2 The Matching Problem Consider a weighted bipartite graph with m nodes in each part, G = (V, E, w), where V is the set of vertices, E is the set of edges and w : E → ℝ is a weight function. [sent-68, score-0.307]
35 G can be simply represented by a matrix (wij ) where the entry wij is the weight of the edge ij. [sent-70, score-0.187]
36 Then the matching problem consists of computing the maximum-weight perfect match y* = argmax_y Σ_{i=1}^M w_{iy(i)} (2) in the graph G = (V, E, w). [sent-81, score-0.265]
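For a fixed weight matrix this combinatorial problem is solvable exactly in O(M³) time; a minimal sketch (ours, not the paper's code) using the Hungarian algorithm as implemented in scipy:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def max_weight_match(W):
        """Return the permutation y maximizing sum_i W[i, y[i]] (Hungarian algorithm, O(M^3))."""
        rows, cols = linear_sum_assignment(-W)   # scipy minimizes cost, so negate the weights
        return cols                              # cols[i] = column matched to row i

    W = np.array([[4.0, 1.0, 3.0],
                  [2.0, 0.0, 5.0],
                  [3.0, 2.0, 2.0]])
    y = max_weight_match(W)                      # -> array([0, 2, 1]), total weight 11
    total = W[np.arange(len(y)), y].sum()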
37 Therefore p(θ|Y, X) ∝ p(θ) Π_{n=1}^N exp(⟨φ(x^n, y^n), θ⟩ − g(x^n; θ)) = exp(log p(θ) + Σ_{n=1}^N [⟨φ(x^n, y^n), θ⟩ − g(x^n; θ)]), and we impose a Gaussian prior on θ. [sent-84, score-0.262]
38 This becomes our negative log-posterior ℓ(Y|X; θ) (we suppress the constant term). [sent-88, score-0.211]
39 Figure 1, left: illustration of a vector-weighted bipartite graph Gx; a feature vector x_e is associated with each edge e (for clarity only edge ij is shown). [sent-89, score-0.499]
40 Figure 1, right: the weighted bipartite graph G obtained by evaluating Gx on the learned vector θ (again only edge ij is shown); the negative log-posterior is ℓ(Y|X; θ) = (λ/2)∥θ∥² + Σ_{n=1}^N [g(x^n; θ) − ⟨φ(x^n, y^n), θ⟩]. [sent-91, score-0.178]
41 Here M is the number of nodes in each part; the typical setting consists of engineering the score matrix (wij) according to domain knowledge and subsequently solving the combinatorial problem. [sent-106, score-0.222]
42 The log-partition terms g(x^n; θ) are convex functions of θ (Wainwright & Jordan, 2003) and the other terms of ℓ are clearly convex as well, where x^n denotes the n-th vector-weighted bipartite graph. [sent-107, score-0.409]
43 We then parameterize the weights as w_{iy(i)} = f(x_{iy(i)}; θ). [sent-108, score-0.185]
44 In this paper we assume that the weights wij are instead to be estimated from training data. [sent-115, score-0.321]
45 More precisely, the weight wij associated with the edge ij in a graph will be the result of an appropriate composition of a feature vector x_ij (observed) and a parameter vector θ (estimated from training data). [sent-116, score-0.448]
46 Since our goal is to parameterize features of individual pairs of nodes, so as to produce the weight of an edge, we need a model over such edge features. [sent-121, score-0.206]
47 Therefore, in practice, our input is a vector-weighted bipartite graph Gx = (V, E, x), where x : E → ℝ^d assigns a feature vector to every edge, and we assume an exponential family model over the matches. [sent-122, score-0.272]
48 More formally, assume that a training set {X, Y} = {(x^n, y^n)}_{n=1}^N is available, where each x^n is a vector-weighted bipartite graph and y^n its annotated match (see Figure 1 for an illustration). [sent-125, score-0.335]
49 We then parameterize the weights as w_{iy(i)} = f(x_{iy(i)}; θ) = ⟨x_{iy(i)}, θ⟩, and the goal is to find the θ which maximizes the posterior probability of the observed data. [sent-130, score-0.267]
50 In light of (10), (2) now clearly becomes a prediction of the best match for Gx under the model: we assume an exponential family model p(y|x; θ) = exp(⟨φ(x, y), θ⟩ − g(x; θ)), where g(x; θ) is the log-partition function. [sent-138, score-0.284]
51 The prediction is y* = argmax_y p(y|x; θ) = argmax_y ⟨φ(x, y), θ⟩ (5), and g(x; θ) is a convex and differentiable function of θ [31]. [sent-145, score-0.372]
52 ML estimation amounts to maximizing the conditional likelihood of a sample {X, Y}; since g is convex and differentiable (Wainwright & Jordan, 2003), gradient descent applies. [sent-147, score-0.405]
53 In practice we will in general introduce a prior and perform MAP estimation: θ* = argmax_θ p(Y|X; θ) p(θ) = argmax_θ p(θ|Y, X). [sent-155, score-0.525]
54 3 Feature Parameterization: the critical observation now is that we equate the solution of the matching problem (2) to the prediction of the exponential family model (5). [sent-167, score-0.329]
55 Since our goal is to parameterize features of individual pairs of nodes (so as to produce the weight of an edge), the most natural model is φ(x, y) = Σ_{i=1}^M x_{iy(i)} (9), which gives w_{iy(i)} = ⟨x_{iy(i)}, θ⟩ (10). [sent-170, score-0.158]
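To make (9) and (10) concrete, here is a small sketch (names are ours) that builds the weight matrix w_ij = ⟨x_ij, θ⟩ from an M×M×d tensor of edge features and evaluates the sufficient statistics φ(x, y) = Σ_i x_{iy(i)} for a given match y:

    import numpy as np

    def weight_matrix(X, theta):
        """X: (M, M, d) edge feature tensor, theta: (d,) parameters; W[i, j] = <x_ij, theta>."""
        return X @ theta                          # shape (M, M)

    def sufficient_statistics(X, y):
        """phi(x, y) = sum_i x_{i, y(i)} for a permutation y given as an int array of length M."""
        M = len(y)
        return X[np.arange(M), y].sum(axis=0)     # shape (d,)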
56 It is a standard result of exponential families that the gradient of the log-partition function is the expectation of the sufficient statistics: ∇_θ g(x; θ) = E_{y∼p(y|x;θ)}[φ(x, y)]. [sent-179, score-0.157]
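For small M this expectation, and with it the MAP objective and gradient, can be computed exactly by enumerating all M! perfect matches (the ranking experiments below use M = 3, 4, 5 with full enumeration); a hedged sketch, with all function names ours, suitable for a quasi-Newton solver such as BFGS:

    from itertools import permutations
    import numpy as np
    from scipy.special import logsumexp

    def map_objective_and_grad(Xs, ys, theta, lam=1.0):
        """Negative log-posterior (lam/2)||theta||^2 + sum_n [g(x^n; theta) - <phi(x^n, y^n), theta>]
        and its gradient, with the log-partition g computed by full enumeration."""
        obj = 0.5 * lam * theta @ theta
        grad = lam * theta
        for X, y in zip(Xs, ys):                             # X: (M, M, d), y: permutation of 0..M-1
            M = X.shape[0]
            idx = np.arange(M)
            phis = np.array([X[idx, list(p)].sum(axis=0)
                             for p in permutations(range(M))])   # (M!, d) sufficient statistics
            scores = phis @ theta
            g = logsumexp(scores)                            # log-partition g(x^n; theta)
            probs = np.exp(scores - g)                       # p(y'|x^n; theta) over all matches
            phi_n = X[idx, np.asarray(y)].sum(axis=0)
            obj += g - phi_n @ theta
            grad = grad + probs @ phis - phi_n               # gradient: E[phi] - phi(x^n, y^n)
        return obj, grad

    # e.g. scipy.optimize.minimize(lambda t: map_objective_and_grad(Xs, ys, t), theta0,
    #                              jac=True, method="BFGS")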
57 In our experiments, we successfully apply a tractable instance of our model to benchmark document ranking datasets, obtaining very competitive results. [sent-191, score-0.43]
58 The algorithm works by producing exact samples from the distribution of perfect matches on weighted bipartite graphs. [sent-196, score-0.289]
59 The expectation is approximated by an average of the sufficient statistics over K exact samples (15); in our experiments, we apply this algorithm to an image matching application. [sent-200, score-0.225]
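A sketch of that averaging step (the exact sampler itself is the one from [10]; here draw_exact_match is a placeholder standing in for it):

    import numpy as np

    def expected_statistics_mc(X, theta, draw_exact_match, K=100):
        """Monte Carlo estimate of E_{y ~ p(y|x; theta)}[phi(x, y)] from K exact samples."""
        M, _, d = X.shape
        idx = np.arange(M)
        total = np.zeros(d)
        for _ in range(K):
            y = draw_exact_match(X @ theta)       # exact sample of a perfect match, sampler from [10]
            total += X[idx, y].sum(axis=0)        # phi(x, y) for this sample
        return total / K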
60 1 Experiments Ranking Here we apply the general matching model introduced in previous sections to the task of learning to rank. [sent-202, score-0.183]
61 Ranking is a fundamental problem with applications in diverse areas such as document retrieval, recommender systems, product rating and others. [sent-203, score-0.229]
62 Early learning to rank methods applied a pairwise approach, where pairs of documents were used as instances in learning [7, 6, 3]. [sent-204, score-0.229]
63 Recently there has been interest in listwise approaches, where document lists are used as instances, as in our method. [sent-205, score-0.172]
64 In this paper we focus, without loss of generality, on document ranking. [sent-206, score-0.18]
65 We are given a set of queries {qk} and, for each query qk, a list of D(k) documents {d^k_1, …, d^k_{D(k)}}. [sent-207, score-0.441]
66 We are also given a list of ratings {r^k_1, …, r^k_{D(k)}} (assigned by a human editor), measuring the relevance degree of each document with respect to query qk. [sent-213, score-0.485]
67 A rating or relevance degree is usually a nominal value in the list {1, …, R}, where R is the number of relevance levels. [sent-214, score-0.198]
68 We are also given, for every retrieved document d^k_i, a joint feature vector φ_i for that document and the query qk. [sent-218, score-0.713]
69 The subset itself is chosen randomly, provided at least one exemplar document of every rating is present. [sent-223, score-0.261]
70 For training we repeatedly sample from {d^k_1, …, d^k_{D(k)}}, M documents at a time (conditioned on the fact that at least one exemplar of every rating is present, but otherwise randomly). [sent-228, score-0.289]
71 This effectively boosts the number of training examples since each query qk ends up being selected many times, each time with a different subset of M documents from the original set of D(k) documents. [sent-229, score-0.434]
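A minimal sketch of this subsampling (function and variable names are ours; it assumes M is at least the number of distinct rating levels for the query):

    import numpy as np

    def sample_training_lists(doc_ids, ratings, M, n_draws, seed=0):
        """Draw n_draws size-M subsets of a query's documents, each containing every rating level."""
        rng = np.random.default_rng(seed)
        ratings = np.asarray(ratings)
        levels = set(ratings.tolist())
        subsets = []
        while len(subsets) < n_draws:
            pick = rng.choice(len(doc_ids), size=M, replace=False)
            if set(ratings[pick].tolist()) == levels:     # at least one exemplar of every rating
                subsets.append([doc_ids[i] for i in pick])
        return subsets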
72 Here we follow the construction used in [13] to map matching problems to ranking problems (indeed the only difference between our ranking model and that of [13] is that they use a max-margin estimator and we use MAP in an exponential family). [sent-231, score-0.84]
73 Our edge feature vector x_ij will be the product of the feature vector φ_i associated with document i, and a scalar c_j (the choice of which will be explained below) associated with ranking position j: x_ij = φ_i c_j (16). [sent-232, score-0.953]
74 From (10) and (16), we have w_ij = c_j ⟨φ_i, θ⟩, and training proceeds as explained in Section 4. [sent-234, score-0.193]
75 Testing At test time, we are given a query q and its corresponding list of D associated documents. [sent-235, score-0.223]
76 The predicted match is y* = argmax_y Σ_{i=1}^D ⟨x_{iy(i)}, θ⟩ = argmax_y Σ_{i=1}^D c_{y(i)} ⟨φ_i, θ⟩ (17). [sent-238, score-0.322]
77 (Results table residue: P@10 scores for IsoRank, StructRank, and SortNet with 10 and 20 hidden units.)
78 We now notice that if the scalar c_j = c(j), where c is a non-increasing function of rank position j, then (17) can be solved simply by sorting the values of ⟨φ_i, θ⟩ in decreasing order. [sent-265, score-0.168]
79 In other words, the matching problem becomes one of ranking the values ⟨φ_i, θ⟩. [sent-266, score-0.406]
80 In this setting it makes sense to interpret the quantity ⟨φ_i, θ⟩ as a score of document d_i for query q. [sent-268, score-0.276]
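A small sketch of test-time prediction under this construction (Phi is a hypothetical D×d matrix whose rows are the document-query feature vectors φ_i): because c is non-increasing, sorting the scores solves (17) without running an assignment algorithm.

    import numpy as np

    def predict_ranking(Phi, theta):
        """Rank documents by the score <phi_i, theta>, highest first."""
        scores = Phi @ theta
        order = np.argsort(-scores)               # document indices from most to least relevant
        return order, scores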
81 For each query there are a number of associated documents, with relevance degrees judged by humans on three levels: definitely, possibly or not relevant. [sent-277, score-0.256]
82 Again, for each query there are a number of associated documents, with relevance degrees judged by humans, but in this case only two levels are provided: relevant or not relevant. [sent-281, score-0.256]
83 Runtimes for M = 3, 4, 5 are from the ranking experiments, computed by full enumeration; M = 20 corresponds to the image matching experiments, which use the sampler from [10]. [sent-291, score-0.448]
84 Results For the first experiment training was done on subsets sampled as described above, where for each query qk we sampled 0. [sent-305, score-0.337]
85 Also, benchmarking of ranking algorithms is still in its infancy and we don’t yet have publicly available code for all of the competitive methods. [sent-317, score-0.258]
86 Runtime The runtime of our algorithm is competitive with that of max-margin for small graphs, such as those that arise from the ranking application. [sent-325, score-0.294]
87 This is certainly the benefit of the max-margin matching formulations of [4, 13]: they are much faster for large graphs. [sent-327, score-0.183]
88 In this setup, x_ij = |φ_i − φ_j|² (19), where | · | denotes the elementwise difference and φ_i is the Shape Context feature vector [1] for point i. [sent-337, score-0.159]
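A sketch of that feature construction (descriptor extraction, e.g. Shape Context, is assumed to happen elsewhere; names are ours):

    import numpy as np

    def edge_features(F1, F2):
        """F1, F2: (M, d) descriptor matrices for the two point sets.
        Returns X with X[i, j] = elementwise (F1[i] - F2[j])**2, shape (M, M, d)."""
        return (F1[:, None, :] - F2[None, :, :]) ** 2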
89 Given the fact that the MAP estimator is consistent while the max-margin estimator is not, one is tempted to investigate the practical performance of both estimators as the sample size grows. [sent-341, score-0.218]
90 Left: Hamming loss for different numbers of training pairs in the image matching problem (test set size fixed to 500 pairs). [sent-363, score-0.367]
91 Right: results of NDCG@1 on the ranking dataset OHSUMED. [sent-364, score-0.223]
92 We presented a method for learning max-weight bipartite matching predictors, and applied it extensively to well-known document ranking datasets, obtaining state-of-the-art results. [sent-367, score-0.421]
93 Figure, right: an example match from the test set (blue are correct and red are incorrect). [sent-368, score-0.363]
94 We also illustrated, with an image matching application, that larger problems can be solved, albeit slowly, with a recently developed sampler. [sent-369, score-0.272]
95 It consists of performing maximum a posteriori estimation in an exponential family model, which results in a simple unconstrained convex optimization problem solvable by standard algorithms such as BFGS. [sent-372, score-0.229]
96 We are given a set of queries {qk} and, for each query qk, a list of D(k) documents; the reference list begins here with [1] Belongie, S. [sent-378, score-0.441]
97 [1, 2] Belongie, S. et al.: shape matching and object recognition; relevance degrees are assigned by a human editor. [sent-388, score-0.183]
98 At training time, we model each query q as a vector-weighted bipartite graph (Figure 1); [4] Caetano, T. et al. concerns learning graph matching. [sent-409, score-0.96]
99 Large margin methods for structured and interdependent output variables. [sent-576, score-0.191]
100 A general boosting method and its application to learning ranking functions for web search. [sent-612, score-0.223]
wordName wordTfidf (topN-words)
[('xiy', 0.26), ('rankmatch', 0.236), ('ranking', 0.223), ('ndcg', 0.218), ('sortnet', 0.213), ('matching', 0.183), ('bipartite', 0.183), ('dorm', 0.165), ('argmax', 0.161), ('qk', 0.154), ('structured', 0.153), ('adarank', 0.152), ('xn', 0.144), ('ohsumed', 0.142), ('structrank', 0.142), ('wiy', 0.142), ('document', 0.14), ('query', 0.136), ('xij', 0.124), ('estimators', 0.12), ('isorank', 0.118), ('permanent', 0.118), ('gx', 0.118), ('documents', 0.097), ('edge', 0.096), ('hiddens', 0.094), ('exponential', 0.092), ('wij', 0.091), ('rating', 0.089), ('qbrank', 0.083), ('ranksvm', 0.083), ('graph', 0.082), ('graphs', 0.081), ('rank', 0.077), ('rankboost', 0.076), ('matches', 0.072), ('dk', 0.071), ('listnet', 0.071), ('map', 0.07), ('letor', 0.067), ('families', 0.065), ('qin', 0.064), ('parameterize', 0.061), ('hamming', 0.061), ('frank', 0.059), ('ml', 0.056), ('cj', 0.055), ('pairs', 0.055), ('relevance', 0.055), ('family', 0.054), ('list', 0.054), ('ey', 0.052), ('liu', 0.05), ('runtimes', 0.05), ('convex', 0.05), ('estimator', 0.049), ('training', 0.047), ('hardy', 0.047), ('hxij', 0.047), ('littlewood', 0.047), ('ryser', 0.047), ('match', 0.047), ('partition', 0.046), ('nodes', 0.042), ('image', 0.042), ('basics', 0.041), ('xe', 0.041), ('multicategory', 0.041), ('iy', 0.041), ('malik', 0.041), ('silhouette', 0.041), ('prediction', 0.041), ('loss', 0.04), ('combinatorial', 0.039), ('caetano', 0.038), ('inconsistency', 0.038), ('trec', 0.038), ('margin', 0.038), ('retrieved', 0.037), ('sorting', 0.036), ('runtime', 0.036), ('sizes', 0.036), ('predictors', 0.036), ('belongie', 0.035), ('feature', 0.035), ('competitive', 0.035), ('perfect', 0.034), ('huber', 0.034), ('associated', 0.033), ('statistically', 0.033), ('estimation', 0.033), ('benchmark', 0.032), ('polya', 0.032), ('mcallester', 0.032), ('listwise', 0.032), ('enumeration', 0.032), ('exemplar', 0.032), ('judged', 0.032), ('zha', 0.032), ('datasets', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 87 nips-2009-Exponential Family Graph Matching and Ranking
Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano
Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1
2 0.30233437 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
3 0.29178095 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li
Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learningto-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measurebased ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1
4 0.19897208 190 nips-2009-Polynomial Semantic Indexing
Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri
Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or documentdocument pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1
5 0.16397619 230 nips-2009-Statistical Consistency of Top-k Ranking
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1
6 0.10573459 211 nips-2009-Segmenting Scenes by Matching Image Composites
7 0.10084178 72 nips-2009-Distribution Matching for Transduction
8 0.088539727 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
9 0.081782281 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems
10 0.079605103 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
11 0.077806823 95 nips-2009-Fast subtree kernels on graphs
12 0.075718336 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
13 0.072826579 141 nips-2009-Local Rules for Global MAP: When Do They Work ?
14 0.071936138 31 nips-2009-An LP View of the M-best MAP problem
15 0.071447454 204 nips-2009-Replicated Softmax: an Undirected Topic Model
16 0.071378343 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
17 0.071162216 3 nips-2009-AUC optimization and the two-sample problem
18 0.070599109 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
19 0.066936545 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels
20 0.06264893 148 nips-2009-Matrix Completion from Power-Law Distributed Samples
topicId topicWeight
[(0, -0.244), (1, 0.069), (2, -0.113), (3, -0.059), (4, 0.011), (5, -0.101), (6, -0.366), (7, -0.142), (8, 0.063), (9, -0.233), (10, 0.069), (11, 0.097), (12, -0.055), (13, 0.05), (14, -0.042), (15, 0.008), (16, -0.11), (17, 0.008), (18, -0.031), (19, -0.051), (20, -0.013), (21, 0.001), (22, 0.067), (23, 0.082), (24, 0.013), (25, 0.028), (26, 0.077), (27, -0.02), (28, 0.013), (29, -0.056), (30, 0.006), (31, 0.045), (32, -0.034), (33, -0.004), (34, -0.014), (35, 0.073), (36, -0.023), (37, 0.02), (38, -0.001), (39, 0.066), (40, -0.04), (41, 0.033), (42, -0.035), (43, -0.032), (44, 0.007), (45, 0.024), (46, -0.025), (47, 0.04), (48, -0.025), (49, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.92838854 87 nips-2009-Exponential Family Graph Matching and Ranking
Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano
Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1
2 0.85684788 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
3 0.85492224 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
Author: Wei Chen, Tie-yan Liu, Yanyan Lan, Zhi-ming Ma, Hang Li
Abstract: Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking functions by minimizing loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking functions. In this work, we reveal the relationship between ranking measures and loss functions in learningto-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that the loss functions of these methods are upper bounds of the measurebased ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss for ranking as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performances, demonstrating the correctness of our theoretical analysis. 1
4 0.7851463 230 nips-2009-Statistical Consistency of Top-k Ranking
Author: Fen Xia, Tie-yan Liu, Hang Li
Abstract: This paper is concerned with the consistency analysis on listwise ranking methods. Among various ranking methods, the listwise methods have competitive performances on benchmark datasets and are regarded as one of the state-of-the-art approaches. Most listwise ranking methods manage to optimize ranking on the whole list (permutation) of objects, however, in practical applications such as information retrieval, correct ranking at the top k positions is much more important. This paper aims to analyze whether existing listwise ranking methods are statistically consistent in the top-k setting. For this purpose, we define a top-k ranking framework, where the true loss (and thus the risks) are defined on the basis of top-k subgroup of permutations. This framework can include the permutationlevel ranking framework proposed in previous work as a special case. Based on the new framework, we derive sufficient conditions for a listwise ranking method to be consistent with the top-k true loss, and show an effective way of modifying the surrogate loss functions in existing methods to satisfy these conditions. Experimental results show that after the modifications, the methods can work significantly better than their original versions. 1
5 0.72577953 190 nips-2009-Polynomial Semantic Indexing
Author: Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, Mehryar Mohri
Abstract: We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or documentdocument pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods. 1
6 0.46155494 206 nips-2009-Riffled Independence for Ranked Data
7 0.46059334 233 nips-2009-Streaming Pointwise Mutual Information
8 0.44919306 143 nips-2009-Localizing Bugs in Program Executions with Graphical Models
9 0.42814136 7 nips-2009-A Data-Driven Approach to Modeling Choice
10 0.41018277 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
11 0.39428166 198 nips-2009-Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions
12 0.39322776 141 nips-2009-Local Rules for Global MAP: When Do They Work ?
13 0.38401571 72 nips-2009-Distribution Matching for Transduction
14 0.36640862 244 nips-2009-The Wisdom of Crowds in the Recollection of Order Information
15 0.36095461 46 nips-2009-Bilinear classifiers for visual recognition
16 0.35406458 148 nips-2009-Matrix Completion from Power-Law Distributed Samples
17 0.35308316 3 nips-2009-AUC optimization and the two-sample problem
18 0.35175243 68 nips-2009-Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora
19 0.35138479 71 nips-2009-Distribution-Calibrated Hierarchical Classification
20 0.34504908 102 nips-2009-Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
topicId topicWeight
[(7, 0.011), (24, 0.067), (25, 0.042), (35, 0.05), (36, 0.104), (39, 0.066), (58, 0.073), (61, 0.027), (71, 0.056), (77, 0.033), (81, 0.014), (86, 0.136), (88, 0.244), (91, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.80614966 87 nips-2009-Exponential Family Graph Matching and Ranking
Author: James Petterson, Jin Yu, Julian J. Mcauley, Tibério S. Caetano
Abstract: We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application–document ranking–exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning document ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high. 1
2 0.73170745 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition
Author: Mario Fritz, Gary Bradski, Sergey Karayev, Trevor Darrell, Michael J. Black
Abstract: Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment. 1
3 0.66365814 136 nips-2009-Learning to Rank by Optimizing NDCG Measure
Author: Hamed Valizadegan, Rong Jin, Ruofei Zhang, Jianchang Mao
Abstract: Learning to rank is a relatively new field of study, aiming to learn a ranking function from a set of training data with relevancy labels. The ranking algorithms are often evaluated using information retrieval measures, such as Normalized Discounted Cumulative Gain (NDCG) [1] and Mean Average Precision (MAP) [2]. Until recently, most learning to rank algorithms were not using a loss function related to the above mentioned evaluation measures. The main difficulty in direct optimization of these measures is that they depend on the ranks of documents, not the numerical values output by the ranking function. We propose a probabilistic framework that addresses this challenge by optimizing the expectation of NDCG over all the possible permutations of documents. A relaxation strategy is used to approximate the average of NDCG over the space of permutation, and a bound optimization approach is proposed to make the computation efficient. Extensive experiments show that the proposed algorithm outperforms state-of-the-art ranking algorithms on several benchmark data sets. 1
4 0.64905554 237 nips-2009-Subject independent EEG-based BCI decoding
Author: Siamac Fazli, Cristian Grozea, Marton Danoczy, Benjamin Blankertz, Florin Popescu, Klaus-Robert Müller
Abstract: In the quest to make Brain Computer Interfacing (BCI) more usable, dry electrodes have emerged that get rid of the initial 30 minutes required for placing an electrode cap. Another time consuming step is the required individualized adaptation to the BCI user, which involves another 30 minutes calibration for assessing a subject’s brain signature. In this paper we aim to also remove this calibration proceedure from BCI setup time by means of machine learning. In particular, we harvest a large database of EEG BCI motor imagination recordings (83 subjects) for constructing a library of subject-specific spatio-temporal filters and derive a subject independent BCI classifier. Our offline results indicate that BCI-na¨ve ı users could start real-time BCI use with no prior calibration at only a very moderate performance loss.
5 0.64779347 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
6 0.647475 199 nips-2009-Ranking Measures and Loss Functions in Learning to Rank
7 0.64582962 3 nips-2009-AUC optimization and the two-sample problem
8 0.64541876 32 nips-2009-An Online Algorithm for Large Scale Image Similarity Learning
9 0.64353412 119 nips-2009-Kernel Methods for Deep Learning
10 0.64272112 106 nips-2009-Heavy-Tailed Symmetric Stochastic Neighbor Embedding
11 0.64261937 137 nips-2009-Learning transport operators for image manifolds
12 0.64194643 145 nips-2009-Manifold Embeddings for Model-Based Reinforcement Learning under Partial Observability
13 0.64099658 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference
14 0.63937098 112 nips-2009-Human Rademacher Complexity
15 0.63827503 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
16 0.63813496 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions
17 0.63696241 72 nips-2009-Distribution Matching for Transduction
18 0.63659549 260 nips-2009-Zero-shot Learning with Semantic Output Codes
19 0.63366878 139 nips-2009-Linear-time Algorithms for Pairwise Statistical Problems
20 0.63326561 142 nips-2009-Locality-sensitive binary codes from shift-invariant kernels