nips nips2010 nips2010-9 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tao Qin, Xiubo Geng, Tie-yan Liu
Abstract: This paper is concerned with rank aggregation, which aims to combine multiple input rankings to get a better ranking. A popular approach to rank aggregation is based on probabilistic models on permutations, e.g., the Luce model and the Mallows model. However, these models have their limitations in either poor expressiveness or high computational complexity. To avoid these limitations, in this paper, we propose a new model, which is defined with a coset-permutation distance, and models the generation of a permutation as a stagewise process. We refer to the new model as coset-permutation distance based stagewise (CPS) model. The CPS model has rich expressiveness and can therefore be used in versatile applications, because many different permutation distances can be used to induce the coset-permutation distance. The complexity of the CPS model is low because of the stagewise decomposition of the permutation probability and the efficient computation of most coset-permutation distances. We apply the CPS model to supervised rank aggregation, derive the learning and inference algorithms, and empirically study their effectiveness and efficiency. Experiments on public datasets show that the derived algorithms based on the CPS model can achieve state-of-the-art ranking accuracy, and are much more efficient than previous algorithms.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper is concerned with rank aggregation, which aims to combine multiple input rankings to get a better ranking. [sent-4, score-0.245]
2 A popular approach to rank aggregation is based on probabilistic models on permutations, e. [sent-5, score-0.41]
3 To avoid these limitations, in this paper, we propose a new model, which is defined with a coset-permutation distance, and models the generation of a permutation as a stagewise process. [sent-9, score-0.385]
4 We refer to the new model as coset-permutation distance based stagewise (CPS) model. [sent-10, score-0.275]
5 The CPS model has rich expressiveness and can therefore be used in versatile applications, because many different permutation distances can be used to induce the coset-permutation distance. [sent-11, score-0.507]
6 The complexity of the CPS model is low because of the stagewise decomposition of the permutation probability and the efficient computation of most coset-permutation distances. [sent-12, score-0.478]
7 We apply the CPS model to supervised rank aggregation, derive the learning and inference algorithms, and empirically study their effectiveness and efficiency. [sent-13, score-0.294]
8 Experiments on public datasets show that the derived algorithms based on the CPS model can achieve state-of-the-art ranking accuracy, and are much more efficient than previous algorithms. [sent-14, score-0.173]
9 1 Introduction Rank aggregation aims at combining multiple rankings of objects to generate a better ranking. [sent-15, score-0.344]
10 For example, in meta search [1], when users issue a query, the query is sent to several search engines and the rankings given by them are aggregated to generate more comprehensive ranking results. [sent-17, score-0.267]
11 Given the underlying correspondence between ranking and permutation, probabilistic models on permutations, which originated in statistics [19, 5, 4], have been widely applied to solve the problem of rank aggregation. [sent-18, score-0.286]
12 The Mallows model is a distance-based model, which defines the probability of a permutation according to its distance to a location permutation. [sent-20, score-0.392]
13 Due to many applicable permutation distances, the Mallows model has very rich expressiveness, and therefore can be potentially used in many different applications. [sent-21, score-0.309]
14 However, the computational complexity of the Mallows model is as high as O(n!) to compute the probability of a single permutation of n objects. [sent-24, score-0.246]
15 This is clearly intractable when we need to rank a large number of objects in real applications. [sent-25, score-0.214]
16 The Luce model is a stagewise model, which decomposes the process of generating a permutation of n objects into n sequential stages. [sent-26, score-0.548]
17 The expressiveness of the Luce model, however, is limited because it is defined on the scores of individual objects and cannot leverage versatile distance measures between permutations. [sent-30, score-0.272]
18 In this paper, we propose a new probabilistic model on permutations, which inherits the advantages of both the Luce model and the Mallows model and avoids their limitations. [sent-31, score-0.207]
19 We refer to the model as coset-permutation distance based stagewise (CPS) model. [sent-32, score-0.275]
20 Different from the Mallows model, the CPS model is a stagewise model. [sent-33, score-0.187]
21 It decomposes the generative process of a permutation π into sequential stages, which makes efficient computation possible. [sent-34, score-0.306]
22 Different from the Luce model, the CPS model defines the selection probability based on the distance between a location permutation σ and the right coset of π (referred to as coset-permutation distance) at each stage. [sent-36, score-0.551]
23 Because many different permutation distances can be used to induce the coset-permutation distance, the CPS model also has rich expressiveness. [sent-38, score-0.396]
24 Furthermore, the coset-permutation distances induced by many popular permutation distances can be computed with polynomial time complexity, which further ensures the efficiency of the CPS model. [sent-39, score-0.45]
25 We then apply the CPS model to supervised rank aggregation and derive corresponding algorithms for learning and inference of the model. [sent-40, score-0.476]
26 Experiments on public datasets show that the CPS model based algorithms can achieve state-of-the-art ranking accuracy, and are much more efficient than baseline methods based on previous probabilistic models. [sent-41, score-0.208]
27 1 Rank Aggregation There are mainly two kinds of rank aggregation, i.e., [sent-43, score-0.173]
28 score-based rank aggregation [17, 16] and order-based rank aggregation [2, 7, 3]. [sent-45, score-0.724]
29 In the former, objects in the input rankings are associated with scores, while in the latter, only the order information of these objects is available. [sent-46, score-0.185]
30 In this work, we focus on the order-based rank aggregation, because it is more popular in real applications [7], and score-based rank aggregation can be easily converted to order-based rank aggregation by ignoring the additional score information [7]. [sent-47, score-0.911]
31 For example, BordaCount [2, 7] and median rank aggregation [8] are simply based on average rank positions or the number of pairwise wins. [sent-49, score-0.536]
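For concreteness, the sketch below implements a BordaCount-style aggregator that ranks objects by their average position across the input rankings; the function name, the tie handling, and the toy example are ours rather than the paper's.

```python
from collections import defaultdict

def borda_count(rankings):
    """Aggregate input rankings by average rank position (smaller is better).

    rankings: list of rankings; each ranking is a list of object ids, best first.
    Objects missing from some ranking are simply averaged over the rankings
    that contain them, a simplification of the full meta-search setting.
    """
    total_position = defaultdict(float)
    count = defaultdict(int)
    for ranking in rankings:
        for position, obj in enumerate(ranking, start=1):
            total_position[obj] += position
            count[obj] += 1
    return sorted(total_position, key=lambda o: total_position[o] / count[o])

# Three input rankings of four documents; document 1 wins on average position.
print(borda_count([[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]))  # [1, 2, 3, 4]
```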
32 In the recent literature, probabilistic models on permutations, such as the Mallows model and the Luce model, have been introduced to solve the problem of rank aggregation. [sent-50, score-0.23]
33 For example, the Mallows model has been shown to be very effective in both supervised and unsupervised rank aggregation, and the effectiveness of the Luce model has been demonstrated in the context of unsupervised rank aggregation. [sent-52, score-0.825]
34 We use π(i) to denote the position given to object i and π^{-1}(i) to denote the object assigned to position i. [sent-64, score-0.153]
35 The collection of all permutations of n objects forms a non-abelian group under composition, called the symmetric group of order n, denoted as Sn . [sent-72, score-0.197]
36 (1) The right coset Sn−k π = {σπ|σ ∈ Sn−k } is a subset of permutations whose top-k objects are exactly the same as in π. [sent-78, score-0.334]
37 We use Sn−k(⟨i1, . . . , ik⟩) to denote the right coset with object i1 in position 1, i2 in position 2, . . . , and ik in position k. [sent-86, score-0.271]
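To ground the coset notation, the brute-force sketch below enumerates the right coset Sn−k π by permuting the bottom n − k objects of π while keeping its top-k objects fixed. It is an illustrative reading of the definition above with our own naming; the list used for π plays the role of π^{-1}, i.e., it records the object placed at each position.

```python
from itertools import permutations

def right_coset(pi_inv, k):
    """Enumerate S_{n-k} pi: all permutations sharing pi's top-k objects.

    pi_inv[i] is the object placed at position i (0-indexed), so pi_inv plays
    the role of pi^{-1} in the paper's notation.
    """
    top, rest = pi_inv[:k], pi_inv[k:]
    return [list(top) + list(tail) for tail in permutations(rest)]

pi_inv = [2, 0, 1, 3]            # object 2 at position 1, object 0 at position 2, ...
print(right_coset(pi_inv, 2))    # (n-k)! = 2 members: [[2, 0, 1, 3], [2, 0, 3, 1]]
```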
38 The Mallows model is a distance-based probabilistic model on permutations. [sent-90, score-0.195]
39 It uses a permutation distance d on the symmetric group Sn to define the probability of a permutation: P(π|θ, σ) = (1/Z(θ, σ)) exp(−θ d(π, σ)), where σ ∈ Sn is the location permutation, θ ∈ R is a dispersion parameter, and Z(θ, σ) = ∑_{π∈Sn} exp(−θ d(π, σ)). [sent-91, score-0.397]
40 Although for some specific distances (such as dt), there exist efficient ways for parameter estimation in the Mallows model, for many other distances (such as dr and df), there is no known efficient method to compute Z(θ, σ), and one has to pay for the high computational complexity of O(n!). [sent-99, score-0.535]
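To make the O(n!) bottleneck concrete, here is a minimal brute-force sketch that normalizes the Mallows probability by summing over all of Sn, with Kendall's tau as the permutation distance; the naming and the toy example are ours.

```python
import math
from itertools import permutations

def kendall_tau(pi, sigma):
    """Number of object pairs ordered differently by the two permutations.

    pi and sigma are lists mapping object index -> position (0-indexed).
    """
    n = len(pi)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (pi[i] < pi[j]) != (sigma[i] < sigma[j]))

def mallows_probability(pi, sigma, theta):
    """P(pi | theta, sigma): the normalizer Z sums over all n! permutations."""
    n = len(pi)
    unnorm = lambda p: math.exp(-theta * kendall_tau(p, sigma))
    z = sum(unnorm(list(p)) for p in permutations(range(n)))  # the O(n!) term
    return unnorm(pi) / z

sigma = [0, 1, 2, 3]                                   # identity location permutation
print(mallows_probability([1, 0, 2, 3], sigma, theta=0.5))
```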
41 The Luce model is a stagewise probabilistic model on permutations. [sent-104, score-0.258]
42 In this way, we obtain the permutation probability of π as follows: P(π) = ∏_{i=1}^{n} [ exp(ω_{π^{-1}(i)}) / ∑_{j=i}^{n} exp(ω_{π^{-1}(j)}) ].    (4) [sent-110, score-0.246]
43 The computation of permutation probability in the Luce model is very efficient, as shown above. [sent-111, score-0.282]
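Eq. (4) translates directly into code: at stage i the object placed at position i is chosen by a softmax over the scores of the objects not yet placed. The sketch below is illustrative; ω is the vector of per-object scores and the names are ours.

```python
import math

def luce_probability(pi_inv, omega):
    """Luce model probability of a permutation, following Eq. (4).

    pi_inv[i] is the object placed at position i (the paper's pi^{-1}(i+1));
    omega[o] is the score of object o.
    """
    prob = 1.0
    n = len(pi_inv)
    for i in range(n):
        numer = math.exp(omega[pi_inv[i]])
        denom = sum(math.exp(omega[pi_inv[j]]) for j in range(i, n))
        prob *= numer / denom
    return prob

omega = [2.0, 1.0, 0.5, 0.0]               # hypothetical object scores
print(luce_probability([0, 1, 2, 3], omega))
```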
44 However, the Luce model is defined as a specific function of the scores of the objects, and therefore cannot make use of versatile permutation distances. [sent-114, score-0.319]
45 We call this model the coset-permutation distance based stagewise (CPS) model. [sent-118, score-0.275]
46 A coset-permutation distance is induced from a permutation distance, as shown in the following definition. [sent-121, score-0.358]
47 It is easy to verify that if the permutation distance d is right invariant, then the induced coset-permutation distance d̂ is also right invariant. [sent-124, score-0.68]
48 With the concept of coset-permutation distance, given a dispersion parameter θ ∈ R and a location permutation σ ∈ Sn , we can define the CPS model as follows. [sent-125, score-0.322]
49 Specifically, the generative process of a permutation π of n objects is decomposed into n sequential stages. [sent-126, score-0.341]
50 At the k-th stage, the task is to select the k-th object in the original permutation π out of the working set. [sent-128, score-0.282]
51 From Eq. (6), we can see that the closer the coset Sn−k π is to the location permutation σ, the larger the selection probability is. [sent-134, score-0.427]
52 The CPS model defines the probability of a permutation π conditioned on a dispersion parameter θ and a location permutation σ as: P(π|θ, σ) = ∏_{k=1}^{n} [ exp(−θ d̂(Sn−k π, σ)) / ∑_{j=k}^{n} exp(−θ d̂(Sn−k(π, k, j), σ)) ],    (7) where Sn−k(π, k, j) is defined in the sentence after Eq. [sent-137, score-0.568]
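The sketch below turns Eq. (7) into brute-force code. Two ingredients are assumptions on our part because their definitions are truncated in this extraction: the induced coset-permutation distance d̂ is taken to be the average of d over the coset (consistent with the naive O((n − k)!) cost discussed below), and Sn−k(π, k, j) is taken to be the coset that keeps the first k − 1 objects of π and places the object from π's position j at position k.

```python
import math
from itertools import permutations

def avg_coset_distance(prefix, remaining, sigma_inv, d):
    """Assumed induced distance: average of d(tau, sigma) over the coset of
    permutations that start with `prefix` and order `remaining` freely."""
    members = [list(prefix) + list(tail) for tail in permutations(remaining)]
    return sum(d(tau, sigma_inv) for tau in members) / len(members)

def cps_probability(pi_inv, sigma_inv, theta, d):
    """CPS probability of pi per Eq. (7), by brute force (illustration only).

    pi_inv and sigma_inv list the object at each position; d is a permutation
    distance on two such lists.
    """
    n = len(pi_inv)
    prob = 1.0
    for k in range(1, n + 1):
        numer = math.exp(-theta * avg_coset_distance(pi_inv[:k], pi_inv[k:], sigma_inv, d))
        denom = 0.0
        for j in range(k - 1, n):                  # objects still unplaced at stage k
            cand_prefix = pi_inv[:k - 1] + [pi_inv[j]]
            cand_rest = [o for o in pi_inv[k - 1:] if o != pi_inv[j]]
            denom += math.exp(-theta * avg_coset_distance(cand_prefix, cand_rest,
                                                          sigma_inv, d))
        prob *= numer / denom
    return prob

def kendall_on_lists(tau_inv, sigma_inv):
    """Kendall's tau between permutations given as position -> object lists."""
    pos_t = {o: i for i, o in enumerate(tau_inv)}
    pos_s = {o: i for i, o in enumerate(sigma_inv)}
    objs = list(pos_t)
    n = len(objs)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if (pos_t[objs[a]] < pos_t[objs[b]]) != (pos_s[objs[a]] < pos_s[objs[b]]))

print(cps_probability([1, 0, 2, 3], [0, 1, 2, 3], theta=0.5, d=kendall_on_lists))
```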
53 In rank aggregation, one usually needs to combine multiple input rankings. [sent-141, score-0.159]
54 The good news is that the real complexity for computing the coset-permutation distance induced by several popular permutation distances is much lower than O((n − k)!). [sent-156, score-0.492]
55 The coset-permutation distances induced from Spearman’s rank correlation dr, Spearman’s footrule df, and Kendall’s tau dt can all be computed with a complexity of O(n²). [sent-160, score-0.726]
56 For k = 1, 2, . . . , n − 2, we have: d̂r(Sn−k π, σ) = ∑_{i=1}^{k} (σ(π^{-1}(i)) − i)² + (1/(n−k)) ∑_{i=k+1}^{n} ∑_{j=k+1}^{n} (σ(π^{-1}(i)) − j)²  (9);  d̂f(Sn−k π, σ) = ∑_{i=1}^{k} |σ(π^{-1}(i)) − i| + (1/(n−k)) ∑_{i=k+1}^{n} ∑_{j=k+1}^{n} |σ(π^{-1}(i)) − j|  (10);  d̂t(Sn−k π, σ) = (n−k)(n−k−1)/4 + ∑_{i=1}^{k} ∑_{j=i+1}^{n} 1{σ(π^{-1}(i)) > σ(π^{-1}(j))}  (11). [sent-164, score-0.264]
57 According to the above theorem, each induced coset-permutation distance can be computed with a time complexity of O(n²). [sent-165, score-0.243]
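Eqs. (9)-(11) translate directly into O(n²) code. The sketch below uses 1-indexed positions to stay close to the formulas and follows the convention that π(o) and σ(o) give the position of object o; as in the theorem, it assumes 1 ≤ k ≤ n − 2.

```python
def coset_distances(pi, sigma, k):
    """Induced distances d_r, d_f, d_t of Eqs. (9)-(11) for the coset S_{n-k} pi.

    pi, sigma: dicts mapping object -> position, with positions 1..n.
    k: number of top positions fixed by the coset (1 <= k <= n - 2).
    """
    n = len(pi)
    inv = {pos: obj for obj, pos in pi.items()}        # position -> object under pi

    d_r = sum((sigma[inv[i]] - i) ** 2 for i in range(1, k + 1))
    d_r += sum((sigma[inv[i]] - j) ** 2
               for i in range(k + 1, n + 1)
               for j in range(k + 1, n + 1)) / (n - k)

    d_f = sum(abs(sigma[inv[i]] - i) for i in range(1, k + 1))
    d_f += sum(abs(sigma[inv[i]] - j)
               for i in range(k + 1, n + 1)
               for j in range(k + 1, n + 1)) / (n - k)

    d_t = (n - k) * (n - k - 1) / 4.0
    d_t += sum(1 for i in range(1, k + 1)
               for j in range(i + 1, n + 1)
               if sigma[inv[i]] > sigma[inv[j]])

    return d_r, d_f, d_t

pi = {0: 2, 1: 1, 2: 3, 3: 4}        # object -> position: object 1 is ranked first
sigma = {0: 1, 1: 2, 2: 3, 3: 4}
print(coset_distances(pi, sigma, k=2))
```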
58 For the coset distances induced from dr , df and dt , the CPS model in Eq. [sent-172, score-0.645]
59 The similarity between the CPS model and the Luce model is that they are both defined in a stagewise manner. [sent-176, score-0.223]
60 This stagewise definition enables efficient inference for both models. [sent-177, score-0.205]
61 The difference between the CPS model and the Luce model is that the CPS model has much richer expressiveness than the Luce model. [sent-178, score-0.207]
62 This is mainly because the CPS model is a distance-based model while the Luce model is not. [sent-179, score-0.21]
63 Our experiments in Section 5 show that different distances may be appropriate for different applications and datasets, which means a model with rich expressiveness has the potential to be applied to versatile applications. [sent-180, score-0.262]
64 Actually, when the coset-permutation distance in the CPS model is induced by Kendall’s tau dt, the CPS model is even mathematically equivalent to the Mallows model defined with dt. [sent-182, score-0.441]
65 However, for most permutation distances, the complexity of the Mallows model is as high as O(n!). [sent-185, score-0.315]
66 4 Algorithms for Rank Aggregation In this section, we show how to apply the extended CPS model to solve the problem of rank aggregation. [sent-188, score-0.195]
67 Here we take meta search as an example, and consider the supervised case of rank aggregation. [sent-189, score-0.241]
68 That is, given a set of training queries, we need to learn the parameters θ in the CPS model and apply the model with the learned parameters to aggregate rankings for new test queries. [sent-190, score-0.147]
69 A straightforward method is to find the permutation with the largest probability conditioned on the M input rankings, just as the widely-used inference algorithm for the Mallows model [12]. [sent-214, score-0.336]
70 Considering the stagewise definition of the CPS model, we propose a sequential inference algorithm. [sent-219, score-0.257]
71 Theorem 3 shows that the complexity of sequential inference is just O(Mn²). [sent-227, score-0.151]
72 Our experiments in the next section indicate that such an approximation does not hurt the ranking accuracy by much, while significantly speeding up the inference process. [sent-228, score-0.149]
73 For the coset distances induced from dr, df, and dt, the stagewise inference shown in Algorithm 1 can be conducted with a time complexity of O(Mn²). [sent-230, score-0.871]
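Algorithm 1 is not reproduced in this extraction, so the following is our hedged reading of sequential inference: at each stage, greedily select the remaining object with the largest stagewise CPS probability which, assuming the extended model sums θm-weighted coset distances over the M input rankings, amounts to minimizing that weighted sum. The helper dt_hat adapts Eq. (11) to a fixed prefix, exploiting the fact that the induced distances depend only on the top-k objects.

```python
def dt_hat(prefix, sigma):
    """Induced Kendall distance, Eq. (11), computed from the fixed prefix only.

    prefix: list of objects fixed at positions 1..k; sigma: object -> position.
    """
    n, k = len(sigma), len(prefix)
    pos = [sigma[o] for o in prefix]                 # sigma positions of fixed objects
    rest = [p for o, p in sigma.items() if o not in prefix]
    discordant = sum(1 for i in range(k)
                     for q in pos[i + 1:] + rest if pos[i] > q)
    return (n - k) * (n - k - 1) / 4.0 + discordant

def sequential_inference(input_rankings, thetas, coset_distance=dt_hat):
    """Greedy stagewise aggregation of M input rankings (our sketch of Algorithm 1).

    input_rankings: list of M dicts mapping object -> position (1..n).
    thetas: M dispersion parameters, one per input ranking.
    Returns the aggregated ranking as a list of objects, best first.
    """
    objects = list(input_rankings[0])
    prefix, remaining = [], set(objects)
    for _ in range(len(objects)):
        # Minimizing the weighted coset distance maximizes the stagewise
        # selection probability, since the denominator is shared by candidates.
        best = min(remaining,
                   key=lambda o: sum(theta * coset_distance(prefix + [o], sigma)
                                     for theta, sigma in zip(thetas, input_rankings)))
        prefix.append(best)
        remaining.remove(best)
    return prefix

rankings = [{0: 1, 1: 2, 2: 3, 3: 4}, {0: 2, 1: 1, 2: 3, 3: 4}]
print(sequential_inference(rankings, thetas=[1.0, 1.0]))
```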
74 The larger the NDCG value, the better the aggregation accuracy. [sent-241, score-0.203]
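For reference, a standard NDCG@k computation of the kind reported on the LETOR benchmarks is sketched below; the exact gain and discount convention used in the paper's tables is not specified in this extraction, so the common 2^rel − 1 gain with a log2 position discount is assumed.

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of graded relevance labels, best position first."""
    def dcg(labels):
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(labels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([2, 0, 1, 0], k=3))   # relevance labels in aggregated order
```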
75 For the CPS model, we tested two inference methods: global inference (denoted as CPS-G) and sequential inference (denoted as CPS-S). [sent-244, score-0.23]
76 When applied to supervised rank aggregation, the learning process of the Mallows model is also maximum likelihood estimation. [sent-246, score-0.219]
77 For inference, we chose the permutation with the maximal probability as the final aggregated ranking. [sent-247, score-0.262]
78 The time complexity of both learning and inference of the Mallows model with distances dr and df is O(n!). [sent-248, score-0.487]
79 We did not compare with the Luce model because, as far as we know, it is not straightforward to apply it to supervised rank aggregation. [sent-255, score-0.219]
80 The aggregation accuracies in terms of NDCG are listed in Table 1(a). [sent-259, score-0.219]
81 We did not implement the sampling-based learning algorithm for the Mallows model with distance dt, because in this case the learning algorithm is already efficient enough. [sent-262, score-0.198]
82 • For the Mallows model, exact learning is a little better than approximate learning, especially for distance df. [sent-264, score-0.236]
83 Sampling can improve the efficiency of the algorithm, but it also misses some information contained in the original permutation probability space. [sent-266, score-0.246]
84 • For the CPS model, sequential inference does not lead to much of an accuracy drop compared with global inference. [sent-267, score-0.137]
85 For distances df and dr , the CPS model outperforms the Mallows model. [sent-268, score-0.376]
86 For example, when df is used, the CPS model wins the Mallows model by about 0. [sent-269, score-0.22]
87 In addition to the comparison of aggregation accuracy, we have also logged the running time of each model. [sent-274, score-0.215]
88 The inference of the Mallows-model-based algorithms and the global inference of the CPS-model-based algorithms took more time than the sequential inference of the CPS model, although the difference was not significant (this is mainly because n ≤ 8 for MQ2008-small). [sent-356, score-0.316]
89 From these results, we can see that the proposed CPS model plus sequential inference is the most efficient one, and its accuracy is also very good compared with the other methods. [sent-357, score-0.157]
90 Note that the results of the Mallows-model-based algorithms and those of the CPS model with global inference are not available because of the high computational complexity of their learning or inference. [sent-359, score-0.187]
91 The results show that the CPS model with sequential inference outperforms BordaCount, no matter which distance is used. [sent-360, score-0.23]
92 Moreover, the CPS model with dt performs the best on MQ2008-agg, and the model with dr performs the best on MQ2007-agg. [sent-361, score-0.262]
93 This indicates that we can achieve good ranking performance by choosing the most suitable distances for different datasets (and thus applications). [sent-362, score-0.194]
94 This provides side evidence that it is beneficial for a probabilistic model on permutations to have rich expressiveness. [sent-363, score-0.23]
95 To sum up, the experimental results indicate that the CPS-model-based learning and sequential inference algorithms can achieve state-of-the-art ranking accuracy and are more efficient than other algorithms. [sent-364, score-0.237]
96 6 Conclusions and Future Work In this paper, we have proposed a new probabilistic model, named the CPS model, on permutations for rank aggregation. [sent-365, score-0.314]
97 The model is based on coset-permutation distance and defined in a stagewise manner. [sent-366, score-0.275]
98 We have applied the model to supervised rank aggregation and investigated how to perform learning and inference. [sent-368, score-0.422]
99 (2) We have applied the CPS model to the supervised case of rank aggregation. [sent-373, score-0.219]
100 LETOR: Benchmark dataset for research on learning to rank for information retrieval. [sent-453, score-0.159]
wordName wordTfidf (topN-words)
[('cps', 0.586), ('mallows', 0.452), ('sn', 0.254), ('permutation', 0.234), ('luce', 0.21), ('aggregation', 0.203), ('rank', 0.159), ('coset', 0.159), ('stagewise', 0.151), ('df', 0.148), ('permutations', 0.12), ('dr', 0.116), ('ndcg', 0.106), ('bordacount', 0.093), ('distance', 0.088), ('expressiveness', 0.08), ('ranking', 0.08), ('distances', 0.076), ('rankings', 0.075), ('dt', 0.074), ('mallapp', 0.066), ('objects', 0.055), ('inference', 0.054), ('letor', 0.053), ('sequential', 0.052), ('tau', 0.046), ('complexity', 0.045), ('sigir', 0.043), ('kendall', 0.043), ('meta', 0.04), ('rich', 0.039), ('spearman', 0.038), ('object', 0.037), ('model', 0.036), ('induced', 0.036), ('probabilistic', 0.035), ('versatile', 0.031), ('position', 0.031), ('dispersion', 0.03), ('inherits', 0.03), ('public', 0.03), ('datasets', 0.027), ('aslam', 0.026), ('footrule', 0.026), ('supervised', 0.024), ('qin', 0.023), ('location', 0.022), ('exp', 0.022), ('effectiveness', 0.021), ('query', 0.02), ('asia', 0.02), ('minj', 0.02), ('sigmod', 0.02), ('decomposes', 0.02), ('ny', 0.019), ('ciency', 0.019), ('lies', 0.019), ('stages', 0.019), ('advantages', 0.019), ('stage', 0.018), ('search', 0.018), ('scores', 0.018), ('ef', 0.017), ('ir', 0.017), ('assigned', 0.017), ('queries', 0.017), ('aggregated', 0.016), ('listed', 0.016), ('retrieval', 0.016), ('global', 0.016), ('relevance', 0.016), ('positions', 0.015), ('mathematically', 0.015), ('polynomial', 0.015), ('score', 0.015), ('accuracy', 0.015), ('avoids', 0.015), ('mainly', 0.014), ('unsupervised', 0.014), ('methodological', 0.014), ('nes', 0.014), ('york', 0.014), ('popular', 0.013), ('usa', 0.013), ('kumar', 0.013), ('ik', 0.013), ('probability', 0.012), ('heuristic', 0.012), ('mcmc', 0.012), ('management', 0.012), ('logged', 0.012), ('montague', 0.012), ('naor', 0.012), ('originated', 0.012), ('tyliu', 0.012), ('induce', 0.011), ('aims', 0.011), ('select', 0.011), ('suitable', 0.011), ('group', 0.011)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 9 nips-2010-A New Probabilistic Model for Rank Aggregation
Author: Tao Qin, Xiubo Geng, Tie-yan Liu
Abstract: This paper is concerned with rank aggregation, which aims to combine multiple input rankings to get a better ranking. A popular approach to rank aggregation is based on probabilistic models on permutations, e.g., the Luce model and the Mallows model. However, these models have their limitations in either poor expressiveness or high computational complexity. To avoid these limitations, in this paper, we propose a new model, which is defined with a coset-permutation distance, and models the generation of a permutation as a stagewise process. We refer to the new model as coset-permutation distance based stagewise (CPS) model. The CPS model has rich expressiveness and can therefore be used in versatile applications, because many different permutation distances can be used to induce the coset-permutation distance. The complexity of the CPS model is low because of the stagewise decomposition of the permutation probability and the efficient computation of most coset-permutation distances. We apply the CPS model to supervised rank aggregation, derive the learning and inference algorithms, and empirically study their effectiveness and efficiency. Experiments on public datasets show that the derived algorithms based on the CPS model can achieve state-of-the-art ranking accuracy, and are much more efficient than previous algorithms.
2 0.13118608 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
Author: Malik Magdon-Ismail
Abstract: We define a data dependent permutation complexity for a hypothesis set H, which is similar to a Rademacher complexity or maximum discrepancy. The permutation complexity is based (like the maximum discrepancy) on dependent sampling. We prove a uniform bound on the generalization error, as well as a concentration result which means that the permutation estimate can be efficiently estimated.
3 0.094158776 185 nips-2010-Nonparametric Density Estimation for Stochastic Optimization with an Observable State Variable
Author: Lauren Hannah, Warren Powell, David M. Blei
Abstract: In this paper we study convex stochastic optimization problems where a noisy objective function value is observed after a decision is made. There are many stochastic optimization problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state s. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel based weights and Dirichlet process based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems. 1
4 0.074061304 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
5 0.072132945 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
Author: Wei Chen, Tie-yan Liu, Zhi-ming Ma
Abstract: This paper is concerned with the generalization analysis on learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of the learned ranking model. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d sampling of documents per query. Such a sampling can better describe the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning to rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query become no longer independent after represented by features extracted from query-document matching. To tackle the challenge, we decompose the expected risk according to the two layers, and make use of the new concept of two-layer Rademacher average. The generalization bounds we obtained are quite intuitive and are in accordance with previous empirical studies on the performances of ranking algorithms. 1
6 0.06739983 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
7 0.053550281 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
8 0.049759332 250 nips-2010-Spectral Regularization for Support Estimation
9 0.049437597 268 nips-2010-The Neural Costs of Optimal Control
10 0.037955042 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
11 0.037176095 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
12 0.034863915 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification
13 0.034217294 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
14 0.033102248 149 nips-2010-Learning To Count Objects in Images
15 0.032843899 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
16 0.032560021 88 nips-2010-Extensions of Generalized Binary Search to Group Identification and Exponential Costs
17 0.032054279 72 nips-2010-Efficient algorithms for learning kernels from multiple similarity matrices with general convex loss functions
18 0.029723017 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics
19 0.028463563 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
20 0.027579302 119 nips-2010-Implicit encoding of prior probabilities in optimal neural populations
topicId topicWeight
[(0, 0.091), (1, 0.027), (2, 0.013), (3, -0.013), (4, -0.015), (5, 0.03), (6, 0.009), (7, -0.013), (8, -0.0), (9, 0.011), (10, 0.004), (11, -0.028), (12, -0.045), (13, -0.004), (14, 0.031), (15, 0.026), (16, 0.037), (17, -0.027), (18, 0.017), (19, 0.083), (20, -0.009), (21, 0.04), (22, 0.001), (23, -0.092), (24, 0.082), (25, -0.081), (26, -0.048), (27, -0.044), (28, 0.0), (29, 0.061), (30, -0.112), (31, -0.038), (32, 0.026), (33, -0.144), (34, -0.033), (35, 0.098), (36, 0.138), (37, 0.05), (38, 0.0), (39, 0.019), (40, -0.14), (41, -0.02), (42, -0.018), (43, 0.069), (44, -0.083), (45, -0.11), (46, -0.042), (47, 0.029), (48, -0.013), (49, -0.069)]
simIndex simValue paperId paperTitle
same-paper 1 0.9299075 9 nips-2010-A New Probabilistic Model for Rank Aggregation
Author: Tao Qin, Xiubo Geng, Tie-yan Liu
Abstract: This paper is concerned with rank aggregation, which aims to combine multiple input rankings to get a better ranking. A popular approach to rank aggregation is based on probabilistic models on permutations, e.g., the Luce model and the Mallows model. However, these models have their limitations in either poor expressiveness or high computational complexity. To avoid these limitations, in this paper, we propose a new model, which is defined with a coset-permutation distance, and models the generation of a permutation as a stagewise process. We refer to the new model as coset-permutation distance based stagewise (CPS) model. The CPS model has rich expressiveness and can therefore be used in versatile applications, because many different permutation distances can be used to induce the coset-permutation distance. The complexity of the CPS model is low because of the stagewise decomposition of the permutation probability and the efficient computation of most coset-permutation distances. We apply the CPS model to supervised rank aggregation, derive the learning and inference algorithms, and empirically study their effectiveness and efficiency. Experiments on public datasets show that the derived algorithms based on the CPS model can achieve state-of-the-art ranking accuracy, and are much more efficient than previous algorithms.
2 0.53211528 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
Author: Wei Chen, Tie-yan Liu, Zhi-ming Ma
Abstract: This paper is concerned with the generalization analysis on learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of the learned ranking model. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling, i.e., the i.i.d. sampling of queries and the conditional i.i.d sampling of documents per query. Such a sampling can better describe the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning to rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query become no longer independent after represented by features extracted from query-document matching. To tackle the challenge, we decompose the expected risk according to the two layers, and make use of the new concept of two-layer Rademacher average. The generalization bounds we obtained are quite intuitive and are in accordance with previous empirical studies on the performances of ranking algorithms. 1
3 0.45938769 159 nips-2010-Lifted Inference Seen from the Other Side : The Tractable Features
Author: Abhay Jha, Vibhav Gogate, Alexandra Meliou, Dan Suciu
Abstract: Lifted Inference algorithms for representations that combine first-order logic and graphical models have been the focus of much recent research. All lifted algorithms developed to date are based on the same underlying idea: take a standard probabilistic inference algorithm (e.g., variable elimination, belief propagation etc.) and improve its efficiency by exploiting repeated structure in the first-order model. In this paper, we propose an approach from the other side in that we use techniques from logic for probabilistic inference. In particular, we define a set of rules that look only at the logical representation to identify models for which exact efficient inference is possible. Our rules yield new tractable classes that could not be solved efficiently by any of the existing techniques. 1
4 0.43862236 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics
Author: Thomas Peel, Sandrine Anthoine, Liva Ralaivola
Abstract: We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernsteintype inequality for U-statistics derived by Arcones [2]. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions. 1
5 0.43636155 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
Author: Malik Magdon-Ismail
Abstract: We define a data dependent permutation complexity for a hypothesis set H, which is similar to a Rademacher complexity or maximum discrepancy. The permutation complexity is based (like the maximum discrepancy) on dependent sampling. We prove a uniform bound on the generalization error, as well as a concentration result which means that the permutation estimate can be efficiently estimated.
6 0.42360795 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
7 0.40586799 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
8 0.3878668 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem
9 0.36752895 289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities
10 0.35958731 107 nips-2010-Global seismic monitoring as probabilistic inference
11 0.35186097 75 nips-2010-Empirical Risk Minimization with Approximations of Probabilistic Grammars
12 0.34435722 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
13 0.31494999 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
14 0.31332317 125 nips-2010-Inference and communication in the game of Password
15 0.30560872 72 nips-2010-Efficient algorithms for learning kernels from multiple similarity matrices with general convex loss functions
16 0.30533558 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
17 0.30215186 281 nips-2010-Using body-anchored priors for identifying actions in single images
18 0.2929067 220 nips-2010-Random Projection Trees Revisited
19 0.28986627 216 nips-2010-Probabilistic Inference and Differential Privacy
20 0.28973114 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
topicId topicWeight
[(13, 0.015), (17, 0.025), (23, 0.335), (27, 0.043), (30, 0.04), (35, 0.024), (45, 0.197), (50, 0.03), (52, 0.018), (77, 0.016), (78, 0.04), (90, 0.098)]
simIndex simValue paperId paperTitle
same-paper 1 0.704759 9 nips-2010-A New Probabilistic Model for Rank Aggregation
Author: Tao Qin, Xiubo Geng, Tie-yan Liu
Abstract: This paper is concerned with rank aggregation, which aims to combine multiple input rankings to get a better ranking. A popular approach to rank aggregation is based on probabilistic models on permutations, e.g., the Luce model and the Mallows model. However, these models have their limitations in either poor expressiveness or high computational complexity. To avoid these limitations, in this paper, we propose a new model, which is defined with a coset-permutation distance, and models the generation of a permutation as a stagewise process. We refer to the new model as coset-permutation distance based stagewise (CPS) model. The CPS model has rich expressiveness and can therefore be used in versatile applications, because many different permutation distances can be used to induce the coset-permutation distance. The complexity of the CPS model is low because of the stagewise decomposition of the permutation probability and the efficient computation of most coset-permutation distances. We apply the CPS model to supervised rank aggregation, derive the learning and inference algorithms, and empirically study their effectiveness and efficiency. Experiments on public datasets show that the derived algorithms based on the CPS model can achieve state-ofthe-art ranking accuracy, and are much more efficient than previous algorithms.
2 0.65179163 265 nips-2010-The LASSO risk: asymptotic results and real world examples
Author: Mohsen Bayati, José Pereira, Andrea Montanari
Abstract: We consider the problem of learning a coefficient vector x0 ∈ RN from noisy linear observation y = Ax0 + w ∈ Rn . In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator x. In this case, a popular approach consists in solving an ℓ1 -penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN). For sequences of matrices A of increasing dimensions, with independent gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm, that is inspired from graphical models ideas. Through simulations on real data matrices (gene expression data and hospital medical records) we observe that these results can be relevant in a broad array of practical applications.
3 0.55934256 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems
Author: Han Liu, Xi Chen
Abstract: We propose a new nonparametric learning method based on multivariate dyadic regression trees (MDRTs). Unlike traditional dyadic decision trees (DDTs) or classification and regression trees (CARTs), MDRTs are constructed using penalized empirical risk minimization with a novel sparsity-inducing penalty. Theoretically, we show that MDRTs can simultaneously adapt to the unknown sparsity and smoothness of the true regression functions, and achieve the nearly optimal rates of convergence (in a minimax sense) for the class of (α, C)-smooth functions. Empirically, MDRTs can simultaneously conduct function estimation and variable selection in high dimensions. To make MDRTs applicable for large-scale learning problems, we propose a greedy heuristics. The superior performance of MDRTs are demonstrated on both synthetic and real datasets. 1
4 0.55852127 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models
Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing
Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.
Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing
Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
6 0.55537027 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers
7 0.55141139 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
8 0.54611492 282 nips-2010-Variable margin losses for classifier design
9 0.54498345 250 nips-2010-Spectral Regularization for Support Estimation
10 0.54462808 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
11 0.54298854 205 nips-2010-Permutation Complexity Bound on Out-Sample Error
12 0.54100913 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio
13 0.53948003 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression
14 0.5393284 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
15 0.53664494 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
16 0.53582311 243 nips-2010-Smoothness, Low Noise and Fast Rates
17 0.53570127 177 nips-2010-Multitask Learning without Label Correspondences
18 0.53454822 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
19 0.53425789 290 nips-2010-t-logistic regression
20 0.53381592 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors