nips nips2012 nips2012-286 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hossein Azari, David Parkes, Lirong Xia
Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave log-likelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. [sent-8, score-0.888]
2 A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. [sent-9, score-0.124]
3 This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. [sent-10, score-0.456]
4 Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. [sent-11, score-0.105]
5 1 Introduction Problems of learning with rank-based error metrics [16] and the adoption of learning for the purpose of rank aggregation in social choice [7, 8, 23, 25, 29, 30] have gained prominence in recent years. [sent-12, score-0.19]
6 Example applications include judges in crowd-sourcing contests and the ranking of movies or user-generated content. [sent-15, score-0.149]
7 In the problem of social choice, users submit ordinal preferences consisting of partial or total ranks on the alternatives and a single rank order must be selected to be representative of the reports. [sent-16, score-0.543]
8 Since Condorcet [6], one approach to this problem is to formulate social choice as the problem of estimating a true underlying world state (e.g., a true quality ranking of alternatives), where the individual reports are viewed as noisy data in regard to the true state. [sent-17, score-0.152] [sent-19, score-0.149]
10 In this way, social choice can be framed as a problem of inference. [sent-20, score-0.152]
11 In particular, Condorcet assumed the existence of a true ranking over alternatives, with a voter’s preference between any pair of alternatives a, b generated to agree with the true ranking with probability p > 1/2 and disagree otherwise. [sent-21, score-0.632]
12 Condorcet proposed to choose as the outcome of social choice the ranking that maximizes the likelihood of observing the voters’ preferences. [sent-22, score-0.389]
13 Later, Kemeny’s rule was shown to provide the maximum likelihood estimator (MLE) for this model [32]. [sent-23, score-0.088]
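To make the Condorcet-Kemeny connection concrete, the following is a minimal brute-force sketch (ours, not the paper's code; feasible only for small m): for any fixed p > 1/2, the Condorcet likelihood p^(agreements) * (1 - p)^(disagreements) is maximized by the ranking with the fewest pairwise disagreements with the votes, i.e., the Kemeny ranking.

```python
from itertools import permutations

def kemeny_ranking(votes):
    """votes: list of rankings over the same alternatives, best first.
    Returns the ranking minimizing total pairwise disagreement with the votes;
    for any fixed p > 1/2 this maximizes the Condorcet-model likelihood."""
    alternatives = votes[0]
    best, best_cost = None, float("inf")
    for candidate in permutations(alternatives):
        pos = {c: r for r, c in enumerate(candidate)}
        # Count vote pairs (a above b) that the candidate ranking reverses.
        cost = sum(1 for vote in votes
                     for i, a in enumerate(vote)
                     for b in vote[i + 1:]
                     if pos[a] > pos[b])
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best

print(kemeny_ranking([("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]))
# ('a', 'b', 'c'): two pairwise disagreements, fewer than any other ranking.
```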
14 This ignores the strength of agents’ preferences (the same probability p is adopted for all pairwise comparisons), and allows for cyclic preferences. [sent-25, score-0.132]
15 To overcome the first criticism, a more recent literature adopts the random utility model (RUM) from economics [26]. [sent-27, score-0.105]
16 In RUM, there is a ground truth utility (or score) associated with each alternative. [sent-31, score-0.213]
17 Given this, an agent independently samples a random utility (Xj ) for each alternative cj with conditional distribution µj (·|θj ). [sent-36, score-0.381]
18 The sampled utilities (X1, . . . , Xm) generate a distribution on preference orders, as Pr(π | θ) = Pr(X_{π(1)} > X_{π(2)} > · · · > X_{π(m)}). [sent-47, score-0.087]
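As a small illustration of this generative process (a sketch we add here, with the normal distribution as an assumed choice of µj), one can sample utilities and read off the induced ranking:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ranking(theta, scale=1.0):
    """Draw one preference order from the RUM: sample X_j ~ N(theta_j, scale^2)
    independently, then rank alternatives by descending sampled utility."""
    x = rng.normal(loc=theta, scale=scale)
    return np.argsort(-x)

print(sample_ranking(np.array([3.0, 2.0, 1.0])))  # usually [0 1 2], with noise-driven swaps
```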
19 A popular RUM is Plackett-Luce (P-L) [18, 21], where the random utility terms are generated according to Gumbel distributions with fixed shape parameter [2, 31]. [sent-53, score-0.138]
20 For P-L, the likelihood function has a simple analytical solution, making MLE inference tractable. [sent-54, score-0.16]
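For reference, the closed form is Pr(π | θ) = ∏_{r=1}^{m} exp(θ_{π(r)}) / ∑_{s=r}^{m} exp(θ_{π(s)}); here is a short sketch (ours, not the paper's code) of the resulting log-likelihood:

```python
import numpy as np

def pl_log_likelihood(theta, ranking):
    """log Pr(ranking | theta) under Plackett-Luce: at each stage r, the
    alternative ranking[r] is chosen with probability proportional to
    exp(theta) among the alternatives not yet placed."""
    w = np.exp(theta[np.asarray(ranking)])
    suffix_sums = np.cumsum(w[::-1])[::-1]   # w_r + w_{r+1} + ... + w_m
    return float(np.sum(np.log(w) - np.log(suffix_sums)))

print(pl_log_likelihood(np.array([1.0, 0.5, 0.0]), [0, 1, 2]))
```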
21 In application to social choice, the P-L model has been used to analyze political elections [10]. [sent-57, score-0.116]
22 Although P-L overcomes the two difficulties of the Condorcet-Kemeny approach, it is still quite restricted: it assumes that the random utility terms are distributed as Gumbel, with each alternative characterized by a single parameter, the mean of its corresponding distribution. [sent-59, score-0.182]
23 1.1 Our Contributions In this paper we focus on RUMs in which the random utilities are independently generated according to distributions in the exponential family (EF) [20]. [sent-63, score-0.102]
24 In addition, for the E-step we suggest a parallelization over the agents and alternatives and a Rao-Blackwellized method, which further increases the scalability of our method. [sent-73, score-0.326]
25 We generally assume that the data provides total orders on alternatives from voters, but comment on how to extend the method and theory to the case where the input preferences are partial orders. [sent-74, score-0.472]
26 We evaluate our approach on synthetic data as well as two real-world datasets, a public election dataset and one involving rank preferences on sushi. [sent-75, score-0.443]
27 For the two real-world datasets we have studied, we compare RUMs with normal distributions and P-L in terms of four criteria: log-likelihood, predictive log-likelihood, Akaike information criterion (AIC), and Bayesian information criterion (BIC). [sent-77, score-0.234]
28 We observe that when the amount of data is not too small, RUMs with normal distributions fit better than P-L. [sent-78, score-0.17]
29 Specifically, for the log-likelihood, predictive log-likelihood, and AIC criteria, RUMs with normal distributions outperform P-L with 95% confidence in both datasets. [sent-79, score-0.234]
30 2 RUMs and Exponential Families In social choice, each agent i ∈ {1, . . . , n} reports a ranking π^i over the alternatives. [sent-80, score-0.222]
31 This provides the data for an inferential approach to social choice. [sent-84, score-0.116]
32 A voting rule r is a mapping that assigns to each preference profile a set of winning rankings, r : L(C)^n → (2^{L(C)} \ ∅). [sent-88, score-0.106]
33 In particular, in the case of ties the set of winning rankings may include more than a single ranking. [sent-89, score-0.144]
34 In the maximum likelihood (MLE) approach to social choice, the preference profile is viewed as data, D = {π^1, . . . , π^n}. [sent-90, score-0.291]
35 Given this, the probability (likelihood) of the data given ground truth θ (and for a particular µ) is Pr(D | θ) = ∏_{i=1}^{n} Pr(π^i | θ), where
Pr(π | θ) = ∫_{x_{π(m)}=−∞}^{∞} ∫_{x_{π(m−1)}=x_{π(m)}}^{∞} · · · ∫_{x_{π(1)}=x_{π(2)}}^{∞} µ_{π(1)}(x_{π(1)}) · · · µ_{π(m)}(x_{π(m)}) dx_{π(1)} · · · dx_{π(m)}    (2)
[sent-94, score-0.108]
36 The MLE approach to social choice selects as the winning ranking the one that corresponds to the θ maximizing Pr(D | θ). [sent-100, score-0.354]
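Since equation (2) is exactly the probability that independently sampled utilities respect the order π, it can be sanity-checked by simulation. A Monte Carlo sketch for a normal RUM (an illustration we add; this is not the paper's inference method):

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_rank_probability(theta, pi, scale=1.0, num_samples=200_000):
    """Estimate Pr(pi | theta) under a normal RUM: the nested integral in (2)
    equals the probability that sampled utilities come out in pi's order, so
    simple simulation gives an unbiased estimate."""
    x = rng.normal(loc=theta, scale=scale, size=(num_samples, len(theta)))
    ordered = x[:, pi]                                   # utilities in pi's claimed order
    return np.all(ordered[:, :-1] > ordered[:, 1:], axis=1).mean()

print(mc_rank_probability(np.array([1.0, 0.0, -1.0]), [0, 1, 2]))
```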
37 If multiple parameters maximize the likelihood, the MLE approach returns a set of rankings, one ranking corresponding to each maximizing parameterization. [sent-101, score-0.27]
38 3 Global Optimality and Log-Concavity In this section, we provide a condition on distributions that guarantees that the likelihood function (2) is log-concave in the parameters θ. [sent-113, score-0.17]
39 We also provide a condition under which the set of MLE solutions is bounded when any one latent parameter is fixed. [sent-114, score-0.139]
40 In the location family, Xj = θj + ζj; the random variables ζj do not need to be identically distributed for all alternatives j. [sent-118, score-0.247]
41 We focus on computing solutions θ that maximize the log-likelihood function
l(θ; D) = ∑_{i=1}^{n} log Pr(π^i | θ).    (4)
Theorem 1 For the location family, if for every j ≤ m the probability density function of ζj is log-concave, then l(θ; D) is concave. [sent-121, score-0.117]
42 Lemma 1 Suppose g1(θ, ζ), . . . , gR(θ, ζ) are concave functions on R^{2m}, where θ is the vector of m parameters and ζ is a vector of m real numbers generated according to a distribution whose pdf is log-concave on R^m; then Pr(g1(θ, ζ) ≥ 0, . . . , gR(θ, ζ) ≥ 0) is a log-concave function of θ. [sent-126, score-0.12]
43 Suppose gr(θ, ζ) = θ_{π^i(r)} + ζ_{π^i(r)} − θ_{π^i(r+1)} − ζ_{π^i(r+1)} for r = 1, . . . , m − 1. [sent-132, score-0.125]
44 Then Pr(π^i | θ) = Pr(g1(θ, ζ) ≥ 0, . . . , gR(θ, ζ) ≥ 0), θ ∈ R^m.    (6)
This is because gr(θ, ζ) ≥ 0 is equivalent to the statement that in π^i, alternative π^i(r) is preferred to alternative π^i(r+1) in the RUM sense. [sent-138, score-0.209]
45 To see how this extends to the case where preferences are specified as partial orders, we consider in particular an interpretation where an agent’s report of a ranking over mi alternatives implies that all other alternatives are worse for the agent, in some undefined order. [sent-139, score-0.83]
46 Given this, define gr(θ, ζ) = θ_{π^i(r)} + ζ_{π^i(r)} − θ_{π^i(r+1)} − ζ_{π^i(r+1)} for r = 1, . . . , mi − 1, and gr(θ, ζ) = θ_{π^i(mi)} + ζ_{π^i(mi)} − θ_{π^i(r+1)} − ζ_{π^i(r+1)} for r = mi, . . . , m − 1. [sent-140, score-0.125] [sent-142, score-0.215]
48 Considering that the gr(·) are linear (hence concave), and using the log-concavity of the distributions of ζ^i = (ζ^i_1, ζ^i_2, . . . , ζ^i_m), we can apply Lemma 1 and prove log-concavity of the likelihood function. [sent-145, score-0.203] [sent-147, score-0.088]
50 It is not hard to verify that pdfs for normal and Gumbel are log-concave under reasonable conditions for their parameters, made explicit in the following corollary. [sent-148, score-0.137]
51 Corollary 1 For the location family where each ζj is a normal distribution with mean zero and fixed variance, or a Gumbel distribution with mean zero and fixed shape parameter, l(θ; D) is concave. [sent-149, score-0.212]
52 As shown in [24], the set of global maxima of the likelihood function, denoted by SD, is convex, since the likelihood function is log-concave. [sent-154, score-0.378]
53 [9] proposed the following necessary and sufficient condition for the set of global maxima to be bounded (more precisely, unique) when ∑_{j=1}^{m} e^{θj} = 1. [sent-157, score-0.298]
54 Condition 1 Given the data D, in every partition of the alternatives C into two nonempty subsets C1 ∪ C2, there exist c1 ∈ C1 and c2 ∈ C2 such that there is at least one ranking in D where c1 ≻ c2. [sent-158, score-0.396]
55 We next show that Condition 1 is also a necessary and sufficient condition for the set of global maxima SD to be bounded in location families, when we set one of the values θj to 0 (w.l.o.g.). [sent-159, score-0.339]
56 Then the set SD of global maxima of l(θ; D) is bounded if and only if the data D satisfies Condition 1. [sent-166, score-0.249]
57 Proof sketch: If Condition 1 does not hold, then SD is unbounded because the parameters for all alternatives in C1 can be increased simultaneously to improve the log-likelihood. [sent-167, score-0.247]
58 Lemma 2 If alternative j is preferred to alternative j′ in at least one ranking, then the difference of their mean parameters θj − θj′ is bounded from above (∃Q such that θj − θj′ < Q) for all θ that maximize the likelihood function. [sent-169, score-0.401]
59 Now consider a directed graph GD, where the nodes are the alternatives, and there is an edge from cj to cj′ if in at least one ranking cj ≻ cj′. [sent-170, score-0.661]
60 By Condition 1, for any pair j ≠ j′, there is a path from cj to cj′ (and conversely, a path from cj′ to cj). [sent-171, score-0.602]
61 To see this, consider building a path from j to j′ by starting from the partition with C1 = {j} and following an edge from j to some alternative j1 in C2; by Condition 1, such an edge must exist. [sent-172, score-0.292]
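The path argument shows that Condition 1 amounts to strong connectivity of GD. A small sketch (ours) that checks the condition this way:

```python
def satisfies_condition_1(votes, m):
    """Check Condition 1 by testing strong connectivity of the preference
    digraph G_D (edge c_j -> c_j' whenever c_j beats c_j' in some ranking)."""
    edges = {j: set() for j in range(m)}
    for vote in votes:                                  # vote: indices, best first
        for i, a in enumerate(vote):
            for b in vote[i + 1:]:
                edges[a].add(b)

    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            for nxt in edges[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # Strongly connected: every node reaches every other node.
    return all(reachable(j) == set(range(m)) for j in range(m))

print(satisfies_condition_1([[0, 1, 2], [2, 0, 1]], m=3))  # True
print(satisfies_condition_1([[0, 1, 2], [0, 2, 1]], m=3))  # False: nothing ever beats 0
```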
62 Now that we have the log-concavity and boundedness properties, we need to state conditions under which the bounded convex space of estimated parameters corresponds to a unique order. [sent-175, score-0.139]
63 The next theorem provides a necessary and sufficient condition for all global maxima to correspond to the same order on alternatives. [sent-176, score-0.241]
64 Suppose that we order the alternatives based on the estimated θ’s (meaning that cj is ranked higher than cj′ iff θj > θj′). [sent-177, score-0.542]
65 Theorem 3 The order over parameters is strict and is the same across all θ ∈ SD if, for all θ ∈ SD and all alternatives j ≠ j′, θj ≠ θj′. [sent-178, score-0.247]
66 Proof: Suppose for the sake of contradiction that there exist two maxima θ, θ∗ ∈ SD and a pair of alternatives j ≠ j′ such that θj > θj′ and θ∗j′ > θ∗j. [sent-179, score-0.247]
67 Still, it seems hard to motivate the imposition of a prior on parameters in many social choice domains. [sent-191, score-0.152]
68 In each step of our Gibbs sampler for voter i, we randomly choose a position j in π^i and sample x^i_{π^i(j)} according to a TruncatedEF distribution Pr(· | x^i_{π^i(−j)}, θ^t, π^i), where x^i_{π^i(−j)} denotes the vector of current utilities at all positions of π^i other than j. [sent-196, score-0.141]
69 For example, a truncated normal distribution is illustrated in Figure 2. [sent-204, score-0.137]
70 Finally, we estimate E{T(x^i_j) | x^i_{−j}, π^i, θ^t} in each step of the Gibbs sampler using the N samples, as S^{i,t+1}_j ≈ (1/N) ∑_{k=1}^{N} E{T(x^{i,k}_j) | x^{i,k}_{−j}, π^i, θ^t}. [Figure 2: A truncated normal distribution.] [sent-206, score-0.186]
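A sketch of one such Gibbs update for the normal case (our illustration, with assumed variable names; scipy's truncnorm supplies the truncated normal):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)

def gibbs_step(x, pi, theta, scale=1.0):
    """One Gibbs update for a voter under a normal RUM: resample the utility
    at a random position j of pi from a normal truncated to lie between its
    neighbours' current utilities, so the sample stays consistent with the
    reported ranking."""
    j = rng.integers(len(pi))
    upper = x[pi[j - 1]] if j > 0 else np.inf             # stay below the better neighbour
    lower = x[pi[j + 1]] if j < len(pi) - 1 else -np.inf  # and above the worse one
    mu = theta[pi[j]]
    a, b = (lower - mu) / scale, (upper - mu) / scale     # standardized truncation bounds
    x[pi[j]] = truncnorm.rvs(a, b, loc=mu, scale=scale, random_state=rng)
    return x
```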
71 For the case of the normal distribution with fixed variance, the M-step reduces to the sample mean of the per-voter estimates, θ^{t+1}_j = (1/n) ∑_{i=1}^{n} S^{i,t+1}_j. [sent-213, score-0.137]
72 [Figure 3: The MC-EM algorithm for the normal distribution.] [sent-215, score-0.137]
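Putting the pieces together, here is a compact MC-EM sketch for the fixed-variance normal RUM, reusing the gibbs_step above. It illustrates the algorithm's shape only; the initialization, sweep counts, and absence of burn-in are our assumptions, not the paper's exact schedule.

```python
import numpy as np

def mc_em(votes, m, iters=50, gibbs_sweeps=100, scale=1.0):
    """MC-EM for a normal RUM with fixed variance: the E-step averages Gibbs
    samples of each latent utility, and the M-step sets each mean parameter
    to the average of those per-voter estimates."""
    theta = np.zeros(m)
    for _ in range(iters):
        S = np.zeros((len(votes), m))
        for i, pi in enumerate(votes):
            x = np.empty(m)
            x[np.asarray(pi)] = np.arange(m, 0, -1, dtype=float)  # order-consistent start
            total = np.zeros(m)
            for _ in range(gibbs_sweeps):
                x = gibbs_step(x, pi, theta, scale)
                total += x
            S[i] = total / gibbs_sweeps        # E-step: Monte Carlo posterior-mean estimate
        theta = S.mean(axis=0)                 # M-step for the normal mean
        theta -= theta[0]                      # fix the first parameter at 0 for identifiability
    return theta
```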
73 5 Experimental Results We evaluate the proposed MC-EM algorithm on synthetic data as well as two real-world data sets, namely an election data set and a dataset representing preference orders on sushi. [sent-219, score-0.401]
74 For simulated data we use the Kendall correlation [11] between two rank orders (typically between the true order and the method’s result) as a measure of performance. [sent-220, score-0.121]
75 5.1 Experiments for Synthetic Data We first generate data from normal models for the random utility terms, with means θj = j and equal variance for all terms, for different choices of variance (Var = 2, 4). [sent-222, score-0.177]
76 The performance in terms of Kendall correlation for recovering the ground truth improves with a larger number of agents, which corresponds to more data. [sent-225, score-0.108]
77 See Figure 4, which shows the asymptotic behavior of the maximum likelihood estimator in recovering the true parameters. [sent-226, score-0.088]
78 [Figure 4, right panel: performance given access to sub-samples of the data in the public election dataset; x-axis: size of sub-samples; y-axis: Kendall correlation with the order obtained from the full dataset.] [sent-231, score-0.268]
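A sketch of this synthetic evaluation (the mean-position estimator below is a simple Borda-style stand-in for the fitted model, assumed here for brevity):

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(3)

def recovery_tau(m=5, n=100, var=2.0):
    """Generate n votes from a normal RUM with theta_j = j, estimate an order
    from mean vote positions, and score it by Kendall correlation against the
    true scores; +1 would be perfect recovery."""
    theta = np.arange(m, dtype=float)
    mean_pos = np.zeros(m)
    for _ in range(n):
        vote = np.argsort(-rng.normal(theta, np.sqrt(var)))  # best first
        mean_pos[vote] += np.arange(m)                       # position r for alternative vote[r]
    tau, _ = kendalltau(theta, -mean_pos)
    return tau

print(recovery_tau(n=20), recovery_tau(n=500))  # correlation improves with more agents
```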
79 5.2 Experiments for Model Robustness We apply our method to a public election dataset collected by Nicolaus Tideman [27], where the voters provided partial orders on candidates. [sent-234, score-0.498]
80 A partial order includes comparisons among a subset of the alternatives, and the alternatives not mentioned in the partial order are considered to be ranked lower than the lowest-ranked mentioned alternative. [sent-235, score-0.455]
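A small sketch of this interpretation (ours), mapping a partial report to the pairwise comparisons it implies:

```python
def implied_comparisons(partial, all_alternatives):
    """Pairwise comparisons (a, b), meaning a is preferred to b, implied by a
    partial ranking `partial` (best first) over a subset: every unmentioned
    alternative is placed below the lowest-ranked mentioned one."""
    pairs = {(a, b) for i, a in enumerate(partial) for b in partial[i + 1:]}
    unmentioned = [c for c in all_alternatives if c not in partial]
    pairs |= {(a, b) for a in partial for b in unmentioned}
    return pairs

print(sorted(implied_comparisons(["x", "y"], ["x", "y", "z", "w"])))
```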
81 The total number of votes is n = 280 and the number of alternatives is m = 15. [sent-236, score-0.28]
82 For the purpose of our experiments, we adopt the order on alternatives obtained by applying our method on the entire dataset as an assumed ground truth, since no ground truth is given as part of the data. [sent-237, score-0.483]
83 After finding the ground truth by using all 280 votes (and adopting a normal model), we compare the performance of our approach as we vary the amount of data available. [sent-238, score-0.278]
84 For example, the ranking obtained by using half of the data can still achieve a fair estimate of the full-data result, with an average Kendall correlation greater than 0. [sent-246, score-0.149]
85 5.3 Experiments for Model Fitness In addition to the public election dataset, we have tested our algorithm on a sushi dataset, where 5000 users give rankings over 10 different kinds of sushi [15]. [sent-249, score-0.531]
86 For each experiment we randomly choose n ∈ {10, 20, 30, 40, 50} rankings and apply our MC-EM algorithm to RUMs with normal distributions in which the variances are also parameters. [sent-250, score-0.226]
87 In the former experiments (both the synthetic data generation and the model for the election data), the variances were fixed to 1, and hence we had theoretical guarantees of convergence to globally optimal solutions by Theorem 1 and Theorem 2. [sent-251, score-0.336]
88 For this reason, we adopt RUMs with normal distributions in which the variance is a parameter that is fit by EM along with the mean. [sent-254, score-0.238]
89 We compute the difference between the normal model and P-L in terms of four criteria: log-likelihood (LL), predictive log-likelihood (predictive LL), AIC, and BIC. [sent-256, score-0.25]
90 For (predictive) log-likelihood, a positive value means that the normal model fits better than P-L, whereas for AIC and BIC, a negative number means that the normal model fits better than P-L. [sent-257, score-0.274]
91 Predictive likelihood is different from likelihood in the sense that we compute the likelihood of the estimated parameters for a part of the data that is not used for parameter estimation. [sent-258, score-0.264]
92 In particular, we compute predictive likelihood for a randomly chosen subset of 100 votes (see footnote 3 below). [sent-259, score-0.152]
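For reference, AIC and BIC from a maximized log-likelihood, with placeholder numbers only (these are not the paper's results):

```python
import numpy as np

def information_criteria(ll, k, n):
    """AIC and BIC from a maximized log-likelihood ll, k free parameters,
    and n observations (rankings); lower values indicate a better fit."""
    return 2 * k - 2 * ll, k * np.log(n) - 2 * ll

# Placeholder values only: m = 10 alternatives, n = 50 rankings; the normal
# model fits a mean and a variance per alternative, P-L one parameter each.
aic_n, bic_n = information_criteria(ll=-1130.0, k=2 * 10, n=50)
aic_p, bic_p = information_criteria(ll=-1145.0, k=10, n=50)
print(aic_n - aic_p, bic_n - bic_p)  # negative differences favour the normal model
```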
93 Table 1: Model selection for the sushi dataset and election dataset. [sent-295, score-0.317]
94 Cases where the normal model fits statistically better than P-L with 95% confidence are shown in bold. [sent-296, score-0.137]
95 When n is not too small (n = 50), RUMs with normal distributions fit better than P-L. [sent-298, score-0.17]
96 Specifically, for log-likelihood, predictive log-likelihood, and AIC, RUMs with normal distributions outperform P-L with 95% confidence in both datasets. [sent-299, score-0.234]
97 Footnote 3: The use of predictive likelihood allows us to evaluate the performance of the estimated parameters on the rest of the data, and is similar in this sense to cross-validation for supervised learning. [sent-315, score-0.152]
98 Preference functions that score rankings and maximum likelihood estimation. [sent-332, score-0.179]
99 A maximum likelihood approach for selecting sets of alternatives. [sent-396, score-0.088]
100 Aggregating preferences in multi-issue domains by using maximum likelihood estimators. [sent-425, score-0.186]
wordName wordTfidf (topN-words)
[('rums', 0.394), ('alternatives', 0.247), ('exj', 0.21), ('rum', 0.21), ('election', 0.192), ('aic', 0.184), ('pr', 0.172), ('sd', 0.163), ('gumbel', 0.162), ('mle', 0.153), ('ranking', 0.149), ('normal', 0.137), ('lirong', 0.131), ('cj', 0.128), ('gr', 0.125), ('condorcet', 0.116), ('social', 0.116), ('maxima', 0.114), ('agent', 0.106), ('utility', 0.105), ('preferences', 0.098), ('em', 0.094), ('rankings', 0.091), ('xj', 0.089), ('likelihood', 0.088), ('preference', 0.087), ('xia', 0.086), ('sushi', 0.086), ('orders', 0.083), ('kendall', 0.08), ('agents', 0.079), ('conitzer', 0.079), ('nicolaus', 0.079), ('seas', 0.079), ('public', 0.076), ('sj', 0.071), ('kemeny', 0.07), ('bic', 0.066), ('voters', 0.064), ('predictive', 0.064), ('vincent', 0.062), ('concave', 0.06), ('ll', 0.06), ('monte', 0.058), ('ground', 0.057), ('variances', 0.056), ('gibbs', 0.056), ('carlo', 0.053), ('winning', 0.053), ('voting', 0.053), ('gi', 0.053), ('azari', 0.053), ('parkes', 0.053), ('tideman', 0.053), ('truncatedef', 0.053), ('truth', 0.051), ('harvard', 0.051), ('loglikelihood', 0.049), ('condition', 0.049), ('sampler', 0.049), ('bounded', 0.047), ('ford', 0.046), ('thurstone', 0.046), ('tyler', 0.046), ('ariel', 0.046), ('voter', 0.046), ('xi', 0.046), ('mi', 0.045), ('global', 0.045), ('concavity', 0.045), ('path', 0.045), ('partial', 0.044), ('solutions', 0.043), ('criticism', 0.043), ('alternative', 0.042), ('location', 0.041), ('craig', 0.04), ('ex', 0.039), ('ranked', 0.039), ('dataset', 0.039), ('rank', 0.038), ('mallows', 0.037), ('analytical', 0.036), ('inference', 0.036), ('variance', 0.036), ('choice', 0.036), ('overcomes', 0.035), ('utilities', 0.035), ('rg', 0.035), ('cyclic', 0.034), ('family', 0.034), ('lemma', 0.034), ('maximize', 0.033), ('votes', 0.033), ('des', 0.033), ('theorem', 0.033), ('distributions', 0.033), ('panel', 0.033), ('adopt', 0.032), ('james', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 286 nips-2012-Random Utility Theory for Social Choice
Author: Hossein Azari, David Parkes, Lirong Xia
Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1
2 0.15487561 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier
Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1
3 0.13528612 74 nips-2012-Collaborative Gaussian Processes for Preference Learning
Author: Neil Houlsby, Ferenc Huszar, Zoubin Ghahramani, Jose M. Hernández-lobato
Abstract: We present a new model based on Gaussian processes (GPs) for learning pairwise preferences expressed by multiple users. Inference is simplified by using a preference kernel for GPs which allows us to combine supervised GP learning of user preferences with unsupervised dimensionality reduction for multi-user systems. The model not only exploits collaborative information from the shared structure in user behavior, but may also incorporate user features if they are available. Approximate inference is implemented using a combination of expectation propagation and variational Bayes. Finally, we present an efficient active learning strategy for querying preferences. The proposed technique performs favorably on real-world data against state-of-the-art multi-user preference learning algorithms. 1
4 0.10739545 288 nips-2012-Rational inference of relative preferences
Author: Nisheeth Srivastava, Paul R. Schrater
Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them optionspecific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function. 1
5 0.1000898 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space
Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu
Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications of some empirical findings on pairwise ranking methods that are unexplained before, which bridge the gap between theory and applications.
6 0.091061004 165 nips-2012-Iterative ranking from pair-wise comparisons
7 0.090344943 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm
8 0.090237767 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
9 0.080258459 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model
10 0.073155887 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering
11 0.072805613 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders
12 0.071018018 60 nips-2012-Bayesian nonparametric models for ranked data
13 0.070936456 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
14 0.070024922 75 nips-2012-Collaborative Ranking With 17 Parameters
15 0.068251096 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
16 0.068132259 126 nips-2012-FastEx: Hash Clustering with Exponential Families
17 0.06560427 351 nips-2012-Transelliptical Component Analysis
18 0.063604347 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
19 0.061789226 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
20 0.060204651 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
topicId topicWeight
[(0, 0.194), (1, 0.004), (2, 0.023), (3, -0.021), (4, -0.091), (5, -0.063), (6, -0.004), (7, 0.085), (8, 0.02), (9, 0.097), (10, -0.109), (11, -0.015), (12, -0.067), (13, -0.014), (14, -0.003), (15, -0.025), (16, 0.057), (17, -0.036), (18, -0.035), (19, 0.002), (20, 0.04), (21, -0.007), (22, -0.034), (23, 0.084), (24, 0.017), (25, -0.03), (26, -0.03), (27, 0.003), (28, -0.019), (29, 0.058), (30, 0.074), (31, -0.11), (32, 0.037), (33, 0.016), (34, 0.021), (35, -0.046), (36, -0.015), (37, -0.113), (38, 0.117), (39, 0.012), (40, -0.027), (41, 0.01), (42, -0.039), (43, 0.083), (44, -0.029), (45, 0.023), (46, 0.093), (47, -0.186), (48, -0.044), (49, -0.1)]
simIndex simValue paperId paperTitle
same-paper 1 0.9156276 286 nips-2012-Random Utility Theory for Social Choice
Author: Hossein Azari, David Parkes, Lirong Xia
Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1
2 0.75368243 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
Author: Weiwei Cheng, Willem Waegeman, Volkmar Welker, Eyke Hüllermeier
Abstract: Several machine learning methods allow for abstaining from uncertain predictions. While being common for settings like conventional classification, abstention has been studied much less in learning to rank. We address abstention for the label ranking setting, allowing the learner to declare certain pairs of labels as being incomparable and, thus, to predict partial instead of total orders. In our method, such predictions are produced via thresholding the probabilities of pairwise preferences between labels, as induced by a predicted probability distribution on the set of all rankings. We formally analyze this approach for the Mallows and the Plackett-Luce model, showing that it produces proper partial orders as predictions and characterizing the expressiveness of the induced class of partial orders. These theoretical results are complemented by experiments demonstrating the practical usefulness of the approach. 1
3 0.72565657 288 nips-2012-Rational inference of relative preferences
Author: Nisheeth Srivastava, Paul R. Schrater
Abstract: Statistical decision theory axiomatically assumes that the relative desirability of different options that humans perceive is well described by assigning them optionspecific scalar utility functions. However, this assumption is refuted by observed human behavior, including studies wherein preferences have been shown to change systematically simply through variation in the set of choice options presented. In this paper, we show that interpreting desirability as a relative comparison between available options at any particular decision instance results in a rational theory of value-inference that explains heretofore intractable violations of rational choice behavior in human subjects. Complementarily, we also characterize the conditions under which a rational agent selecting optimal options indicated by dynamic value inference in our framework will behave identically to one whose preferences are encoded using a static ordinal utility function. 1
4 0.65689522 141 nips-2012-GenDeR: A Generic Diversified Ranking Algorithm
Author: Jingrui He, Hanghang Tong, Qiaozhu Mei, Boleslaw Szymanski
Abstract: Diversified ranking is a fundamental task in machine learning. It is broadly applicable in many real world problems, e.g., information retrieval, team assembling, product search, etc. In this paper, we consider a generic setting where we aim to diversify the top-k ranking list based on an arbitrary relevance function and an arbitrary similarity function among all the examples. We formulate it as an optimization problem and show that in general it is NP-hard. Then, we show that for a large volume of the parameter space, the proposed objective function enjoys the diminishing returns property, which enables us to design a scalable, greedy algorithm to find the (1 − 1/e) near-optimal solution. Experimental results on real data sets demonstrate the effectiveness of the proposed algorithm.
5 0.54755086 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space
Author: Yanyan Lan, Jiafeng Guo, Xueqi Cheng, Tie-yan Liu
Abstract: This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that the pairwise ranking methods have been shown very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We give a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that the pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than but similar to the low-noise setting. Our studies provide theoretical justifications of some empirical findings on pairwise ranking methods that are unexplained before, which bridge the gap between theory and applications.
6 0.53104401 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
7 0.52203071 165 nips-2012-Iterative ranking from pair-wise comparisons
8 0.51633376 60 nips-2012-Bayesian nonparametric models for ranked data
9 0.51303273 276 nips-2012-Probabilistic Event Cascades for Alzheimer's disease
10 0.4875111 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization
11 0.48453182 196 nips-2012-Learning with Partially Absorbing Random Walks
12 0.4574953 34 nips-2012-Active Learning of Multi-Index Function Models
13 0.45524937 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
14 0.44668281 74 nips-2012-Collaborative Gaussian Processes for Preference Learning
15 0.44601384 75 nips-2012-Collaborative Ranking With 17 Parameters
16 0.44405597 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions
17 0.44286388 46 nips-2012-Assessing Blinding in Clinical Trials
18 0.42468256 294 nips-2012-Repulsive Mixtures
19 0.422984 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs
20 0.41979456 126 nips-2012-FastEx: Hash Clustering with Exponential Families
topicId topicWeight
[(0, 0.02), (21, 0.012), (38, 0.101), (39, 0.013), (42, 0.014), (55, 0.016), (74, 0.04), (76, 0.614), (80, 0.065), (92, 0.027)]
simIndex simValue paperId paperTitle
Author: Assaf Glazer, Michael Lindenbaum, Shaul Markovitch
Abstract: We propose an efficient, generalized, nonparametric, statistical KolmogorovSmirnov test for detecting distributional change in high-dimensional data. To implement the test, we introduce a novel, hierarchical, minimum-volume sets estimator to represent the distributions to be tested. Our work is motivated by the need to detect changes in data streams, and the test is especially efficient in this context. We provide the theoretical foundations of our test and show its superiority over existing methods. 1
2 0.99068433 33 nips-2012-Active Learning of Model Evidence Using Bayesian Quadrature
Author: Michael Osborne, Roman Garnett, Zoubin Ghahramani, David K. Duvenaud, Stephen J. Roberts, Carl E. Rasmussen
Abstract: Numerical integration is a key component of many problems in scientific computing, statistical modelling, and machine learning. Bayesian Quadrature is a modelbased method for numerical integration which, relative to standard Monte Carlo methods, offers increased sample efficiency and a more robust estimate of the uncertainty in the estimated integral. We propose a novel Bayesian Quadrature approach for numerical integration when the integrand is non-negative, such as the case of computing the marginal likelihood, predictive distribution, or normalising constant of a probabilistic model. Our approach approximately marginalises the quadrature model’s hyperparameters in closed form, and introduces an active learning scheme to optimally select function evaluations, as opposed to using Monte Carlo samples. We demonstrate our method on both a number of synthetic benchmarks and a real scientific problem from astronomy. 1
same-paper 3 0.98959142 286 nips-2012-Random Utility Theory for Social Choice
Author: Hossein Azari, David Parkes, Lirong Xia
Abstract: Random utility theory models an agent’s preferences on alternatives by drawing a real-valued score on each alternative (typically independently) from a parameterized distribution, and then ranking the alternatives according to scores. A special case that has received significant attention is the Plackett-Luce model, for which fast inference methods for maximum likelihood estimators are available. This paper develops conditions on general random utility models that enable fast inference within a Bayesian framework through MC-EM, providing concave loglikelihood functions and bounded sets of global maxima solutions. Results on both real-world and simulated data provide support for the scalability of the approach and capability for model selection among general random utility models including Plackett-Luce. 1
4 0.98837006 311 nips-2012-Shifting Weights: Adapting Object Detectors from Image to Video
Author: Kevin Tang, Vignesh Ramanathan, Li Fei-fei, Daphne Koller
Abstract: Typical object detectors trained on images perform poorly on video, as there is a clear distinction in domain between the two types of data. In this paper, we tackle the problem of adapting object detectors learned from images to work well on videos. We treat the problem as one of unsupervised domain adaptation, in which we are given labeled data from the source domain (image), but only unlabeled data from the target domain (video). Our approach, self-paced domain adaptation, seeks to iteratively adapt the detector by re-training the detector with automatically discovered target domain examples, starting with the easiest first. At each iteration, the algorithm adapts by considering an increased number of target domain examples, and a decreased number of source domain examples. To discover target domain examples from the vast amount of video data, we introduce a simple, robust approach that scores trajectory tracks instead of bounding boxes. We also show how rich and expressive features specific to the target domain can be incorporated under the same framework. We show promising results on the 2011 TRECVID Multimedia Event Detection [1] and LabelMe Video [2] datasets that illustrate the benefit of our approach to adapt object detectors to video. 1
5 0.98762053 28 nips-2012-A systematic approach to extracting semantic information from functional MRI data
Author: Francisco Pereira, Matthew Botvinick
Abstract: This paper introduces a novel classification method for functional magnetic resonance imaging datasets with tens of classes. The method is designed to make predictions using information from as many brain locations as possible, instead of resorting to feature selection, and does this by decomposing the pattern of brain activation into differently informative sub-regions. We provide results over a complex semantic processing dataset that show that the method is competitive with state-of-the-art feature selection and also suggest how the method may be used to perform group or exploratory analyses of complex class structure. 1
6 0.98424798 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models
7 0.9836151 205 nips-2012-MCMC for continuous-time discrete-state systems
8 0.95912349 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation
9 0.95340794 247 nips-2012-Nonparametric Reduced Rank Regression
10 0.94861829 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning
11 0.90132838 338 nips-2012-The Perturbed Variation
12 0.88774651 142 nips-2012-Generalization Bounds for Domain Adaptation
13 0.88719499 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
14 0.8777954 41 nips-2012-Ancestor Sampling for Particle Gibbs
15 0.87453288 264 nips-2012-Optimal kernel choice for large-scale two-sample tests
16 0.87351054 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters
17 0.87277985 291 nips-2012-Reducing statistical time-series problems to binary classification
18 0.87166625 327 nips-2012-Structured Learning of Gaussian Graphical Models
19 0.87145621 203 nips-2012-Locating Changes in Highly Dependent Data with Unknown Number of Change Points
20 0.86875647 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation