nips nips2013 nips2013-128 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hossein Azari Soufiani, William Chen, David C. Parkes, Lirong Xia
Abstract: In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. [sent-11, score-0.158]
2 Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. [sent-12, score-0.864]
3 We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. [sent-13, score-0.103]
4 1 Introduction In many applications, we need to aggregate the preferences of agents over a set of alternatives to produce a joint ranking. [sent-15, score-0.124]
5 For example, in systems for ranking the quality of products, restaurants, or other services, we can generate an aggregate rank through feedback from individual users. [sent-16, score-0.185]
6 This idea of rank aggregation also plays an important role in multiagent systems, meta-search engines [4], belief merging [5], crowdsourcing [15], and many other e-commerce applications. [sent-17, score-0.1]
7 A standard approach towards rank aggregation is to treat input rankings as data generated from a probabilistic model, and then learn the MLE of the input data. [sent-18, score-0.196]
8 This line of research is sometimes referred to as learning to rank [11]. [sent-24, score-0.049]
9 [16] proposed a rank aggregation algorithm, called Rank Centrality (RC), based on computing the stationary distribution of a Markov chain whose transition matrix is defined according to the data (pairwise comparisons among alternatives). [sent-26, score-0.205]
10 The authors describe the approach as being model independent, and prove that for data generated according to BTL, the output of RC converges to the ground truth, and the performance of RC is almost identical to the performance of MLE for BTL. [sent-27, score-0.063]
11 In this paper, we take a generalized method-of-moments (GMM) point of view towards rank aggregation. [sent-30, score-0.072]
12 The main technical contribution of this paper is a class of GMMs for parameter estimation under the PL model, which generalizes BTL; the input consists of full rankings instead of the pairwise comparisons used by BTL and the RC algorithm. [sent-32, score-0.315]
13 Our algorithms first break full rankings into pairwise comparisons, and then solve the generalized moment conditions to find the parameters. [sent-33, score-0.27]
14 Each of our GMMs is characterized by a way of breaking full rankings. [sent-34, score-0.675]
15 We characterize conditions for the output of the algorithm to be unique, and we also obtain some general characterizations that help us to determine which method of breaking leads to a consistent GMM. [sent-35, score-0.716]
16 Specifically, full breaking (which uses all pairwise comparisons in the ranking) is consistent, but adjacent breaking (which only uses pairwise comparisons in adjacent positions) is inconsistent. [sent-36, score-1.872]
17 We also compare statistical efficiency and running time of these methods experimentally using both synthetic and real-world data, showing that all GMMs run much faster than the MM algorithm. [sent-38, score-0.064]
18 For the synthetic data, we observe that many consistent GMMs converge as fast as the MM algorithm, while there exists a clear tradeoff between computational complexity and statistical efficiency among consistent GMMs. [sent-39, score-0.193]
19 However, we note that our algorithms aggregate full rankings under PL, while the RC algorithm aggregates pairwise comparisons. [sent-41, score-0.274]
20 Moreover, by taking a GMM point of view, we prove the consistency of our algorithms on top of theories for GMMs, while Negahban et al. [sent-43, score-0.027]
21 Let D = {d1 , . . . , dn } denote the data, where each dj is a full ranking over C. [sent-51, score-0.172]
22 The PL model is a parametric model where each alternative $c_i$ is parameterized by $\gamma_i \in (0, 1)$, such that $\sum_{i=1}^{m} \gamma_i = 1$. [sent-52, score-0.117]
23 Let $\Omega = \{\gamma : \forall i,\ \gamma_i \in (0,1),\ \sum_{i=1}^{m} \gamma_i = 1\}$. Given $\gamma^* \in \Omega$, the probability for a ranking $d = [c_{i_1} \succ c_{i_2} \succ \cdots \succ c_{i_m}]$ is defined as follows. [sent-59, score-0.11]
24 $\Pr_{PL}(d \mid \gamma) = \frac{\gamma_{i_1}}{\sum_{l=1}^{m} \gamma_{i_l}} \times \frac{\gamma_{i_2}}{\sum_{l=2}^{m} \gamma_{i_l}} \times \cdots \times \frac{\gamma_{i_{m-1}}}{\gamma_{i_{m-1}} + \gamma_{i_m}}$. In the BTL model, the data is composed of pairwise comparisons instead of rankings, and the model is parameterized in the same way as PL, such that $\Pr_{BTL}(c_{i_1} \succ c_{i_2} \mid \gamma) = \frac{\gamma_{i_1}}{\gamma_{i_1} + \gamma_{i_2}}$. [sent-60, score-0.157]
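As a concrete illustration of the PL probability above, here is a minimal sketch in Python (our own code, not from the paper; the function name `pl_probability` and the encoding of rankings as index lists are our choices):

```python
import numpy as np

def pl_probability(ranking, gamma):
    """Probability of a full ranking under the Plackett-Luce model.

    ranking: alternative indices ordered from most to least preferred.
    gamma:   parameter vector with entries in (0, 1) summing to 1.
    """
    gamma = np.asarray(gamma, dtype=float)
    remaining = list(ranking)
    prob = 1.0
    # Each factor is the weight of the current winner divided by the total
    # weight of the alternatives not yet placed; the last factor equals 1.
    for _ in range(len(ranking) - 1):
        prob *= gamma[remaining[0]] / gamma[remaining].sum()
        remaining = remaining[1:]
    return prob

# m = 3, gamma = (0.5, 0.3, 0.2): Pr([c1 > c2 > c3]) = 0.5/1.0 * 0.3/0.5 = 0.3
print(pl_probability([0, 1, 2], [0.5, 0.3, 0.2]))
```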
25 A GMM is consistent if and only if for any $\gamma^* \in \Omega$, $GMM_g(D, W)$ converges in probability to $\gamma^*$ as $n \to \infty$ and the data is drawn i.i.d. given $\gamma^*$. [sent-67, score-0.085]
26 It is well-known that $GMM_g(D, W)$ is consistent if it satisfies some regularity conditions plus the following condition [7]: Condition 1. [sent-72, score-0.102]
27 MLE as a consistent GMM: Suppose the likelihood function is twice-differentiable; then the MLE is a consistent GMM where $g(d, \gamma) = \nabla_{\gamma} \log \Pr(d \mid \gamma)$ and $W_n = I$. [sent-75, score-0.13]
28 [16] proposed the Rank Centrality (RC) algorithm that aggregates pairwise comparisons DP = {Y1 , . [sent-78, score-0.177]
29 Let $a_{ij}$ denote the number of $c_i \succ c_j$ in $D_P$, and it is assumed that for any $i \neq j$, $a_{ij} + a_{ji} = k$. [sent-82, score-0.285]
30 Let dmax denote the maximum number of pairwise defeats for an alternative. [sent-83, score-0.07]
31 Let $X^{c_i \succ c_j}(Y) = 1$ if $Y = [c_i \succ c_j]$ and $0$ otherwise, and let $P^{*}_{RC}(Y)_{ij} = X^{c_j \succ c_i}(Y)$ if $i \neq j$, and $P^{*}_{RC}(Y)_{ii} = -\sum_{l \neq i} X^{c_i \succ c_l}(Y)$. [sent-85, score-0.743]
32 It is not hard to check that the output of RC is the output of GMMgRC . [sent-87, score-0.038]
33 Moreover, GMMgRC satisfies Condition 1 under the BTL model, and as we will show later in Corollary 4, GMMgRC is consistent for BTL. [sent-88, score-0.065]
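To make the connection to RC concrete, the sketch below (our own illustration; `rc_moment_matrix` and the SVD-based null-space solve are our choices, not the authors' implementation) averages the matrix $P^{*}_{RC}$ over pairwise-comparison data and returns a simplex vector annihilated by it, which coincides with the GMMgRC (and hence RC) estimate when the solution is unique:

```python
import numpy as np

def rc_moment_matrix(pairs, m):
    """Average of P*_RC(Y) over pairwise comparisons Y = [winner > loser]."""
    P = np.zeros((m, m))
    for w, l in pairs:
        P[l, w] += 1.0   # entry (i, j): number of comparisons where c_j beats c_i
        P[w, w] -= 1.0   # diagonal: minus the number of wins of c_i
    return P / len(pairs)

def solve_moment_condition(P):
    """Return gamma on the simplex with P @ gamma ~ 0 (smallest right singular vector)."""
    _, _, vt = np.linalg.svd(P)
    gamma = np.abs(vt[-1])
    return gamma / gamma.sum()

# Toy pairwise data over 3 alternatives; every alternative wins and loses at least once.
pairs = ([(0, 1)] * 6 + [(1, 0)] * 2 + [(0, 2)] * 7 + [(2, 0)] * 1
         + [(1, 2)] * 5 + [(2, 1)] * 3)
print(solve_moment_condition(rc_moment_matrix(pairs, 3)))
```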
34 3 Generalized Method-of-Moments for the Plakett-Luce model In this section we introduce our GMMs for rank aggregation under PL. [sent-89, score-0.1]
35 For any full ranking $d$ over $C$, we let • $X^{c_i \succ c_j}(d) = 1$ if $c_i \succ_d c_j$, and $0$ otherwise; • $P(d)$ be an $m \times m$ matrix where $P(d)_{ij} = X^{c_j \succ c_i}(d)$ if $i \neq j$ and $P(d)_{ii} = -\sum_{l \neq i} X^{c_i \succ c_l}(d)$; • $g_F(d, \gamma) = P(d) \cdot \gamma$ and $P(D) = \frac{1}{n} \sum_{d \in D} P(d)$. For example, let $m = 3$, $D = \{[c_1 \succ c_2 \succ c_3], [c_2 \succ c_3 \succ c_1]\}$. [sent-93, score-1.032]
36 Moreover, we will show in Corollary 3 that GMMgF is consistent for PL. [sent-99, score-0.065]
37 In the above example, we count all pairwise comparisons in a full ranking d to build P (d), and define g = P (D) · γ to be linear in γ. [sent-100, score-0.329]
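A short sketch of how $P(d)$ is assembled under the full breaking (our own code; `full_breaking_matrix` is a hypothetical helper name). Averaging over the two rankings in the example reproduces $P(D)$:

```python
import numpy as np

def full_breaking_matrix(ranking, m):
    """P(d) under the full breaking: count every pairwise comparison in d."""
    P = np.zeros((m, m))
    for a, ci in enumerate(ranking):
        for cj in ranking[a + 1:]:   # ci is ranked above cj in d
            P[cj, ci] += 1.0         # off-diagonal entry X^{ci > cj}(d)
            P[ci, ci] -= 1.0         # diagonal collects minus the wins of ci
    return P

# The example from the text: m = 3, D = {[c1 > c2 > c3], [c2 > c3 > c1]} (0-indexed).
D = [[0, 1, 2], [1, 2, 0]]
P_D = sum(full_breaking_matrix(d, 3) for d in D) / len(D)
print(P_D)
```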
38 In general, we may consider some subset of pairwise comparisons. [sent-101, score-0.07]
39 Intuitively, a breaking is an undirected graph over the m positions in a ranking, such that for any full ranking d, the pairwise comparisons between alternatives in the ith position and jth position are counted to construct PG (d) if and only if {i, j} ∈ G. [sent-103, score-0.984]
40 A breaking is a non-empty undirected graph $G$ whose vertices are $\{1, \ldots, m\}$. [sent-105, score-0.613]
41 Given any breaking $G$, any full ranking $d$ over $C$, and any $c_i, c_j \in C$, we let the following (footnote 1: the BTL model in [16] is slightly different from that in this paper): [sent-109, score-1.024]
42 • $X_G^{c_i \succ c_j}(d) = 1$ if $\{\mathrm{Pos}(c_i, d), \mathrm{Pos}(c_j, d)\} \in G$ and $c_i \succ_d c_j$, and $0$ otherwise, where $\mathrm{Pos}(c_i, d)$ is the position of $c_i$ in $d$. [sent-111, score-0.478]
43 • $P_G(d)$ be an $m \times m$ matrix where $P_G(d)_{ij} = X_G^{c_j \succ c_i}(d)$ if $i \neq j$ and $P_G(d)_{ii} = -\sum_{l \neq i} X_G^{c_i \succ c_l}(d)$; • $g_G(d, \gamma) = P_G(d) \cdot \gamma$; • $GMM_G(D)$ be the GMM method that solves Equation (1) for $g_G$ and $W_n = I$. [sent-112, score-0.265]
44 We are now ready to present our GMM algorithm (Algorithm 1), parameterized by a breaking $G$. [sent-131, score-0.613]
45 Algorithm 1: $GMM_G(D)$. Input: a breaking $G$ and data $D = \{d_1, \ldots, d_n\}$. [sent-134, score-0.613]
46 For any breaking G and any data D, there exists γ ∈ Ω such that PG (D) · γ = 0. [sent-145, score-0.613]
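Putting the pieces together, the following is a hedged sketch of Algorithm 1 (our own minimal implementation, not the authors' code): the breaking $G$ is passed as a set of unordered position pairs, $P_G(D)$ is averaged over the data, and $\gamma$ is recovered as the normalised null vector of $P_G(D)$, here via an SVD; the statement above guarantees such a vector exists, and any other way of solving $P_G(D) \cdot \gamma = 0$ on the simplex would do.

```python
import numpy as np
from itertools import combinations

def gmm_breaking(rankings, m, G):
    """Minimal sketch of GMM_G(D) for a breaking G.

    rankings: list of full rankings (alternative indices, best first).
    G:        set of unordered position pairs (1-based); the full breaking
              is all pairs over {1, ..., m}.
    """
    P = np.zeros((m, m))
    for d in rankings:
        for pos_i, pos_j in combinations(range(1, m + 1), 2):
            if frozenset((pos_i, pos_j)) not in G:
                continue                                  # this pair of positions is not broken
            winner, loser = d[pos_i - 1], d[pos_j - 1]    # the earlier position wins
            P[loser, winner] += 1.0
            P[winner, winner] -= 1.0
    P /= len(rankings)
    # gamma is a null vector of P_G(D), normalised onto the simplex.
    _, _, vt = np.linalg.svd(P)
    gamma = np.abs(vt[-1])
    return gamma / gamma.sum()

m = 4
G_full = {frozenset(p) for p in combinations(range(1, m + 1), 2)}   # full breaking
D = [[0, 1, 2, 3], [3, 2, 1, 0], [1, 0, 3, 2]]
print(gmm_breaking(D, m, G_full))
```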
47 Among the following three conditions, 1 and 2 are equivalent for any breaking G and any data D. [sent-154, score-0.63]
48 Moreover, conditions 1 and 2 are equivalent to condition 3 if and only if G is connected. [sent-155, score-0.054]
49 For the full breaking, adjacent breaking, and any top-k breaking, the three statements in Theorem 2 are equivalent for any data D. [sent-163, score-0.214]
50 [6] identified a necessary and sufficient condition on data D for the MLE under PL to be unique, which is equivalent to condition 1 in Theorem 2. [sent-166, score-0.053]
51 For the full breaking $G_F$, $|GMM_{G_F}(D)| = 1$ if and only if $|MLE_{PL}(D)| = 1$. [sent-169, score-0.675]
52 3.2 Consistency. We say a breaking $G$ is consistent (for PL) if $GMM_G$ is consistent (for PL). [sent-171, score-0.743]
53 Below, we show that some breakings defined in the last subsection are consistent. [sent-172, score-0.238]
54 A breaking $G$ is consistent if and only if $E_{d|\gamma^*}[g(d, \gamma^*)] = 0$, which is equivalent to the following equalities for all $i \neq j$: [sent-175, score-0.817]
55 $\frac{\Pr(c_i \succ c_j \mid \{\mathrm{Pos}(c_i, d), \mathrm{Pos}(c_j, d)\} \in G)}{\Pr(c_j \succ c_i \mid \{\mathrm{Pos}(c_i, d), \mathrm{Pos}(c_j, d)\} \in G)} = \frac{\gamma_i^*}{\gamma_j^*}. \quad (2)$ Theorem 4. [sent-176, score-0.117]
56 Continuing, we show that position-k breakings are consistent, then use this and Theorem 4 as building blocks to prove additional consistency results. [sent-182, score-0.248]
57 For any k ≥ 1, the position-k breaking Gk is consistent. [sent-184, score-0.613]
58 The full breaking $G_F$ is consistent; for any $k$, the top-$k$ breaking $G_k^T$ is consistent; and for any $k \geq 2$, the bottom-$k$ breaking $G_k^B$ is consistent. [sent-188, score-0.675]
59 Adjacent breaking GA is consistent if and only if all components in γ ∗ are the same. [sent-190, score-0.678]
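The contrast between the full and adjacent breakings can be checked numerically with a small Monte Carlo sketch (our own illustration, assuming $m = 3$ and $\gamma^* = (0.5, 0.3, 0.2)$): under the full breaking the empirical ratio in Equation (2) approaches $\gamma_i^*/\gamma_j^*$, while under the adjacent breaking it generally does not.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pl(gamma, rng):
    """Sample one full ranking from PL by sequentially choosing winners."""
    remaining = list(range(len(gamma)))
    ranking = []
    while remaining:
        w = np.array([gamma[i] for i in remaining])
        pick = rng.choice(len(remaining), p=w / w.sum())
        ranking.append(remaining.pop(pick))
    return ranking

gamma = np.array([0.5, 0.3, 0.2])
i, j = 0, 2                                   # compare c_i = c0 against c_j = c2
counts = {"full": [0, 0], "adjacent": [0, 0]}
for _ in range(100_000):
    d = sample_pl(gamma, rng)
    pos = {c: p for p, c in enumerate(d)}
    i_first = pos[i] < pos[j]
    counts["full"][0 if i_first else 1] += 1  # full breaking counts every pair
    if abs(pos[i] - pos[j]) == 1:             # adjacent breaking: adjacent positions only
        counts["adjacent"][0 if i_first else 1] += 1

print("gamma_i / gamma_j:", gamma[i] / gamma[j])   # 2.5
for name, (a, b) in counts.items():
    print(name, "empirical ratio:", a / b)          # ~2.5 for full, noticeably off for adjacent
```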
60 Lastly, the technique developed in this section can also provide an independent proof that the RC algorithm is consistent for BTL, which is implied by the main theorem in [16]: Corollary 4. [sent-191, score-0.065]
61 By checking similar conditions as we did in the proof of Theorem 3, we can prove that $GMM_{g_{RC}}$ is consistent for BTL. [sent-194, score-0.084]
62 The results in this section suggest that if we want to learn the parameters of PL, we should use consistent breakings, including full breaking, top-k breakings, bottom-k breakings, and position-k breakings. [sent-195, score-0.127]
63 The adjacent breaking seems quite natural, but it is not consistent and thus will not provide a good estimate of the parameters of PL. [sent-196, score-0.748]
64 • GMM (Algorithm 1) with full breaking: O(m2 n + m2. [sent-203, score-0.062]
65 In particular, the GMM with adjacent breaking and top-k breaking for constant k’s are the fastest. [sent-213, score-1.361]
66 However, we recall that the GMM with adjacent breaking is not consistent, while the other algorithms are consistent. [sent-214, score-0.748]
67 We would expect that as the data size grows, the GMM with adjacent breaking will provide a relatively poor estimate of $\gamma^*$ compared to the other methods. [sent-215, score-0.748]
68 As can be seen above, the coefficient of $n$ is linear in $m$ for top-$k$ breaking (with constant $k$) and quadratic in $m$ for full breaking, while it is cubic in $m$ for the MM algorithm. [sent-218, score-1.288]
69 Therefore, it is natural to conjecture that for the same data, $GMM_{G_k^T}$ with large $k$ converges faster than $GMM_{G_k^T}$ with small $k$. [sent-221, score-0.061]
70 In other words, we expect to see the following time-efficiency tradeoff among $GMM_{G_k^T}$ for different $k$'s, which is verified by the experimental results in the next section. [sent-222, score-0.039]
71 4 Experiments The running time and statistical efficiency of MM and our GMMs are examined for both synthetic data and a real-world sushi dataset [9]. [sent-225, score-0.135]
72 • Generating the ground truth: for m ≤ 300, the ground truth γ ∗ is generated from the Dirichlet distribution Dir(1). [sent-227, score-0.07]
73 • Generating data: given a ground truth γ ∗ , we generate up to 1000 full rankings from PL. [sent-228, score-0.204]
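A sketch of this synthetic-data generation (our own code; the paper does not prescribe a particular sampler). It draws $\gamma^*$ from Dir(1) and then samples full rankings via the standard Gumbel-trick equivalence: sorting $\log \gamma_i$ plus i.i.d. Gumbel noise in decreasing order yields a PL-distributed ranking.

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_pl_dataset(m, n, rng):
    """gamma* ~ Dir(1) and n full rankings drawn from PL(gamma*)."""
    gamma = rng.dirichlet(np.ones(m))
    # Gumbel trick: sorting log(gamma_i) + Gumbel noise in decreasing order
    # yields a ranking distributed according to Plackett-Luce with parameters gamma.
    noise = rng.gumbel(size=(n, m))
    rankings = np.argsort(-(np.log(gamma) + noise), axis=1)
    return gamma, rankings

gamma_star, D = generate_pl_dataset(m=10, n=1000, rng=rng)
print(gamma_star)
print(D[:3])   # first three sampled full rankings, best alternative first
```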
74 We implemented MM [8] for 1, 3, 10 iterations, as well as GMMs with full breaking, adjacent breaking, and top-k breaking for all k ≤ m − 1. [sent-229, score-0.81]
75 • Kendall Rank Correlation Coefficient: Let $K(\gamma, \gamma^*)$ denote the Kendall tau distance between the ranking over components in $\gamma$ and the ranking over components in $\gamma^*$. [sent-233, score-0.22]
76 The Kendall correlation is $1 - 2\,\frac{K(\gamma, \gamma^*)}{m(m-1)/2}$. [sent-234, score-0.049]
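A small helper (our own sketch) implementing the Kendall correlation criterion as defined above, by counting discordant component pairs between $\gamma$ and $\gamma^*$:

```python
from itertools import combinations

def kendall_correlation(gamma, gamma_star):
    """1 - 2 * K(gamma, gamma*) / (m(m-1)/2), where K counts discordant pairs."""
    m = len(gamma)
    discordant = 0
    for i, j in combinations(range(m), 2):
        # A pair is discordant if the two vectors order components i and j differently.
        if (gamma[i] - gamma[j]) * (gamma_star[i] - gamma_star[j]) < 0:
            discordant += 1
    return 1.0 - 2.0 * discordant / (m * (m - 1) / 2)

print(kendall_correlation([0.5, 0.3, 0.2], [0.4, 0.35, 0.25]))   #  1.0 (same order)
print(kendall_correlation([0.2, 0.3, 0.5], [0.5, 0.3, 0.2]))     # -1.0 (reversed)
```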
77 The multiple repetitions for the statistical efficiency experiments in Figure 3 and the experiments for sushi data in Figure 5 were done using the Odyssey cluster. [sent-237, score-0.091]
78 4.1 Synthetic Data. In this subsection we focus on comparisons among MM, GMM-F (full breaking), and GMM-A (adjacent breaking). [sent-240, score-0.122]
79 For the Kendall correlation criterion, GMM-F (full breaking) has the best performance and GMM-A (adjacent breaking) has the worst performance. [sent-245, score-0.049]
80 The only exception is between GMM-F (full breaking) and MM for Kendall correlation at n = 1000. [sent-248, score-0.049]
81 Figure 3: The MSE and Kendall correlation of MM (10 iterations), GMM-F (full breaking), and GMM-A (adjacent breaking). [sent-268, score-0.049]
82 4.2 Time-Efficiency Tradeoff among Top-k Breakings. Results on the running time and statistical efficiency for top-$k$ breakings are shown in Figure 4. [sent-271, score-0.259]
83 We recall that top-1 is equivalent to position-1, and top-(m − 1) is equivalent to the full breaking. [sent-272, score-0.096]
84 For n = 100, MSE comparisons between successive top-k breakings are statistically significant at the 95% level from (top-1, top-2) to (top-6, top-7). [sent-273, score-0.308]
85 The comparisons in running time are all significant at the 95% confidence level. [sent-274, score-0.107]
86 On average, we observe that top-k breakings with smaller k run faster, while top-k breakings with larger k have higher statistical efficiency in both MSE and Kendall correlation. [sent-275, score-0.442]
87 4.3 Experiments for Real Data. In the sushi dataset [9], there are 10 kinds of sushi (m = 10), and the amount of data n is varied by randomly sampling with replacement. [sent-278, score-0.182]
88 We set the ground truth to be the output of MM applied to all 5000 data points. [sent-279, score-0.065]
89 For the running time, we observe the same pattern as for the synthetic data: GMM (adjacent breaking) runs faster than GMM (full breaking), which runs faster than MM (the results on running time can be found in supplementary material B). [sent-280, score-0.104]
90 Comparisons for MSE and Kendall correlation are shown in Figure 5. [sent-281, score-0.049]
91 Figure 4: Comparison of GMM with top-k breakings as k is varied. [sent-297, score-0.221]
92 Figure 5: The MSE and Kendall correlation criteria and computation time for MM (10 iterations), GMM-F (full breaking), and GMM-A (adjacent breaking) on sushi data. [sent-308, score-0.14]
93 Differences between performances are all statistically significant with 95% confidence (with the exception of the Kendall correlation between the two GMM methods for n = 200, where p = 0. [sent-310, score-0.049]
94 This is different from comparisons for synthetic data (Figure 3). [sent-312, score-0.111]
95 We believe that the main reason is that PL does not fit the sushi data well, a fact recently observed by Azari et al. [sent-313, score-0.091]
96 Therefore, we cannot expect that GMM converges to the output of MM on the sushi dataset, since the consistency result (Corollary 3) assumes that the data is generated under PL. [sent-315, score-0.157]
97 5 Future Work We plan to work on the connection between consistent breakings and preference elicitation. [sent-316, score-0.286]
98 For example, even though the theory in this paper is developed for full rankings, the notions of top-k and bottom-k breaking implicitly allow some partial-order settings. [sent-317, score-0.675]
99 More specifically, top-k breaking can be achieved from partial orders that include full rankings for the top-k alternatives. [sent-318, score-0.771]
100 Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. [sent-339, score-0.046]
wordName wordTfidf (topN-words)
[('breaking', 0.613), ('gmm', 0.359), ('gmmg', 0.251), ('breakings', 0.221), ('mm', 0.21), ('gmms', 0.168), ('btl', 0.162), ('rc', 0.145), ('kendall', 0.138), ('adjacent', 0.135), ('cj', 0.122), ('ci', 0.117), ('pl', 0.114), ('gf', 0.113), ('ranking', 0.11), ('rankings', 0.096), ('sushi', 0.091), ('pg', 0.089), ('comparisons', 0.087), ('gk', 0.084), ('mse', 0.078), ('pos', 0.075), ('pairwise', 0.07), ('consistent', 0.065), ('gm', 0.065), ('wn', 0.063), ('full', 0.062), ('prc', 0.059), ('agents', 0.056), ('aggregation', 0.051), ('rank', 0.049), ('correlation', 0.049), ('azari', 0.048), ('gmmgf', 0.044), ('gmmgk', 0.044), ('gmmgrc', 0.044), ('prpl', 0.044), ('alternatives', 0.042), ('lirong', 0.039), ('mle', 0.039), ('harvard', 0.037), ('parkes', 0.036), ('ciency', 0.035), ('negahban', 0.035), ('kdmax', 0.029), ('mgrc', 0.029), ('prbtl', 0.029), ('yiling', 0.029), ('consistency', 0.027), ('corollary', 0.027), ('cl', 0.026), ('ani', 0.026), ('centrality', 0.026), ('hossein', 0.026), ('sou', 0.026), ('xgi', 0.026), ('agent', 0.026), ('aggregate', 0.026), ('dence', 0.025), ('dp', 0.025), ('synthetic', 0.024), ('ground', 0.024), ('bellevue', 0.024), ('devavrat', 0.024), ('ford', 0.024), ('gg', 0.024), ('seas', 0.024), ('usa', 0.024), ('aij', 0.023), ('generalized', 0.023), ('des', 0.023), ('mallows', 0.023), ('im', 0.022), ('truth', 0.022), ('tahoe', 0.021), ('xia', 0.021), ('conjecture', 0.021), ('tradeoff', 0.021), ('gl', 0.02), ('faster', 0.02), ('running', 0.02), ('aggregates', 0.02), ('converges', 0.02), ('nv', 0.019), ('output', 0.019), ('conditions', 0.019), ('pr', 0.018), ('condition', 0.018), ('among', 0.018), ('lake', 0.018), ('reveals', 0.018), ('il', 0.017), ('equivalent', 0.017), ('intervals', 0.017), ('rq', 0.017), ('wa', 0.017), ('subsection', 0.017), ('calculated', 0.015), ('ij', 0.015), ('ga', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation
Author: Hossein Azari Soufiani, William Chen, David C. Parkes, Lirong Xia
Abstract: In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency. 1
2 0.14165442 167 nips-2013-Learning the Local Statistics of Optical Flow
Author: Dan Rosenbaum, Daniel Zoran, Yair Weiss
Abstract: Motivated by recent progress in natural image statistics, we use newly available datasets with ground truth optical flow to learn the local statistics of optical flow and compare the learned models to prior models assumed by computer vision researchers. We find that a Gaussian mixture model (GMM) with 64 components provides a significantly better model for local flow statistics when compared to commonly used models. We investigate the source of the GMM’s success and show it is related to an explicit representation of flow boundaries. We also learn a model that jointly models the local intensity pattern and the local optical flow. In accordance with the assumptions often made in computer vision, the model learns that flow boundaries are more likely at intensity boundaries. However, when evaluated on a large dataset, this dependency is very weak and the benefit of conditioning flow estimation on the local intensity pattern is marginal. 1
3 0.121314 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
4 0.064634323 223 nips-2013-On the Relationship Between Binary Classification, Bipartite Ranking, and Binary Class Probability Estimation
Author: Harikrishna Narasimhan, Shivani Agarwal
Abstract: We investigate the relationship between three fundamental problems in machine learning: binary classification, bipartite ranking, and binary class probability estimation (CPE). It is known that a good binary CPE model can be used to obtain a good binary classification model (by thresholding at 0.5), and also to obtain a good bipartite ranking model (by using the CPE model directly as a ranking model); it is also known that a binary classification model does not necessarily yield a CPE model. However, not much is known about other directions. Formally, these relationships involve regret transfer bounds. In this paper, we introduce the notion of weak regret transfer bounds, where the mapping needed to transform a model from one problem to another depends on the underlying probability distribution (and in practice, must be estimated from data). We then show that, in this weaker sense, a good bipartite ranking model can be used to construct a good classification model (by thresholding at a suitable point), and more surprisingly, also to construct a good binary CPE model (by calibrating the scores of the ranking model). 1
5 0.055567253 46 nips-2013-Bayesian Estimation of Latently-grouped Parameters in Undirected Graphical Models
Author: Jie Liu, David Page
Abstract: In large-scale applications of undirected graphical models, such as social networks and biological networks, similar patterns occur frequently and give rise to similar parameters. In this situation, it is beneficial to group the parameters for more efficient learning. We show that even when the grouping is unknown, we can infer these parameter groups during learning via a Bayesian approach. We impose a Dirichlet process prior on the parameters. Posterior inference usually involves calculating intractable terms, and we propose two approximation algorithms, namely a Metropolis-Hastings algorithm with auxiliary variables and a Gibbs sampling algorithm with “stripped” Beta approximation (Gibbs SBA). Simulations show that both algorithms outperform conventional maximum likelihood estimation (MLE). Gibbs SBA’s performance is close to Gibbs sampling with exact likelihood calculation. Models learned with Gibbs SBA also generalize better than the models learned by MLE on real-world Senate voting data. 1
6 0.049991395 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
7 0.04962106 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
8 0.047763739 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
9 0.046020657 26 nips-2013-Adaptive Market Making via Online Learning
10 0.044688862 344 nips-2013-Using multiple samples to learn mixture models
11 0.043933444 257 nips-2013-Projected Natural Actor-Critic
12 0.043309789 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering
13 0.042880118 215 nips-2013-On Decomposing the Proximal Map
14 0.040253174 109 nips-2013-Estimating LASSO Risk and Noise Level
15 0.03859251 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
16 0.03829721 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
17 0.034914248 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
18 0.034318954 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
19 0.033288192 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
20 0.031631283 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
topicId topicWeight
[(0, 0.088), (1, 0.025), (2, 0.004), (3, 0.019), (4, 0.012), (5, 0.02), (6, 0.02), (7, -0.015), (8, -0.012), (9, 0.013), (10, 0.021), (11, 0.035), (12, 0.02), (13, -0.023), (14, -0.057), (15, 0.075), (16, 0.047), (17, -0.017), (18, 0.049), (19, 0.035), (20, -0.062), (21, -0.052), (22, -0.022), (23, -0.03), (24, -0.032), (25, -0.05), (26, 0.046), (27, 0.06), (28, 0.025), (29, 0.054), (30, -0.027), (31, 0.059), (32, 0.039), (33, -0.071), (34, -0.105), (35, 0.016), (36, 0.126), (37, 0.009), (38, -0.048), (39, 0.039), (40, 0.012), (41, 0.033), (42, 0.043), (43, -0.058), (44, -0.013), (45, 0.051), (46, -0.12), (47, 0.082), (48, -0.117), (49, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.92394966 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation
Author: Hossein Azari Soufiani, William Chen, David C. Parkes, Lirong Xia
Abstract: In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency. 1
2 0.57022965 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
3 0.46679264 167 nips-2013-Learning the Local Statistics of Optical Flow
Author: Dan Rosenbaum, Daniel Zoran, Yair Weiss
Abstract: Motivated by recent progress in natural image statistics, we use newly available datasets with ground truth optical flow to learn the local statistics of optical flow and compare the learned models to prior models assumed by computer vision researchers. We find that a Gaussian mixture model (GMM) with 64 components provides a significantly better model for local flow statistics when compared to commonly used models. We investigate the source of the GMM’s success and show it is related to an explicit representation of flow boundaries. We also learn a model that jointly models the local intensity pattern and the local optical flow. In accordance with the assumptions often made in computer vision, the model learns that flow boundaries are more likely at intensity boundaries. However, when evaluated on a large dataset, this dependency is very weak and the benefit of conditioning flow estimation on the local intensity pattern is marginal. 1
4 0.44586498 16 nips-2013-A message-passing algorithm for multi-agent trajectory planning
Author: Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan S. Yedidia
Abstract: We describe a novel approach for computing collision-free global trajectories for p agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with p for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space. 1
5 0.42115876 290 nips-2013-Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?
Author: Qiang Liu, Alex Ihler, Mark Steyvers
Abstract: We study the problem of estimating continuous quantities, such as prices, probabilities, and point spreads, using a crowdsourcing approach. A challenging aspect of combining the crowd’s answers is that workers’ reliabilities and biases are usually unknown and highly diverse. Control items with known answers can be used to evaluate workers’ performance, and hence improve the combined results on the target items with unknown answers. This raises the problem of how many control items to use when the total number of items each workers can answer is limited: more control items evaluates the workers better, but leaves fewer resources for the target items that are of direct interest, and vice versa. We give theoretical results for this problem under different scenarios, and provide a simple rule of thumb for crowdsourcing practitioners. As a byproduct, we also provide theoretical analysis of the accuracy of different consensus methods. 1
6 0.39586517 215 nips-2013-On Decomposing the Proximal Map
7 0.36719576 71 nips-2013-Convergence of Monte Carlo Tree Search in Simultaneous Move Games
8 0.34018803 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
9 0.33866826 164 nips-2013-Learning and using language via recursive pragmatic reasoning about other agents
10 0.33821869 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
11 0.33317143 25 nips-2013-Adaptive Anonymity via $b$-Matching
12 0.32852739 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
13 0.3256785 300 nips-2013-Solving the multi-way matching problem by permutation synchronization
14 0.32048106 332 nips-2013-Tracking Time-varying Graphical Structure
15 0.31990132 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
16 0.31747434 202 nips-2013-Multiclass Total Variation Clustering
17 0.31316561 248 nips-2013-Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs
18 0.31083047 85 nips-2013-Deep content-based music recommendation
19 0.30070692 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
20 0.29758459 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
topicId topicWeight
[(16, 0.022), (33, 0.109), (34, 0.158), (41, 0.036), (49, 0.024), (56, 0.083), (70, 0.017), (85, 0.023), (89, 0.033), (93, 0.035), (95, 0.013), (99, 0.344)]
simIndex simValue paperId paperTitle
same-paper 1 0.80479681 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation
Author: Hossein Azari Soufiani, William Chen, David C. Parkes, Lirong Xia
Abstract: In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency. 1
2 0.79533648 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar
Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1
3 0.73133504 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces
Author: Leonid Boytsov, Bilegsaikhan Naidan
Abstract: Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-toprune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality. 1
4 0.7081185 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
Author: Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin
Abstract: This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a lowvariance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. 1
5 0.69258869 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
Author: Fang Han, Han Liu
Abstract: In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data. The major contributions are two folds. First, in low dimensions and under the Gaussian model, by borrowing the strength from recent development in minimax optimal principal component estimation, we first time sharply characterize the potential advantage of classical principal component regression over least square estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy tailed and have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients in the optimal parametric rate and therefore is a good alternative to the Gaussian based methods. Experiments on synthetic and real world data are conducted to illustrate the empirical usefulness of the proposed method. 1
6 0.65498579 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
7 0.53125429 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
8 0.52927595 143 nips-2013-Integrated Non-Factorized Variational Inference
9 0.52494621 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression
10 0.52404201 38 nips-2013-Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
11 0.52301323 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
12 0.52295053 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
13 0.52253234 122 nips-2013-First-order Decomposition Trees
14 0.52200681 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
15 0.52173287 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients
16 0.52086097 202 nips-2013-Multiclass Total Variation Clustering
17 0.52084792 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations
18 0.5206821 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
19 0.5188669 210 nips-2013-Noise-Enhanced Associative Memories
20 0.51846927 173 nips-2013-Least Informative Dimensions