nips nips2008 nips2008-106 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Srikanth Jagabathula, Devavrat Shah
Abstract: Motivated by applications like elections, web-page ranking, revenue maximization etc., we consider the question of inferring popular rankings using constrained data. More specifically, we consider the problem of inferring a probability distribution over the group of permutations using its first order marginals. We first prove that it is not possible to recover more than O(n) permutations over n elements with the given information. We then provide a simple and novel algorithm that can recover up to O(n) permutations under a natural stochastic model; in this sense, the algorithm is optimal. In certain applications, the interest is in recovering only the most popular (or mode) ranking. As a second result, we provide an algorithm based on the Fourier Transform over the symmetric group to recover the mode under a natural majority condition; the algorithm turns out to be a maximum weight matching on an appropriately defined weighted bipartite graph. The questions considered are also thematically related to Fourier Transforms over the symmetric group and the currently popular topic of compressed sensing. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Inferring rankings under constrained sensing Srikanth Jagabathula Devavrat Shah Laboratory of Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139. [sent-1, score-0.258]
2 Motivated by applications like elections, web-page ranking, revenue maximization etc., we consider the question of inferring popular rankings using constrained data. [sent-4, score-0.304]
3 More specifically, we consider the problem of inferring a probability distribution over the group of permutations using its first order marginals. [sent-5, score-0.452]
4 We first prove that it is not possible to recover more than O(n) permutations over n elements with the given information. [sent-6, score-0.347]
5 We then provide a simple and novel algorithm that can recover up to O(n) permutations under a natural stochastic model; in this sense, the algorithm is optimal. [sent-7, score-0.352]
6 As a second result, we provide an algorithm based on the Fourier Transform over the symmetric group to recover the mode under a natural majority condition; the algorithm turns out to be a maximum weight matching on an appropriately defined weighted bipartite graph. [sent-9, score-0.677]
7 The questions considered are also thematically related to Fourier Transforms over the symmetric group and the currently popular topic of compressed sensing. [sent-10, score-0.525]
8 1 Introduction We consider the question of determining a real-valued function on the space of permutations of n elements with very limited observations. [sent-11, score-0.358]
9 Each person (or voter) has a certain ranking of these candidates in mind (consciously or sub-consciously). [sent-16, score-0.231]
10 For example, let 50% of people believe in A > B > C, 30% of people believe in B > A > C and 20% of people believe in C > A > B. [sent-20, score-0.177]
11 However, in the interview we may not be able to ask them their complete ranking of all candidates. [sent-25, score-0.162]
12 Clearly, such restricted information cannot lead to any useful inference of the prevalent ordering of candidates in the population if there are too many of them (for large n). [sent-33, score-0.3]
13 Now, in a real world scenario, it is likely that people decide rankings of candidates based on a few issues such as war, abortion, economy and gay marriage. [sent-34, score-0.288]
14 That is, an individual will decide the ranking of the candidates based on the opinions of candidates on these issues. [sent-35, score-0.313]
15 In this paper, we are interested in inferring such few prevalent rankings of candidates and their popularity based on the restricted (or partial) information as explained above. [sent-37, score-0.466]
16 Thematically, this question is similar to the pursuit of compressed sensing. [sent-38, score-0.216]
17 However, as we explain in Section 2, standard compressed sensing does not apply under this setting. [sent-39, score-0.251]
18 We also discuss a natural relation between the available information and the Fourier coefficients of the Fourier transformation based on group representation (see Proposition 1). [sent-40, score-0.249]
19 It turns out that the problem we consider is equivalent to that of recovery of a function over a symmetric group using the first order Fourier coefficients. [sent-41, score-0.375]
20 Thus, our problem is thematically related to the recovery of functions over non-commutative groups using a limited set of Fourier coefficients. [sent-42, score-0.333]
21 As we show in Section 2, a naive recovery by setting the unknown Fourier coefficients to zero yields a very bad result. [sent-43, score-0.181]
22 In many applications, one is specifically interested in finding out the most popular ranking (or mode) rather than all the prevalent rankings. [sent-45, score-0.221]
23 We establish that under the natural majority condition, our algorithm finds the correct mode (see Theorem 2). [sent-47, score-0.239]
24 Interestingly enough, our algorithm to find an estimate of the mode corresponds to finding a maximum weight matching in a weighted bipartite graph of n nodes. [sent-48, score-0.332]
25 We start describing the setup, the problem statement, and the relation to compressed sensing and Fourier transform based approaches in Section 2. [sent-50, score-0.412]
26 Sn is also known as the symmetric group of degree n. [sent-59, score-0.223]
27 Let f : Sn → [0, 1] denote a mapping from the symmetric group to the interval [0, 1]. [sent-60, score-0.194]
28 Without loss of generality we assume that the permutations are labeled such that pk ≤ pm for k < m. [sent-65, score-0.402]
29 The set of permutations for which f (·) is non-zero will be called the support of f (·); the cardinality of the support will be called the sparsity of f and is denoted by K, i.e., $K = \|f\|_{\ell_0}$. [sent-67, score-0.368]
30 Each permutation σ will be represented by its corresponding permutation matrix, denoted by $P^\sigma$. [sent-70, score-0.619]
31 We use the terms permutation and permutation matrix interchangeably. [sent-74, score-0.619]
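For concreteness, here is a minimal sketch of this permutation/permutation-matrix correspondence in Python with NumPy (the function name and the 0-indexed convention are ours, for illustration, not the paper's):

```python
import numpy as np

def perm_matrix(sigma):
    """Return the n x n permutation matrix P^sigma, where sigma is given in
    one-line form: sigma[j] = i means element j is mapped to position i
    (0-indexed here; the paper indexes from 1)."""
    n = len(sigma)
    P = np.zeros((n, n))
    for j, i in enumerate(sigma):
        P[i, j] = 1.0  # entry (i, j) is 1 iff sigma(j) = i
    return P

print(perm_matrix((0, 1, 2)))  # identity permutation -> identity matrix
print(perm_matrix((1, 0, 2)))  # transposition of the first two elements
```

The later sketches in this section reuse `perm_matrix` and build on one another.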
32 We think of permutations as complete matchings in a bipartite graph. [sent-75, score-0.394]
33 Specifically, we consider an n × n bipartite graph and each permutation corresponds to a complete matching in the graph. [sent-76, score-0.48]
34 The edges in a permutation will refer to the edges in the corresponding bipartite matching. [sent-77, score-0.435]
35 For 1 ≤ i, j ≤ n, let $q_{ij} := \sum_{\sigma \in S_n : \sigma(j) = i} f(\sigma)$ (1). Let Q denote both the matrix $(q_{ij})_{n \times n}$ and the vector $(q_{ij})_{n^2 \times 1}$. [sent-78, score-0.179]
36 In the election example, it is easy to see that qij denotes the fraction of voters that have ranked candidate j in the ith position. [sent-82, score-0.272]
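As a sanity check on definition (1), the matrix Q can be computed directly from a sparse f; a hedged sketch, reusing `perm_matrix` from above, with the three-candidate distribution being the election example from the introduction:

```python
import numpy as np

def first_order_marginals(f):
    """Q[i, j] = sum of f(sigma) over all sigma with sigma(j) = i, i.e. the
    fraction of voters who rank candidate j in position i. Here `f` is a
    dict mapping permutation tuples to probabilities."""
    n = len(next(iter(f)))
    Q = np.zeros((n, n))
    for sigma, p in f.items():
        Q += p * perm_matrix(sigma)
    return Q

# A > B > C w.p. 0.5, B > A > C w.p. 0.3, C > A > B w.p. 0.2 (A=0, B=1, C=2)
f = {(0, 1, 2): 0.5, (1, 0, 2): 0.3, (1, 2, 0): 0.2}
Q = first_order_marginals(f)
# Q is doubly stochastic: every row and every column sums to one
assert np.allclose(Q.sum(axis=0), 1) and np.allclose(Q.sum(axis=1), 1)
```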
37 We will first prove, using information theoretic techniques, that recovery is asymptotically reliable (average probability of error goes to zero as n → ∞) only if K = O(n). [sent-85, score-0.242]
38 We then provide a novel algorithm that recovers prevalent rankings and their popularity exactly under minimal (essentially necessary) conditions; under a natural stochastic model, this algorithm recovers up to O(n) permutations. [sent-86, score-0.374]
39 It is often the case that the full knowledge of functional values at all permutations is not required. [sent-88, score-0.3]
40 Specifically, in scenarios such as ranked elections, interest is in finding the most likely permutation, i.e., the mode of f. [sent-89, score-0.34]
41 Theorem 2 proves that the max-weight matching yields the most likely permutation under a natural majority assumption. [sent-92, score-0.452]
42 As we shall show soon, the matrix Q is related to the first two Fourier coefficients of the Fourier Transform of the distribution over the permutation group. [sent-95, score-0.363]
43 Thus, the problem we are considering can be restated as that of reconstructing a distribution over the permutation group from its first two Fourier coefficients. [sent-96, score-0.501]
44 Reconstructing distributions over the permutation group from a limited number of Fourier coefficients has several applications. [sent-97, score-0.46]
45 Specifically, there has been some recent work on multi-object tracking (see [4] and [3]), in which the authors approach the daunting task of maintaining a distribution over the permutation group by approximating it using the first few Fourier coefficients. [sent-98, score-0.445]
46 We will now discuss the Fourier Transform of a function on the permutation group, which provides another possible approach for recovery of f . [sent-100, score-0.466]
47 However, as established in Theorem 2, this leads to recovery of the mode, or most likely assignment, of f under a natural majority condition. [sent-104, score-0.42]
48 Next, we give some details on the Fourier transform based approach (an interested reader is referred to [5] for missing details): how Q can be used to obtain an approximation of f, and why it does not recover f exactly. [sent-105, score-0.171]
49 The details relevant to recovery of mode of f will be associated with Theorem 2. [sent-106, score-0.32]
50 We can obtain a solution to the set of linear equations in (8) using the Fourier Transforms at representations of the symmetric group. [sent-108, score-0.194]
51 For a function h : G → R on a group G, its Fourier Transform at a representation ρ of G is defined as $\hat{h}_\rho = \sum_{\sigma} h(\sigma)\rho(\sigma)$. [sent-109, score-0.173]
52 The collection of Fourier Transforms of h(·) at a complete set of inequivalent irreducible representations of G completely determines the function. [sent-110, score-0.273]
53 This follows from the following expression for the inverse Fourier Transform: $h(\sigma) = \frac{1}{|G|} \sum_k d_{\rho_k} \mathrm{Tr}\big(\hat{h}_{\rho_k}^{T} \rho_k(\sigma)\big)$ (2), where |G| denotes the cardinality of G, $d_{\rho_k}$ denotes the degree of representation $\rho_k$, and k indexes over the complete set of inequivalent irreducible representations of G. [sent-111, score-0.379]
54 The trivial representation of a group is the 1-dimensional representation ρ0 (σ) = 1, ∀ σ ∈ G. [sent-112, score-0.215]
55 The above naturally suggests an approximation based on a limited number of Fourier coefficients with respect to a certain subset of irreducible representations. [sent-115, score-0.195]
56 We will show that, indeed, the information matrix Q corresponds to the Fourier coefficient with respect to the first-order representation of the symmetric group Sn . [sent-116, score-0.285]
57 It is known [5] that the first-order permutation representation of Sn, denoted by τ1, has degree n and maps every permutation σ to its corresponding permutation matrix $P^\sigma$. [sent-118, score-0.975]
58 Even though τ1 is not an irreducible representation, it is known [5] that every representation of a group is equivalent to a direct sum of irreducible representations. [sent-122, score-0.475]
59 In particular, τ1 can be decomposed as τ1 = ρ0 ⊕ ρ1, where ρ0 is the aforementioned trivial representation of degree 1 and ρ1 is an irreducible representation of degree n − 1. [sent-123, score-0.293]
60 Thus, Q is related to the Fourier Transforms of the irreducible representations ρ0 and ρ1 . [sent-125, score-0.183]
61 Then, its natural Fourier approximation, obtained by looking at the Fourier coefficients of the relevant irreducible representations, is given by the function $\tilde{f} : S_n \to \mathbb{R}$ defined as $\tilde{f}(\sigma) = (n-1)\frac{\langle Q, P^\sigma \rangle}{N} - \frac{n-2}{N}$ (3) for σ ∈ Sn, with N = n!. [sent-129, score-0.218]
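The flattening can be checked numerically; a small sketch evaluating (3) as reconstructed above, on the Q from the election sketch:

```python
from itertools import permutations
from math import factorial

def fourier_approx(Q):
    """Naive first-order Fourier approximation of equation (3):
    f~(sigma) = (n-1) <Q, P^sigma> / N - (n-2) / N, with N = n!."""
    n = Q.shape[0]
    N = factorial(n)
    out = {}
    for sigma in permutations(range(n)):
        inner = sum(Q[i, j] for j, i in enumerate(sigma))  # <Q, P^sigma>
        out[sigma] = (n - 1) * inner / N - (n - 2) / N
    return out

ft = fourier_approx(Q)
assert abs(sum(ft.values()) - 1.0) < 1e-9  # f~ still sums to one ...
# ... but f~ is dense over all n! permutations (and can even go negative),
# even though the true f is 3-sparse: sparsity is lost
print(min(ft.values()), max(ft.values()))
```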
62 Unfortunately, the solution is not sparse and the “mass” is distributed over all the permutations yielding values of O(1/N ) for all permutations. [sent-138, score-0.296]
63 2 Relation to Compressed Sensing Here we discuss the relation of the above stated question to the recently popular topic of compressed sensing. [sent-141, score-0.283]
64 This is primarily because in the standard compressed sensing setup, samples are chosen as "random projections", while here the samples are highly constrained and provide the information matrix Q. [sent-144, score-0.334]
65 As discussed earlier, this is a reasonable assumption in our case because: (a) the total number of permutations N can be very large even for a reasonably sized n and (b) most functions f (·) that arise in practice are determined by a small (when compared to N ) number of parameters. [sent-151, score-0.266]
66 Under a restriction on the isometry constants of the matrix A, Candes and Tao prove that the solution f is the unique minimizer of the LP: $\min_x \|x\|_{\ell_1}$ s.t. $Ax = Q$ (9). [sent-152, score-0.226]
67 Unfortunately, the approach of Candes and Tao cannot be directly applied to our problem because the isometry constants of the matrix A do not satisfy the required conditions. [sent-154, score-0.196]
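For a tiny n, the LP (9) can be written down explicitly; a sketch using SciPy (since f is nonnegative here, we minimize the coordinate sum subject to x ≥ 0, a simplification of the general ℓ1 objective; this setup is ours, for illustration only, and reuses `perm_matrix` from above):

```python
from itertools import permutations
import numpy as np
from scipy.optimize import linprog

n = 3
perms = list(permutations(range(n)))
# A maps the n!-vector f to the n^2 stacked first order marginals Q
A = np.stack([perm_matrix(s).flatten() for s in perms], axis=1)

f_true = np.zeros(len(perms))
f_true[0], f_true[3] = 0.6, 0.4           # some sparse ground truth
Q_vec = A @ f_true

res = linprog(c=np.ones(len(perms)), A_eq=A, b_eq=Q_vec, bounds=(0, None))
print(res.x)  # an l1 minimizer, which need not coincide with f_true
```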
68 Here, id refers to the identity permutation and the permutations are represented using the cycle notation. [sent-160, score-0.597]
69 Therefore, the compressed sensing approach of Candes and Tao does not guarantee the unique reconstruction of f if $\|f\|_{\ell_0} \geq 4$. [sent-169, score-0.294]
70 The main result of this paper is about the exact recovery of f from the given constrained information matrix Q = (qij) under the hypothesis that f is sparse, or has small $\|f\|_{\ell_0}$. [sent-171, score-0.294]
71 In other words, each permutation has at least one edge that is different from all the others. [sent-188, score-0.33]
72 Theorem 1 asserts that when properties P1 and P2 are satisfied, exact recovery is possible. [sent-195, score-0.181]
73 Let’s go back to the counter-example we mentioned before: For any n ≥ 4 consider the 4 permutations σ1 = id, σ2 = (12), σ3 = (34) and σ4 = (12)(34). [sent-198, score-0.266]
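The identity behind this counter-example, $P^{id} + P^{(12)(34)} = P^{(12)} + P^{(34)}$, is easy to verify directly (a sketch, with the cycles written in 0-indexed one-line form, reusing `perm_matrix`):

```python
# cycle notation -> one-line tuples for n = 4
sigma1 = (0, 1, 2, 3)   # id
sigma2 = (1, 0, 2, 3)   # (12)
sigma3 = (0, 1, 3, 2)   # (34)
sigma4 = (1, 0, 3, 2)   # (12)(34)

lhs = perm_matrix(sigma1) + perm_matrix(sigma4)
rhs = perm_matrix(sigma2) + perm_matrix(sigma3)
# so (+1, -1, -1, +1) on these four permutations lies in the null space of A
assert (lhs == rhs).all()
```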
74 Under the random model, we assume that the function f with sparsity K is constructed as follows: choose K permutations uniformly at random and assign them non-trivial real values chosen uniformly at random from a bounded interval, then normalized. [sent-211, score-0.367]
75 Given the matrix Q = Af , and no additional information, the recovery will be asymptotically reliable only if K ≤ 4n. [sent-215, score-0.291]
76 First note that a trivial bound of $(n-1)^2$ can be readily obtained as follows: since Q is doubly stochastic, it can be written as a convex combination of permutation matrices [7], which form a space of dimension $(n-1)^2$. [sent-216, score-0.325]
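The Birkhoff-von Neumann decomposition invoked here can be computed greedily; a sketch under exact-arithmetic assumptions (up to a tolerance), again on the election Q:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def birkhoff(Q, tol=1e-9):
    """Greedily write a doubly stochastic Q as a convex combination of
    permutation matrices: find a perfect matching inside the support of
    the remaining mass and peel off its minimum matched entry."""
    R, terms = Q.astype(float).copy(), []
    while R.max() > tol:
        # maximize the number of matched edges lying in the support;
        # Birkhoff/Hall guarantees a full support matching exists
        rows, cols = linear_sum_assignment(-(R > tol).astype(float))
        w = R[rows, cols].min()
        sigma = tuple(np.argsort(cols))   # sigma(j) = i along the matching
        terms.append((w, sigma))
        R[rows, cols] -= w                # row/column sums stay equal
    return terms

print(birkhoff(Q))  # one valid convex decomposition; it is not unique in general
```

Each pass zeroes at least one entry, so the loop always terminates.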
77 The mode of $\tilde{f}$, or the maximizer of $\langle P^\sigma, Q \rangle$, is essentially the maximum weight matching in a weighted bipartite graph: consider a complete bipartite graph G = ((V1, V2), E) with V1 = V2 = {1, ..., n} [sent-235, score-0.46]
78 and E = V1 × V2, with edge (i, j) ∈ E having weight $q_{ij}$. [sent-238, score-0.213]
79 Then, the weight of a matching (equivalently, of a permutation σ) is indeed $\langle P^\sigma, Q \rangle$. [sent-239, score-0.39]
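This mode estimate is a single Hungarian-algorithm call; a sketch (the function name is ours), again using the election Q, where it returns the 50% ranking A > B > C:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mode_estimate(Q):
    """argmax over sigma of <P^sigma, Q>: a maximum weight perfect matching
    on the complete bipartite graph whose edge (i, j) has weight Q[i, j]."""
    rows, cols = linear_sum_assignment(-Q)    # negate to maximize
    sigma = tuple(np.argsort(cols))           # sigma(j) = i on matched edges
    return sigma, Q[rows, cols].sum()         # the matching and its weight

sigma_hat, weight = mode_estimate(Q)
print(sigma_hat, weight)   # (0, 1, 2), i.e. A > B > C, with weight 1.8
```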
80 Without loss of generality assume that the permutations are labeled such that pi ≤ pj for i < j. [sent-255, score-0.318]
81 Let $e_i$ denote the edge (u, v) such that $q_i = q_{e_i} = q_{uv}$, where recall that $q_{uv} = \sum_{k : \sigma_k(u) = v} f(\sigma_k) = \sum_{k : \sigma_k(u) = v} p_k$. [sent-261, score-0.418]
82 Let $A_k$ denote the set of edges corresponding to permutation $\sigma_k$, 1 ≤ k ≤ L. [sent-262, score-0.316]
83 By property P2, for each 1 ≤ k ≤ L there exists at least one $q_i$ that is equal to $p_k$. [sent-271, score-0.208]
84 Property P1 ensures that whenever $q_i = p_{k(i)}$, the condition in the "if" statement of the pseudocode is not satisfied. [sent-272, score-0.169]
85 Note that the condition in the “if” statement being true implies that edge ei is present in all the permutations σj such that j ∈ J. [sent-274, score-0.473]
86 Therefore, when the condition is satisfied, the only permutations that contain edge ei are σj , j ∈ J. [sent-276, score-0.405]
87 When the condition in the “if” statement fails, again from properties P1 and P2 it follows that edge ei is contained only in permutation σk(i) . [sent-277, score-0.492]
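The pseudocode itself is not reproduced in this extract; the sketch below is our rough reconstruction of the peeling step just described, assuming properties P1-P2 and exact values (a tolerance stands in for exact arithmetic), and should not be read as the paper's verbatim algorithm:

```python
from itertools import combinations

def sparse_recover(Q, tol=1e-9):
    """Scan edges (u, v) by increasing q_uv. If q_uv equals the sum of the
    weights of some subset J of already-found permutations (the 'if'
    condition), edge (u, v) belongs to exactly those permutations;
    otherwise it witnesses a new permutation of weight q_uv."""
    n = Q.shape[0]
    weights, edge_sets = [], []    # the p_k and A_k from the text
    order = sorted((Q[u, v], u, v) for u in range(n) for v in range(n)
                   if Q[u, v] > tol)
    for q, u, v in order:
        hit = None
        for r in range(1, len(weights) + 1):   # subset sums of found weights
            for J in combinations(range(len(weights)), r):
                if abs(sum(weights[k] for k in J) - q) < tol:
                    hit = J
                    break
            if hit is not None:
                break
        if hit is None:                        # no subset sum matches: new p_k
            weights.append(q)
            edge_sets.append({(u, v)})
        else:                                  # edge shared by exactly J
            for k in hit:
                edge_sets[k].add((u, v))
    return list(zip(weights, edge_sets))       # each A_k should form a sigma_k
```

Under P1-P2 each recovered edge set accumulates the n edges of one permutation; on degenerate inputs (coinciding subset sums) the test can of course misfire.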
88 The algorithm we have proposed uses the mode of $\tilde{f}$ as an estimate of the mode of f. [sent-282, score-0.278]
89 6 Conclusion In summary, we considered the problem of inferring popular rankings from highly constrained information. [sent-289, score-0.256]
90 Specifically, we considered the problem of inferring a sparse normalized function on the symmetric group using only the first order information about the function. [sent-291, score-0.279]
91 In the election example, this first order information corresponds to the fraction of people who have ranked candidate i in the j-th position. [sent-292, score-0.201]
92 We provide a novel algorithm to precisely recover the permutations and the associated popularity under minimal, and essentially necessary, conditions. [sent-293, score-0.379]
93 We also provide an algorithm, based on Fourier transform approximation, to determine the most popular ranking (mode of the function). [sent-295, score-0.254]
94 The question considered is thematically related to harmonic analysis of functions over the symmetric group and also the currently popular topic of compressed sensing. [sent-300, score-0.544]
95 On the other hand, the parallels to the standard compressed sensing setup are limited because the available information is highly constrained. [sent-302, score-0.335]
96 Thus, the existing approaches of compressed sensing cannot be applied to the problem. [sent-303, score-0.251]
97 We concentrated on the recovery of the distribution from its first order marginals. [sent-305, score-0.181]
98 A possible next step would be to consider recovery under different forms of partial information. [sent-306, score-0.181]
99 More specifically, practical applications motivate considering the recovery of distribution from pair-wise information: probability of candidate i being ranked above candidate j. [sent-307, score-0.324]
100 Understanding recovery of distributions under the above considerations is a natural next step. [sent-309, score-0.216]
wordName wordTfidf (topN-words)
[('fourier', 0.457), ('sn', 0.362), ('permutation', 0.285), ('permutations', 0.266), ('recovery', 0.181), ('irreducible', 0.151), ('mode', 0.139), ('compressed', 0.139), ('pk', 0.136), ('group', 0.131), ('qij', 0.13), ('transform', 0.12), ('candidates', 0.117), ('rankings', 0.112), ('sensing', 0.112), ('isometry', 0.111), ('thematically', 0.108), ('candes', 0.099), ('bipartite', 0.088), ('prevalent', 0.087), ('ranking', 0.079), ('ak', 0.075), ('tao', 0.074), ('qi', 0.072), ('statement', 0.068), ('matching', 0.067), ('sparsity', 0.067), ('coef', 0.066), ('majority', 0.065), ('ei', 0.065), ('shah', 0.065), ('transforms', 0.063), ('symmetric', 0.063), ('popularity', 0.062), ('tr', 0.062), ('people', 0.059), ('popular', 0.055), ('inferring', 0.055), ('ranked', 0.055), ('pj', 0.052), ('recover', 0.051), ('bayati', 0.05), ('devavrat', 0.05), ('elections', 0.05), ('gambling', 0.05), ('inequivalent', 0.05), ('quv', 0.05), ('matrix', 0.049), ('cients', 0.048), ('question', 0.048), ('theorem', 0.047), ('id', 0.046), ('edge', 0.045), ('limited', 0.044), ('candidate', 0.044), ('election', 0.043), ('interview', 0.043), ('restated', 0.043), ('revenue', 0.043), ('reconstruction', 0.043), ('reconstructing', 0.042), ('representation', 0.042), ('relation', 0.041), ('proposition', 0.04), ('doubly', 0.04), ('complete', 0.04), ('setup', 0.04), ('recovers', 0.039), ('cally', 0.038), ('weight', 0.038), ('constants', 0.036), ('arg', 0.036), ('natural', 0.035), ('cardinality', 0.035), ('person', 0.035), ('constrained', 0.034), ('functional', 0.034), ('population', 0.034), ('restricted', 0.033), ('possess', 0.032), ('representations', 0.032), ('lemma', 0.032), ('edges', 0.031), ('reliable', 0.031), ('ax', 0.031), ('counter', 0.031), ('rank', 0.031), ('asymptotically', 0.03), ('sparse', 0.03), ('motivating', 0.03), ('prove', 0.03), ('recovered', 0.029), ('degree', 0.029), ('questions', 0.029), ('af', 0.029), ('ordering', 0.029), ('tracking', 0.029), ('pursuit', 0.029), ('condition', 0.029), ('shall', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 106 nips-2008-Inferring rankings under constrained sensing
Author: Srikanth Jagabathula, Devavrat Shah
Abstract: Motivated by applications like elections, web-page ranking, revenue maximization etc., we consider the question of inferring popular rankings using constrained data. More specifically, we consider the problem of inferring a probability distribution over the group of permutations using its first order marginals. We first prove that it is not possible to recover more than O(n) permutations over n elements with the given information. We then provide a simple and novel algorithm that can recover up to O(n) permutations under a natural stochastic model; in this sense, the algorithm is optimal. In certain applications, the interest is in recovering only the most popular (or mode) ranking. As a second result, we provide an algorithm based on the Fourier Transform over the symmetric group to recover the mode under a natural majority condition; the algorithm turns out to be a maximum weight matching on an appropriately defined weighted bipartite graph. The questions considered are also thematically related to Fourier Transforms over the symmetric group and the currently popular topic of compressed sensing. 1
2 0.19967514 44 nips-2008-Characteristic Kernels on Groups and Semigroups
Author: Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Bharath K. Sriperumbudur
Abstract: Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be characteristic exist for Rn . In the present work, conditions are established for an RKHS to be characteristic on groups and semigroups. Illustrative examples are provided, including characteristic kernels on periodic domains, rotation matrices, and Rn . + 1
3 0.1127287 113 nips-2008-Kernelized Sorting
Author: Novi Quadrianto, Le Song, Alex J. Smola
Abstract: Object matching is a fundamental operation in data analysis. It typically requires the definition of a similarity measure between the classes of objects to be matched. Instead, we develop an approach which is able to perform matching by requiring a similarity measure only within each of the classes. This is achieved by maximizing the dependency between matched pairs of observations by means of the Hilbert Schmidt Independence Criterion. This problem can be cast as one of maximizing a quadratic assignment problem with special structure and we present a simple algorithm for finding a locally optimal solution. 1
4 0.10771056 179 nips-2008-Phase transitions for high-dimensional joint support recovery
Author: Sahand Negahban, Martin J. Wainwright
Abstract: Given a collection of r ≥ 2 linear regression problems in p dimensions, suppose that the regression coefficients share partially common supports. This set-up suggests the use of 1 / ∞ -regularized regression for joint estimation of the p × r matrix of regression coefficients. We analyze the high-dimensional scaling of 1 / ∞ -regularized quadratic programming, considering both consistency rates in ∞ -norm, and also how the minimal sample size n required for performing variable selection grows as a function of the model dimension, sparsity, and overlap between the supports. We begin by establishing bounds on the ∞ error as well sufficient conditions for exact variable selection for fixed design matrices, as well as designs drawn randomly from general Gaussian matrices. These results show that the high-dimensional scaling of 1 / ∞ -regularization is qualitatively similar to that of ordinary 1 -regularization. Our second set of results applies to design matrices drawn from standard Gaussian ensembles, for which we provide a sharp set of necessary and sufficient conditions: the 1 / ∞ -regularized method undergoes a phase transition characterized by the rescaled sample size θ1,∞ (n, p, s, α) = n/{(4 − 3α)s log(p − (2 − α) s)}. More precisely, for any δ > 0, the probability of successfully recovering both supports converges to 1 for scalings such that θ1,∞ ≥ 1 + δ, and converges to 0 for scalings for which θ1,∞ ≤ 1 − δ. An implication of this threshold is that use of 1,∞ -regularization yields improved statistical efficiency if the overlap parameter is large enough (α > 2/3), but performs worse than a naive Lasso-based approach for moderate to small overlap (α < 2/3). We illustrate the close agreement between these theoretical predictions, and the actual behavior in simulations. 1
5 0.1061428 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
Author: Volkan Cevher, Marco F. Duarte, Chinmay Hegde, Richard Baraniuk
Abstract: Compressive Sensing (CS) combines sampling and compression into a single subNyquist linear measurement process for sparse and compressible signals. In this paper, we extend the theory of CS to include signals that are concisely represented in terms of a graphical model. In particular, we use Markov Random Fields (MRFs) to represent sparse signals whose nonzero coefficients are clustered. Our new model-based recovery algorithm, dubbed Lattice Matching Pursuit (LaMP), stably recovers MRF-modeled signals using many fewer measurements and computations than the current state-of-the-art algorithms.
6 0.10243719 202 nips-2008-Robust Regression and Lasso
7 0.10087827 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
8 0.092455953 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
9 0.081104234 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
10 0.081070602 216 nips-2008-Sparse probabilistic projections
11 0.074097253 99 nips-2008-High-dimensional support union recovery in multivariate regression
12 0.07373628 165 nips-2008-On the Reliability of Clustering Stability in the Large Sample Regime
13 0.071487583 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
14 0.067310125 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
15 0.066409171 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
16 0.0645006 198 nips-2008-Resolution Limits of Sparse Coding in High Dimensions
17 0.063993111 238 nips-2008-Theory of matching pursuit
18 0.063565552 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
19 0.062924564 170 nips-2008-Online Optimization in X-Armed Bandits
20 0.057346709 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
topicId topicWeight
[(0, -0.185), (1, -0.038), (2, -0.08), (3, 0.077), (4, 0.072), (5, -0.055), (6, 0.0), (7, -0.038), (8, 0.017), (9, 0.038), (10, -0.073), (11, -0.145), (12, 0.017), (13, 0.077), (14, 0.002), (15, 0.047), (16, 0.102), (17, 0.009), (18, 0.06), (19, 0.059), (20, 0.002), (21, -0.025), (22, 0.031), (23, -0.016), (24, 0.058), (25, 0.002), (26, -0.15), (27, 0.034), (28, 0.153), (29, 0.012), (30, 0.146), (31, -0.037), (32, -0.018), (33, -0.041), (34, 0.029), (35, 0.011), (36, 0.11), (37, -0.011), (38, 0.104), (39, 0.149), (40, -0.053), (41, -0.082), (42, -0.232), (43, 0.063), (44, 0.079), (45, -0.073), (46, -0.151), (47, 0.045), (48, 0.079), (49, 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.96181154 106 nips-2008-Inferring rankings under constrained sensing
Author: Srikanth Jagabathula, Devavrat Shah
Abstract: Motivated by applications like elections, web-page ranking, revenue maximization etc., we consider the question of inferring popular rankings using constrained data. More specifically, we consider the problem of inferring a probability distribution over the group of permutations using its first order marginals. We first prove that it is not possible to recover more than O(n) permutations over n elements with the given information. We then provide a simple and novel algorithm that can recover up to O(n) permutations under a natural stochastic model; in this sense, the algorithm is optimal. In certain applications, the interest is in recovering only the most popular (or mode) ranking. As a second result, we provide an algorithm based on the Fourier Transform over the symmetric group to recover the mode under a natural majority condition; the algorithm turns out to be a maximum weight matching on an appropriately defined weighted bipartite graph. The questions considered are also thematically related to Fourier Transforms over the symmetric group and the currently popular topic of compressed sensing. 1
2 0.64624661 44 nips-2008-Characteristic Kernels on Groups and Semigroups
Author: Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, Bharath K. Sriperumbudur
Abstract: Embeddings of random variables in reproducing kernel Hilbert spaces (RKHSs) may be used to conduct statistical inference based on higher order moments. For sufficiently rich (characteristic) RKHSs, each probability distribution has a unique embedding, allowing all statistical properties of the distribution to be taken into consideration. Necessary and sufficient conditions for an RKHS to be characteristic exist for Rn . In the present work, conditions are established for an RKHS to be characteristic on groups and semigroups. Illustrative examples are provided, including characteristic kernels on periodic domains, rotation matrices, and Rn . + 1
3 0.54967242 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
Author: Volkan Cevher, Marco F. Duarte, Chinmay Hegde, Richard Baraniuk
Abstract: Compressive Sensing (CS) combines sampling and compression into a single subNyquist linear measurement process for sparse and compressible signals. In this paper, we extend the theory of CS to include signals that are concisely represented in terms of a graphical model. In particular, we use Markov Random Fields (MRFs) to represent sparse signals whose nonzero coefficients are clustered. Our new model-based recovery algorithm, dubbed Lattice Matching Pursuit (LaMP), stably recovers MRF-modeled signals using many fewer measurements and computations than the current state-of-the-art algorithms.
4 0.52321237 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
5 0.45015636 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
Author: Hannes Nickisch, Rolf Pohmann, Bernhard Schölkopf, Matthias Seeger
Abstract: We show how improved sequences for magnetic resonance imaging can be found through optimization of Bayesian design scores. Combining approximate Bayesian inference and natural image statistics with high-performance numerical computation, we propose the first Bayesian experimental design framework for this problem of high relevance to clinical and brain research. Our solution requires large-scale approximate inference for dense, non-Gaussian models. We propose a novel scalable variational inference algorithm, and show how powerful methods of numerical mathematics can be modified to compute primitives in our framework. Our approach is evaluated on raw data from a 3T MR scanner. 1
6 0.43625465 165 nips-2008-On the Reliability of Clustering Stability in the Large Sample Regime
7 0.40562648 238 nips-2008-Theory of matching pursuit
8 0.37261567 149 nips-2008-Near-minimax recursive density estimation on the binary hypercube
9 0.36827856 113 nips-2008-Kernelized Sorting
10 0.35906819 224 nips-2008-Structured ranking learning using cumulative distribution networks
11 0.35900316 182 nips-2008-Posterior Consistency of the Silverman g-prior in Bayesian Model Choice
12 0.35600317 75 nips-2008-Estimating vector fields using sparse basis field expansions
13 0.35493311 99 nips-2008-High-dimensional support union recovery in multivariate regression
14 0.35166565 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
15 0.35010564 179 nips-2008-Phase transitions for high-dimensional joint support recovery
17 0.33586144 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
18 0.3301796 170 nips-2008-Online Optimization in X-Armed Bandits
19 0.32563812 202 nips-2008-Robust Regression and Lasso
20 0.32369861 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
topicId topicWeight
[(6, 0.073), (7, 0.085), (12, 0.043), (15, 0.017), (28, 0.192), (31, 0.276), (57, 0.059), (59, 0.024), (63, 0.027), (71, 0.012), (77, 0.044), (78, 0.016), (83, 0.062)]
simIndex simValue paperId paperTitle
1 0.80370623 210 nips-2008-Signal-to-Noise Ratio Analysis of Policy Gradient Algorithms
Author: John W. Roberts, Russ Tedrake
Abstract: Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, but their learning performance is typically limited by a large variance in the estimate of the gradient. In this paper, we formulate the variance reduction problem by describing a signal-to-noise ratio (SNR) for policy gradient algorithms, and evaluate this SNR carefully for the popular Weight Perturbation (WP) algorithm. We confirm that SNR is a good predictor of long-term learning performance, and that in our episodic formulation, the cost-to-go function is indeed the optimal baseline. We then propose two modifications to traditional model-free policy gradient algorithms in order to optimize the SNR. First, we examine WP using anisotropic sampling distributions, which introduces a bias into the update but increases the SNR; this bias can be interpreted as following the natural gradient of the cost function. Second, we show that non-Gaussian distributions can also increase the SNR, and argue that the optimal isotropic distribution is a ‘shell’ distribution with a constant magnitude and uniform distribution in direction. We demonstrate that both modifications produce substantial improvements in learning performance in challenging policy gradient experiments. 1
same-paper 2 0.79487062 106 nips-2008-Inferring rankings under constrained sensing
Author: Srikanth Jagabathula, Devavrat Shah
Abstract: Motivated by applications like elections, web-page ranking, revenue maximization etc., we consider the question of inferring popular rankings using constrained data. More specifically, we consider the problem of inferring a probability distribution over the group of permutations using its first order marginals. We first prove that it is not possible to recover more than O(n) permutations over n elements with the given information. We then provide a simple and novel algorithm that can recover up to O(n) permutations under a natural stochastic model; in this sense, the algorithm is optimal. In certain applications, the interest is in recovering only the most popular (or mode) ranking. As a second result, we provide an algorithm based on the Fourier Transform over the symmetric group to recover the mode under a natural majority condition; the algorithm turns out to be a maximum weight matching on an appropriately defined weighted bipartite graph. The questions considered are also thematically related to Fourier Transforms over the symmetric group and the currently popular topic of compressed sensing. 1
3 0.66134775 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar
Abstract: We consider the problem of estimating the graph structure associated with a Gaussian Markov random field (GMRF) from i.i.d. samples. We study the performance of study the performance of the ℓ1 -regularized maximum likelihood estimator in the high-dimensional setting, where the number of nodes in the graph p, the number of edges in the graph s and the maximum node degree d, are allowed to grow as a function of the number of samples n. Our main result provides sufficient conditions on (n, p, d) for the ℓ1 -regularized MLE estimator to recover all the edges of the graph with high probability. Under some conditions on the model covariance, we show that model selection can be achieved for sample sizes n = Ω(d2 log(p)), with the error decaying as O(exp(−c log(p))) for some constant c. We illustrate our theoretical results via simulations and show good correspondences between the theoretical predictions and behavior in simulations.
5 0.65787452 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
Author: Liu Yang, Rong Jin, Rahul Sukthankar
Abstract: The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes questionable whether driving the decision boundary to the low density regions of the unlabeled data will help the classification. In such case, the cluster assumption may not be valid; and consequently how to leverage this type of unlabeled data to enhance the classification accuracy becomes a challenge. We introduce “Semi-supervised Learning with Weakly-Related Unlabeled Data” (SSLW), an inductive method that builds upon the maximum-margin approach, towards a better usage of weakly-related unlabeled information. Although the SSLW could improve a wide range of classification tasks, in this paper, we focus on text categorization with a small training pool. The key assumption behind this work is that, even with different topics, the word usage patterns across different corpora tends to be consistent. To this end, SSLW estimates the optimal wordcorrelation matrix that is consistent with both the co-occurrence information derived from the weakly-related unlabeled documents and the labeled documents. For empirical evaluation, we present a direct comparison with a number of stateof-the-art methods for inductive semi-supervised learning and text categorization. We show that SSLW results in a significant improvement in categorization accuracy, equipped with a small training set and an unlabeled resource that is weakly related to the test domain.
6 0.65571821 194 nips-2008-Regularized Learning with Networks of Features
7 0.65550953 62 nips-2008-Differentiable Sparse Coding
8 0.65449017 4 nips-2008-A Scalable Hierarchical Distributed Language Model
9 0.65401489 231 nips-2008-Temporal Dynamics of Cognitive Control
10 0.65397543 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
11 0.6533457 196 nips-2008-Relative Margin Machines
12 0.65328795 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
13 0.65195251 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
14 0.65184104 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
15 0.65144283 202 nips-2008-Robust Regression and Lasso
16 0.65129477 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
17 0.65113837 179 nips-2008-Phase transitions for high-dimensional joint support recovery
18 0.65071929 143 nips-2008-Multi-label Multiple Kernel Learning
19 0.65046149 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
20 0.65030634 113 nips-2008-Kernelized Sorting