nips nips2013 nips2013-352 knowledge-graph by maker-knowledge-mining

352 nips-2013-What do row and column marginals reveal about your dataset?


Source: pdf

Author: Behzad Golshan, John Byers, Evimaria Terzi

Abstract: Numerous datasets ranging from group memberships within social networks to purchase histories on e-commerce sites are represented by binary matrices. While this data is often either proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive, and may be furnished for analysis. Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i, j) of interest. We do this for all the cells of H simultaneously, without generating realizations, but rather via implicitly sampling the datasets that satisfy the input marginals. The end result is an efficient algorithm with asymptotic running time the same as that required by standard sampling techniques to generate a single dataset from the same dataspace. Our experimental evaluation demonstrates the efficiency and the efficacy of our framework in multiple settings. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 What do row and column marginals reveal about your dataset? [sent-1, score-0.506]

2 Numerous datasets ranging from group memberships within social networks to purchase histories on e-commerce sites are represented by binary matrices. [sent-9, score-0.121]

3 While this data is often either proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive, and may be furnished for analysis. [sent-10, score-0.312]

4 Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i, j) of interest. [sent-12, score-0.687]

5 We do this for all the cells of H simultaneously, without generating realizations, but rather via implicitly sampling the datasets that satisfy the input marginals. [sent-13, score-0.212]

6 The end result is an efficient algorithm with asymptotic running time the same as that required by standard sampling techniques to generate a single dataset from the same dataspace. [sent-14, score-0.124]

7 Likewise, information about the groups that social network users participate in, the “Likes” they make, and the other users they “follow” can also be represented using large binary matrices. [sent-17, score-0.122]

8 The detailed data (i.e., the binary matrix itself) is often viewed as proprietary or as sensitive information. [sent-20, score-0.137]

9 Examples include revealing the popularity of a set of products by reporting total purchases, revealing the popularity of a group by reporting the size of its membership, or specifying the in- and out-degree distributions across all users. [sent-22, score-0.104]

10 Here, we tackle the following question: “Given the row and column marginals of a hidden binary matrix H, what can one infer about H?” [sent-23, score-0.7]

11 Standard estimation approaches, e.g., least squares or maximum likelihood, assume a generative model for the hidden matrix and output an estimate Ĥ of H. [sent-27, score-0.173]

12 Rather, we approach the above question by viewing the row and column marginals as constraints that induce a dataspace X , defined by the set of all matrices satisfying the input constraints. [sent-31, score-0.95]

13 Then, we explore this dataspace not by estimating H at large, but rather by computing the entry-wise PDF P(i, j), for every entry (i, j), where we define P(i, j) to be the probability that cell (i, j) takes on value 1 in the datasets in X . [sent-32, score-0.451]

14 A natural way to compute entry-wise PDFs is by sampling datasets from the induced dataspace X . [sent-34, score-0.396]

15 However, this dataspace can be vast, and existing techniques for sampling binary tables with fixed marginals [6, 9] fail to scale. [sent-35, score-0.681]

16 In this paper, we propose a new efficient algorithm for computing entry-wise PDFs by implicitly sampling the dataspace X . [sent-36, score-0.386]

17 Our technique can compute the entry-wise PDFs of all entries in running time the same as that required for state-of-the-art sampling techniques to generate just a single sample from X. [sent-37, score-0.154]

18 Related work: To the best of our knowledge, we are the first to introduce the notion of entrywise PDFs for binary matrices and to develop implicit sampling techniques for computing them efficiently. [sent-39, score-0.236]

19 However, our work is related to the problem of sampling from the space of binary matrices with fixed marginals, studied extensively in many domains [2, 6, 7, 9, 21], primarily due to its applications in statistical significance testing [14, 17, 20]. [sent-40, score-0.196]

20 Existing sampling techniques all rely on explicitly sampling the underlying dataspaces (either using MCMC or importance sampling) and while these methods can be used to compute entry-wise PDFs, they are inefficient for large datasets. [sent-41, score-0.139]

21 Finally, considerable work has focused on counting binary matrices with fixed marginals [1, 8, 10, 23]. [sent-44, score-0.372]

22 The input to our problem consists of the dimensionality of H and its row and column marginals provided as a pair of vectors r, c . [sent-47, score-0.506]

23 That is, r and c are n-dimensional and m-dimensional integer vectors respectively; entry r(i) stores the number of 1s in the ith row of H, and c(j) stores the number of 1s in the jth column. [sent-48, score-0.155]

24 More specifically, can we reason about entries H(i, j) without access to H but only its row and column marginals? [sent-51, score-0.344]

25 We then introduce our dataspace exploration framework that overcomes these drawbacks. [sent-56, score-0.347]

26 Maximum-likelihood (ML): The ML approach assumes that the hidden matrix H is generated by a model that only depends on the observed marginals. [sent-62, score-0.14]

27 Then, the goal is to find the model parameters that provide an estimate Ĥ of H while maximizing the likelihood of the observed row and column marginals. [sent-63, score-0.281]

28 For example, all tables with row and column marginals r and c are 0-error solutions; yet there may be exponentially many such matrices. [sent-69, score-0.551]

29 For the rest of this paper, we consider this latter approach with J(H) = Σ_{i,j} (H(i, j) − h)², where h is the average value over all entries of H. [sent-71, score-0.097]

30 Moreover, these techniques are based on assumptions about the generative model of the hidden data. [sent-74, score-0.121]

31 We translate the high-level Problem 1 into the following question: Given r, c , what is the probability that the entry H(i, j) of the hidden dataset takes on value 1? [sent-79, score-0.186]

32 P(i, j) = Σ_{H′∈X} Pr(H′) · H′(i, j). (1) Here, Pr(H′) encodes the prior probability distribution over all hidden matrices in X. [sent-81, score-0.181]
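
To make Equation (1) concrete on toy inputs, here is a brute-force sketch (a helper of ours, assuming a uniform prior Pr over X): it enumerates every binary matrix with the given marginals and averages them, which is exponential in nm and therefore only usable as a ground-truth check on very small instances.

import numpy as np
from itertools import product

def exact_entrywise_pdf(r, c):
    # Enumerate every binary matrix with row sums r and column sums c (uniform
    # prior over the dataspace X) and average them entry-wise. Exponential in
    # n*m, so only meant as a ground-truth check on tiny instances.
    r, c = np.asarray(r), np.asarray(c)
    n, m = len(r), len(c)
    acc, count = np.zeros((n, m)), 0
    for bits in product([0, 1], repeat=n * m):
        M = np.array(bits).reshape(n, m)
        if (M.sum(axis=1) == r).all() and (M.sum(axis=0) == c).all():
            acc += M
            count += 1
    return acc / count if count else None

# Both 2 x 2 matrices with r = c = (1, 1) are permutation matrices, so every
# entry-wise PDF equals 0.5.
print(exact_entrywise_pdf([1, 1], [1, 1]))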

33 Throughout the rest of the discussion we will adopt Matlab notation for matrices: for any matrix M , we will use M (i, :) to refer to the i-th row, and M (:, j) to refer to the j-th column of M . [sent-89, score-0.277]

34 Basic techniques: First, we review some basic facts and observations about r, c and the dataspace X_{r,c}. [sent-90, score-0.308]

35 Validity of marginals: Given r, c we can decide in polynomial time whether |X_{r,c}| > 0 either by verifying the Gale-Ryser condition [5] or by constructing a binary matrix with the input row and column marginals, as proposed by Erdős, Gallai, and Hakimi [18, 11]. [sent-91, score-0.42]

36 The second option has the comparative advantage that if |X_{r,c}| > 0, then it also outputs a binary matrix from X_{r,c}. [sent-92, score-0.106]
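
A minimal sketch of the Gale-Ryser feasibility test mentioned above (our own helper, not code from the paper): a binary matrix with row sums r and column sums c exists iff the totals agree and, with r sorted in nonincreasing order, every prefix of r is dominated by the corresponding capacities of the column sums.

def gale_ryser_feasible(r, c):
    # Gale-Ryser condition: some binary matrix has row sums r and column sums c
    # iff the totals match and, with r sorted in nonincreasing order, every
    # prefix sum of r is at most sum_j min(c_j, k) for the prefix length k.
    if sum(r) != sum(c):
        return False
    r_sorted = sorted(r, reverse=True)
    for k in range(1, len(r_sorted) + 1):
        if sum(r_sorted[:k]) > sum(min(cj, k) for cj in c):
            return False
    return True

# The 5 x 5 marginals used in the example later in the text are feasible.
print(gale_ryser_feasible([4, 4, 2, 2, 2], [2, 3, 1, 4, 4]))  # True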

37 Given the row and column marginals of a binary matrix as r, c, we can decide in polynomial time whether |X_{r,c}| = 1 and, if so, completely recover the hidden matrix H. [sent-94, score-0.785]

38 The binary matrices that can be uniquely recovered are called nested matrices and have the property that in their representation as bipartite graphs they do not have any switch boxes [16]: a pair of edges (u, v) and (u′, v′) for which neither (u, v′) nor (u′, v) exists. [sent-95, score-0.265]
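
To illustrate the uniqueness condition in matrix terms, here is a small brute-force check for switch boxes (a hypothetical helper of ours): a switch box corresponds to a 2 × 2 submatrix with 1s on one diagonal and 0s on the other, and a matrix with no switch box is nested and hence the unique matrix with its marginals.

def has_switch_box(M):
    # A switch box is a pair of rows u, u2 and columns v, v2 with
    # M[u][v] = M[u2][v2] = 1 and M[u][v2] = M[u2][v] = 0 (1s on one diagonal
    # of a 2 x 2 submatrix, 0s on the other). No switch box means the matrix
    # is nested, i.e. uniquely determined by its marginals.
    n, m = len(M), len(M[0])
    for u in range(n):
        for u2 in range(n):
            for v in range(m):
                for v2 in range(m):
                    if M[u][v] and M[u2][v2] and not M[u][v2] and not M[u2][v]:
                        return True
    return False

# The identity matrix can be "switched" (so it is not uniquely recoverable),
# whereas the staircase matrix [[1, 1], [1, 0]] is nested and unique.
print(has_switch_box([[1, 0], [0, 1]]), has_switch_box([[1, 1], [1, 0]]))  # True False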

39 Explicit sampling: One way of approximating P(i, j) for large dataspaces is to first obtain a uniform sample of N binary matrices from X: X1, . . . [sent-96, score-0.188]

40 To recap, explicit sampling methods are impractical for large datasets; moreover, their accuracy depends on the number of samples N and the size of the dataspace X , which itself is hard to estimate. [sent-105, score-0.357]
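
For reference, a minimal sketch of the explicit-sampling estimator described above: given N matrices drawn (approximately) uniformly from X, the estimate of P(i, j) is simply the fraction of samples whose cell (i, j) equals 1; producing the samples themselves (e.g., via MCMC swap chains) is the expensive part that the paper's implicit approach avoids.

import numpy as np

def entrywise_pdf_from_samples(samples):
    # samples: N binary matrices drawn (approximately) uniformly from X.
    # The estimate of P(i, j) is the fraction of samples with a 1 at (i, j).
    return np.mean(np.stack(samples, axis=0), axis=0)

# With the two 2 x 2 permutation matrices as "samples", every entry is 0.5.
print(entrywise_pdf_from_samples([np.eye(2, dtype=int), np.eye(2, dtype=int)[::-1]]))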

41 SimpleIS computes the P matrix one column at a time, in arbitrary order. [sent-108, score-0.243]

42 The Propose step associates with every row i a weight w(i) that is proportional to the row marginal of row i (Algorithm 1: Input: r, c; Output: estimate of the PDF matrix P; 1: w = Propose(r)). [sent-111, score-0.322]

43 The Adjust step takes as input the column sum x = c(j) of the jth column and the raw probabilities w, and adjusts these probabilities into the final probabilities px such that for column j we have Σ_i px(i) = x (Algorithm 1, lines 4-5: px = Adjust(w, x); P(:, j) = px). [sent-117, score-1.164]

44 This adjustment is not a simple normalization; rather, it computes the final values of px(i) by implicitly considering all possible realizations of the jth column with column sum x and computing the probability that the ith cell of that column equals 1. [sent-118, score-0.844]

45 Formally, if we use x to denote the binary vector that represents one realization of the j-th column of the hidden matrix, then px(i) is computed as px(i) := Pr[ x(i) = 1 | Σ_{i∈N} x(i) = x ], where N = {1, . . . , n}. [sent-119, score-0.582]

46 Using this definition, px(i) is then derived as px(i) = w(i)·R(x − 1, N \ {i}) / R(x, N). [sent-126, score-0.214]
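
As a concrete illustration of the Propose/Adjust machinery above, here is a small Python sketch. It rests on two assumptions of ours: R(s, A) is read as the sum, over all size-s subsets B of A, of the products of the weights w(i) for i in B (the standard quantity in conditional sampling with a fixed sum), and the proposal weights are taken to be the row marginals themselves; the helper names are ours and the paper's exact proposal weights may differ.

def subset_weight_sums(w, max_size):
    # R[s] = sum over all size-s subsets B of the index set of prod_{i in B} w[i]
    # (elementary symmetric polynomials of the weights), via the usual recurrence.
    R = [0.0] * (max_size + 1)
    R[0] = 1.0
    for wi in w:
        for s in range(max_size, 0, -1):
            R[s] += wi * R[s - 1]
    return R

def adjust(w, x):
    # px[i] = w[i] * R(x - 1, N \ {i}) / R(x, N); by a standard identity the
    # adjusted probabilities then sum to the column sum x.
    if x == 0:
        return [0.0] * len(w)
    denom = subset_weight_sums(w, x)[x]
    return [w[i] * subset_weight_sums(w[:i] + w[i + 1:], x - 1)[x - 1] / denom
            for i in range(len(w))]

def simple_is(r, c):
    # SimpleIS as described above: propose one weight per row, then estimate the
    # PDF one column at a time, using only that column's sum.
    w = list(r)                              # weights proportional to row marginals
    return [adjust(w, x) for x in c]         # one estimated column per column sum

cols = simple_is([4, 4, 2, 2, 2], [2, 3, 1, 4, 4])
print([round(p, 3) for p in cols[0]], round(sum(cols[0]), 3))  # column sums to 2.0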

47 To answer this question, consider a hidden 5 × 5 binary table with r = (4, 4, 2, 2, 2) and c = (2, 3, 1, 4, 4), and assume that SimpleIS starts by computing the entry-wise PDFs of the first column. [sent-134, score-0.142]

48 While evaluating Equation (2), SimpleIS generates all possible columns of matrices with row marginals r and a column with column sum 2 – ignoring the values of the rest of the column marginals. [sent-135, score-1.051]

49 Thus, a realization of the first column that places its two 1s in the last two rows (and hence none in the first two rows) is taken into consideration by SimpleIS, despite the fact that it cannot lead to a matrix that respects r and c. [sent-136, score-0.33]

50 This is because each of the first two rows would then need to place all four of its 1s in the remaining four columns, which in turn would violate the column marginal of the third column (which is only 1). [sent-137, score-0.346]

51 This situation occurs exactly because SimpleIS never considers the constraints imposed by the rest of the column marginals when aggregating the possible realizations of column j. [sent-138, score-0.722]

52 Ultimately, the SimpleIS algorithm results in estimates px(i) that reflect the probability of entry i in a column being equal to 1, conditioned over all matrices with row marginals r and a single column with column sum x [6]. [sent-139, score-1.153]

53 The IS algorithm: In the IS algorithm, we remedy this weakness of SimpleIS by taking into account the constraints imposed by all the column marginals when aggregating the realizations of a particular column j. [sent-142, score-0.665]

54 Referring again to the previous example, the input vectors r and c impose the following constraints on any matrix in X_{r,c}: column 1 must have at least one 1 in the first two rows and (exactly) two 1s in the first five rows. [sent-143, score-0.27]

55 A knot is a subvector of a column characterized by three integer values [s, e] | b , where s and e are the starting and ending indices defining the subvector, and b is a lower bound on the number of 1s that must be placed in the first e rows of the column. [sent-146, score-0.513]

56 Interestingly, given r, c , the knots of any column j of the hidden matrix can be identified in linear time using an algorithm that recursively applies the Gale-Ryser condition on realizability of bipartite graphs. [sent-147, score-0.589]

57 At a high level, IS (Algorithm 2) identifies the knots within each column and uses them to achieve a better estimation of P. [sent-150, score-0.408]

58 Here, the process of obtaining the final probabilities is more complicated since it requires: (a) identifying the knots of every column j (line 3), (b) computing the entry-wise PDFs for the entries in every knot (denoted by qk) (lines 4-7), [sent-151, score-0.826]

59 and (c) creating the jth column of P by putting the computed entry-wise PDFs back together (line 8). (Algorithm 2, the IS algorithm: Input: r, c; Output: estimate of the PDF matrix P; 1: w = Propose(r); 2: for j = 1 . . . m; 3: FindKnots(j, r, c); 4: for each knot k; 5: for x: number of 1s in knot k; . . . ; 8: P(:, j) = [q1; . . . ; ql].) [sent-152, score-0.281]

60 Note that we use wk to refer to the vector of raw probabilities associated with the cells in knot k. [sent-153, score-0.186]

61 Also, vector pk,x is used to store the adjusted probabilities of the cells in knot k given that the knot contains x 1s. [sent-157, score-0.593]

62 Step (a) is described by Chen et al. [sent-161, score-0.234]

63 Step (b) considers all the knots of the jth column sequentially. [sent-166, score-0.446]

64 Assume that the kth knot of this column is given by the tuple [sk , ek ] | bk . [sent-167, score-0.451]

65 If we know the value of x, then we can simply use the Adjust routine to adjust the raw probabilities wk . [sent-169, score-0.153]

66 But since the value of x may vary over different realizations of column j, we need to compute the probability distribution of different values of x. [sent-170, score-0.249]

67 For this, we first observe that if we know that y 1s have been placed prior to knot k, then we can compute lower and upper bounds on x as: Lk|y = max {0, bk − y} , Uk|y = min {ek − sk + 1, c(j) − y} . [sent-171, score-0.267]

68 Clearly, the number of 1s in the knot must be an integer value in the interval [Lk|y , Uk|y ]. [sent-172, score-0.234]

69 We observe that for every knot k and for every value of y, Qk(y) can be computed by dynamic programming as Qk(y) = Σ_{z=0}^{y} Qk−1(z) · Pk−1(y − z | z). (6) [sent-180, score-0.234]

70 Running time and speedups: If there is a single knot in every column, SimpleIS and IS are identical. [sent-181, score-0.234]
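
To make the bound computation and the dynamic program of Equation (6) concrete, here is a small sketch with helper names of ours. The per-knot count distribution Pk(x | z) is derived from the adjusted knot probabilities in the real algorithm; the toy stand-in below simply spreads the mass uniformly over the feasible range, so only the bounds and the recursion itself follow the text.

def knot_bounds(knot, y, cj):
    # Bounds on x, the number of 1s that fall inside knot k = [s, e] | b, given
    # that y 1s were already placed before it:
    #   L_{k|y} = max{0, b - y},  U_{k|y} = min{e - s + 1, c(j) - y}.
    s, e, b = knot
    return max(0, b - y), min(e - s + 1, cj - y)

def propagate_counts(knots, cj, P_k):
    # Q_k(y): probability that y 1s are placed before knot k, computed with the
    # dynamic program Q_k(y) = sum_z Q_{k-1}(z) * P_{k-1}(y - z | z), Equation (6).
    Q = [{0: 1.0}]                           # before the first knot: zero 1s w.p. 1
    for k in range(len(knots)):
        nxt = {}
        for z, qz in Q[k].items():
            lo, hi = knot_bounds(knots[k], z, cj)
            for v in range(lo, hi + 1):
                nxt[z + v] = nxt.get(z + v, 0.0) + qz * P_k(k, v, z)
        Q.append(nxt)
    return Q

def uniform_P_k(knots, cj):
    # Toy stand-in for P_k(x | z): uniform over the feasible range. The real IS
    # algorithm derives this distribution from the adjusted knot probabilities.
    def P_k(k, x, z):
        lo, hi = knot_bounds(knots[k], z, cj)
        return 1.0 / (hi - lo + 1) if lo <= x <= hi else 0.0
    return P_k

# Knots of column 1 in the running example: at least one 1 in rows 1-2 and two
# 1s in rows 1-5 overall (c(1) = 2).
knots, cj = [(1, 2, 1), (3, 5, 2)], 2
Q = propagate_counts(knots, cj, uniform_P_k(knots, cj))
print(Q[-1])  # all probability mass on a total of 2 ones, as required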

71 For a column j with ℓ knots, IS requires O(ℓ²c(j) + nc(j)²) – or worst-case O(n³) – time. [sent-182, score-0.191]

72 Thus, a sequential implementation of IS has running time O(n³m). [sent-183, score-0.121]

73 Moreover, IS treats each column independently, and thus it is parallelizable. [sent-185, score-0.191]

74 Finally, since the entry-wise PDFs for two columns with the same column marginals are identical, our method only needs to compute the PDFs of columns with distinct column marginals. [sent-186, score-0.679]

75 Further speedups can be achieved for large datasets by binning columns with similar marginals into a bin with a representative column sum. [sent-187, score-0.526]
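
A minimal sketch of the first speedup, assuming a helper pdf_for_column_sum that encapsulates the per-column computation (a hypothetical name of ours): cache one result per distinct column sum and reuse it; binning would additionally map similar sums to a representative value before the lookup.

def entrywise_pdfs_cached(c, pdf_for_column_sum):
    # Columns with equal marginals have identical entry-wise PDFs (for fixed
    # r and c), so compute one PDF per distinct column sum and reuse it.
    cache, cols = {}, []
    for x in c:
        if x not in cache:
            cache[x] = pdf_for_column_sum(x)  # e.g. the per-column IS computation
        cols.append(cache[x])
    return cols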

76 Discussion: We point out here that IS is strongly motivated by Sequential – the most practical algorithm to date for sampling matrices (almost) uniformly from the dataspace X_{r,c}. [sent-189, score-0.45]

77 Since exhaustive enumeration is not an option, it is very hard to obtain ground-truth values of P for arbitrary matrices, so we focus on two specific types of matrices: blocked matrices and matrices with knots. [sent-229, score-0.297]

78 An n × n blocked matrix has marginals r = (1, n − 1, . [sent-230, score-0.313]

79 Any table with these marginals either has a value of 1 in entry (1, 1) or it has two distinct entries with value 1 in the first row and the first column (excluding cell (1, 1)). [sent-237, score-0.673]

80 Also note that given a realization of the first row and the first column, the rest of the table is fully determined. [sent-238, score-0.159]

81 This implies that there are exactly (2n − 1) tables with such marginals and the entry-wise PDFs are: P(1, 1) = 1/(2n − 1); for i ≠ 1, P(1, i) = P(i, 1) = (n − 1)/(2n − 1); and for i ≠ 1 and j ≠ 1, P(i, j) = 2n/(2n − 1). [sent-239, score-0.27]

82 The matrices with knots are binary matrices that are generated by diagonally concatenating smaller matrices for which the ground truth is computed through exhaustive enumeration. [sent-240, score-0.572]

83 However, as our next experiment indicates, the success of Rasch hinges on the fact that the marginals of this experiment do not introduce many knots. [sent-246, score-0.275]

84 The results on matrices with many knots are shown in Figure 1(b). [sent-247, score-0.31]

85 This is mainly due to the fact that the matrices we create for this experiment have a lot of knots, and as SimpleIS, LS and Rasch are all knot-oblivious, they produce estimates with large errors. [sent-251, score-0.118]

86 On the other hand, Sequential, MCMC and IS take knots into account, and therefore they perform much better than the rest. [sent-252, score-0.217]

87 Looking at the running times (Figure 1(c)), we observe that for larger values of n our methods are clearly faster than all the other algorithms. [sent-253, score-0.126]

88 For this experiment we use the following real-world datasets as hidden matrices. [sent-256, score-0.152]

89 DBLP: The rows of this hidden matrix correspond to authors and the columns correspond to conferences in DBLP. [sent-257, score-0.203]

90 NOLA: This hidden matrix records the membership of 15,965 Facebook users from New Orleans across 92 different groups [22]. [sent-262, score-0.174]

91 We start with an experiment that addresses the following question: “Can entry-wise PDFs help us identify the values of the cells of the hidden matrix?” [sent-265, score-0.208]

92 We then address the question: “Can entry-wise PDFs guide us towards effectively querying the hidden matrix H so that its entries are more accurately identified?” [sent-269, score-0.226]

93 At each iteration, we query 10% of the unknown cells and recompute the entry-wise PDFs P after fixing these entries. [sent-271, score-0.158]

94 Interestingly, using InformativeFix we can fully recover the hidden matrix of the NOLA dataset by querying just 30% of the entries. [sent-278, score-0.196]

95 Thus, the values of entry-wise PDFs can be used to guide adaptive exploration of the hidden datasets. [sent-279, score-0.127]
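
For illustration, here is one plausible way entry-wise PDFs could guide querying; this is a stand-in heuristic of ours and not necessarily the paper's InformativeFix strategy: query the unknown cells whose estimated PDF is closest to 0.5, i.e. the most uncertain ones.

import numpy as np

def pick_queries(P, known_mask, budget):
    # Query the most uncertain unknown cells: entry-wise PDF closest to 0.5.
    # A stand-in heuristic, not necessarily the paper's InformativeFix.
    uncertainty = -np.abs(P - 0.5)
    uncertainty[known_mask] = -np.inf        # never re-query known cells
    order = np.argsort(uncertainty, axis=None)[::-1][:budget]
    return np.column_stack(np.unravel_index(order, P.shape))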

96 For this dataset, we observe that an 80% reduction in the number of columns introduces an average error of size only 1. [sent-290, score-0.128]

97 Conclusions: We started with a simple question: “Given the row and column marginals of a hidden binary matrix H, what can we infer about the matrix itself?” [sent-292, score-0.752]

98 We demonstrated that existing optimization-based approaches for addressing this question fail to provide a detailed intuition about the possible values of particular cells of H. [sent-293, score-0.138]

99 From a technical point of view, we developed IS, a parallelizable algorithm that efficiently and accurately approximates the values of the entry-wise PDFs for all cells simultaneously. [sent-295, score-0.135]

100 The key characteristic of IS is that it computes the entry-wise PDFs without generating any of the matrices in the dataspace defined by the input row and column marginals; it does so by implicitly sampling from the dataspace. [sent-296, score-0.8]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pdfs', 0.451), ('simpleis', 0.431), ('dataspace', 0.308), ('knot', 0.234), ('marginals', 0.225), ('knots', 0.217), ('column', 0.191), ('rasch', 0.165), ('blp', 0.144), ('ls', 0.113), ('ola', 0.109), ('px', 0.107), ('cells', 0.095), ('matrices', 0.093), ('qk', 0.091), ('row', 0.09), ('hidden', 0.088), ('pdf', 0.085), ('mcmc', 0.083), ('sequential', 0.079), ('ix', 0.066), ('entry', 0.065), ('entries', 0.063), ('adjust', 0.062), ('realizations', 0.058), ('binary', 0.054), ('enumeration', 0.053), ('matrix', 0.052), ('sampling', 0.049), ('pk', 0.045), ('tables', 0.045), ('question', 0.043), ('gionis', 0.043), ('running', 0.042), ('analyst', 0.041), ('bekessy', 0.041), ('byers', 0.041), ('contigency', 0.041), ('dataspaces', 0.041), ('evimaria', 0.041), ('informativefix', 0.041), ('mannila', 0.041), ('nen', 0.041), ('nformative', 0.041), ('randomfix', 0.041), ('realizability', 0.041), ('entrywise', 0.04), ('boston', 0.04), ('exploration', 0.039), ('cell', 0.039), ('datasets', 0.039), ('pr', 0.039), ('jth', 0.038), ('andom', 0.036), ('kashtan', 0.036), ('tkdd', 0.036), ('behzad', 0.036), ('columns', 0.036), ('blocked', 0.036), ('realization', 0.035), ('speedups', 0.035), ('raw', 0.035), ('users', 0.034), ('rest', 0.034), ('milo', 0.033), ('polynomial', 0.033), ('generative', 0.033), ('placed', 0.033), ('dataset', 0.033), ('proprietary', 0.031), ('lk', 0.031), ('probabilities', 0.03), ('ml', 0.029), ('implicitly', 0.029), ('reporting', 0.029), ('pt', 0.028), ('subvector', 0.028), ('purchase', 0.028), ('aggregates', 0.028), ('instantiations', 0.028), ('rows', 0.027), ('insight', 0.027), ('erd', 0.027), ('revealed', 0.026), ('error', 0.026), ('ek', 0.026), ('wk', 0.026), ('experiment', 0.025), ('boxes', 0.025), ('hyv', 0.024), ('percentage', 0.023), ('popularity', 0.023), ('aggregating', 0.023), ('dence', 0.023), ('uk', 0.023), ('querying', 0.023), ('exhaustive', 0.022), ('panels', 0.022), ('cacy', 0.022), ('ultimately', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 352 nips-2013-What do row and column marginals reveal about your dataset?

Author: Behzad Golshan, John Byers, Evimaria Terzi

Abstract: Numerous datasets ranging from group memberships within social networks to purchase histories on e-commerce sites are represented by binary matrices. While this data is often either proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive, and may be furnished for analysis. Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i, j) of interest. We do this for all the cells of H simultaneously, without generating realizations, but rather via implicitly sampling the datasets that satisfy the input marginals. The end result is an efficient algorithm with asymptotic running time the same as that required by standard sampling techniques to generate a single dataset from the same dataspace. Our experimental evaluation demonstrates the efficiency and the efficacy of our framework in multiple settings. 1

2 0.092685491 184 nips-2013-Marginals-to-Models Reducibility

Author: Tim Roughgarden, Michael Kearns

Abstract: We consider a number of classical and new computational problems regarding marginal distributions, and inference in models specifying a full joint distribution. We prove general and efficient reductions between a number of these problems, which demonstrate that algorithmic progress in inference automatically yields progress for “pure data” problems. Our main technique involves formulating the problems as linear programs, and proving that the dual separation oracle required by the ellipsoid method is provided by the target problem. This technique may be of independent interest in probabilistic inference. 1

3 0.082860842 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

Author: Akshay Krishnamurthy, Aarti Singh

Abstract: We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees. Our algorithms exploit adaptivity to identify entries that are highly informative for learning the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analyses. In the absence of noise, we show that one can exactly recover a n ⇥ n matrix of rank r from merely ⌦(nr3/2 log(r)) matrix entries. We also show that one can recover an order T tensor using ⌦(nrT 1/2 T 2 log(r)) entries. For noisy recovery, our algorithm consistently estimates a low rank matrix corrupted with noise using ⌦(nr3/2 polylog(n)) entries. We complement our study with simulations that verify our theory and demonstrate the scalability of our algorithms. 1

4 0.076824643 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices

Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty

Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes A B 2 . For large m n matrices, such that n m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O 1 computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1

5 0.071071416 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing

Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman

Abstract: We consider the problem of sampling from a probability distribution defined over a high-dimensional discrete set, specified for instance by a graphical model. We propose a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods. Our scheme can leverage fast combinatorial optimization tools as a blackbox and, unlike MCMC methods, samples produced are guaranteed to be within an (arbitrarily small) constant factor of the true probability distribution. We demonstrate that by using state-of-the-art combinatorial search tools, PAWS can efficiently sample from Ising grids with strong interactions and from software verification instances, while MCMC and variational methods fail in both cases. 1

6 0.070102811 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions

7 0.063875571 146 nips-2013-Large Scale Distributed Sparse Precision Estimation

8 0.060394526 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition

9 0.056170877 251 nips-2013-Predicting Parameters in Deep Learning

10 0.05414119 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding

11 0.053706679 185 nips-2013-Matrix Completion From any Given Set of Observations

12 0.05038859 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing

13 0.050086413 186 nips-2013-Matrix factorization with binary components

14 0.049446911 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms

15 0.048562951 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion

16 0.048335742 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

17 0.047960278 213 nips-2013-Nonparametric Multi-group Membership Model for Dynamic Networks

18 0.046828881 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

19 0.042798426 160 nips-2013-Learning Stochastic Feedforward Neural Networks

20 0.042152155 306 nips-2013-Speeding up Permutation Testing in Neuroimaging


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.138), (1, 0.046), (2, 0.003), (3, 0.042), (4, 0.013), (5, 0.011), (6, 0.041), (7, -0.025), (8, 0.0), (9, 0.001), (10, 0.043), (11, -0.022), (12, 0.003), (13, 0.01), (14, -0.038), (15, 0.04), (16, -0.053), (17, -0.04), (18, -0.029), (19, 0.011), (20, -0.044), (21, -0.052), (22, -0.044), (23, -0.025), (24, 0.062), (25, -0.015), (26, -0.037), (27, -0.054), (28, -0.04), (29, -0.003), (30, 0.07), (31, -0.02), (32, 0.005), (33, 0.05), (34, -0.035), (35, -0.001), (36, -0.003), (37, 0.053), (38, -0.011), (39, -0.027), (40, -0.077), (41, -0.021), (42, 0.037), (43, -0.025), (44, 0.031), (45, -0.026), (46, -0.03), (47, -0.024), (48, 0.01), (49, -0.004)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90857291 352 nips-2013-What do row and column marginals reveal about your dataset?

Author: Behzad Golshan, John Byers, Evimaria Terzi

Abstract: Numerous datasets ranging from group memberships within social networks to purchase histories on e-commerce sites are represented by binary matrices. While this data is often either proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive, and may be furnished for analysis. Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i, j) of interest. We do this for all the cells of H simultaneously, without generating realizations, but rather via implicitly sampling the datasets that satisfy the input marginals. The end result is an efficient algorithm with asymptotic running time the same as that required by standard sampling techniques to generate a single dataset from the same dataspace. Our experimental evaluation demonstrates the efficiency and the efficacy of our framework in multiple settings. 1

2 0.6290586 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices

Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty

Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes A B 2 . For large m n matrices, such that n m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O 1 computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1

3 0.60630262 186 nips-2013-Matrix factorization with binary components

Author: Martin Slawski, Matthias Hein, Pavlo Lutsik

Abstract: Motivated by an application in computational biology, we consider low-rank matrix factorization with {0, 1}-constraints on one of the factors and optionally convex constraints on the second one. In addition to the non-convexity shared with other matrix factorization schemes, our problem is further complicated by a combinatorial constraint set of size 2m·r , where m is the dimension of the data points and r the rank of the factorization. Despite apparent intractability, we provide − in the line of recent work on non-negative matrix factorization by Arora et al. (2012)− an algorithm that provably recovers the underlying factorization in the exact case with O(mr2r + mnr + r2 n) operations for n datapoints. To obtain this result, we use theory around the Littlewood-Offord lemma from combinatorics.

4 0.59651619 332 nips-2013-Tracking Time-varying Graphical Structure

Author: Erich Kummerfeld, David Danks

Abstract: Structure learning algorithms for graphical models have focused almost exclusively on stable environments in which the underlying generative process does not change; that is, they assume that the generating model is globally stationary. In real-world environments, however, such changes often occur without warning or signal. Real-world data often come from generating models that are only locally stationary. In this paper, we present LoSST, a novel, heuristic structure learning algorithm that tracks changes in graphical model structure or parameters in a dynamic, real-time manner. We show by simulation that the algorithm performs comparably to batch-mode learning when the generating graphical structure is globally stationary, and significantly better when it is only locally stationary. 1

5 0.59267342 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing

Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman

Abstract: We consider the problem of sampling from a probability distribution defined over a high-dimensional discrete set, specified for instance by a graphical model. We propose a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods. Our scheme can leverage fast combinatorial optimization tools as a blackbox and, unlike MCMC methods, samples produced are guaranteed to be within an (arbitrarily small) constant factor of the true probability distribution. We demonstrate that by using state-of-the-art combinatorial search tools, PAWS can efficiently sample from Ising grids with strong interactions and from software verification instances, while MCMC and variational methods fail in both cases. 1

6 0.58782643 134 nips-2013-Graphical Models for Inference with Missing Data

7 0.57954371 306 nips-2013-Speeding up Permutation Testing in Neuroimaging

8 0.57852381 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers

9 0.57388967 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion

10 0.5738194 245 nips-2013-Pass-efficient unsupervised feature selection

11 0.56621218 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions

12 0.55528951 146 nips-2013-Large Scale Distributed Sparse Precision Estimation

13 0.55240148 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling

14 0.54284436 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning

15 0.54219896 199 nips-2013-More data speeds up training time in learning halfspaces over sparse vectors

16 0.53224784 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements

17 0.52954149 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport

18 0.52405906 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning

19 0.52004313 36 nips-2013-Annealing between distributions by averaging moments

20 0.51894403 185 nips-2013-Matrix Completion From any Given Set of Observations


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.016), (16, 0.036), (19, 0.019), (33, 0.131), (34, 0.096), (41, 0.038), (49, 0.023), (56, 0.098), (66, 0.278), (70, 0.033), (85, 0.061), (89, 0.023), (93, 0.047), (95, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74199718 352 nips-2013-What do row and column marginals reveal about your dataset?

Author: Behzad Golshan, John Byers, Evimaria Terzi

Abstract: Numerous datasets ranging from group memberships within social networks to purchase histories on e-commerce sites are represented by binary matrices. While this data is often either proprietary or sensitive, aggregated data, notably row and column marginals, is often viewed as much less sensitive, and may be furnished for analysis. Here, we investigate how these data can be exploited to make inferences about the underlying matrix H. Instead of assuming a generative model for H, we view the input marginals as constraints on the dataspace of possible realizations of H and compute the probability density function of particular entries H(i, j) of interest. We do this for all the cells of H simultaneously, without generating realizations, but rather via implicitly sampling the datasets that satisfy the input marginals. The end result is an efficient algorithm with asymptotic running time the same as that required by standard sampling techniques to generate a single dataset from the same dataspace. Our experimental evaluation demonstrates the efficiency and the efficacy of our framework in multiple settings. 1

2 0.71726221 107 nips-2013-Embed and Project: Discrete Sampling with Universal Hashing

Author: Stefano Ermon, Carla P. Gomes, Ashish Sabharwal, Bart Selman

Abstract: We consider the problem of sampling from a probability distribution defined over a high-dimensional discrete set, specified for instance by a graphical model. We propose a sampling algorithm, called PAWS, based on embedding the set into a higher-dimensional space which is then randomly projected using universal hash functions to a lower-dimensional subspace and explored using combinatorial search methods. Our scheme can leverage fast combinatorial optimization tools as a blackbox and, unlike MCMC methods, samples produced are guaranteed to be within an (arbitrarily small) constant factor of the true probability distribution. We demonstrate that by using state-of-the-art combinatorial search tools, PAWS can efficiently sample from Ising grids with strong interactions and from software verification instances, while MCMC and variational methods fail in both cases. 1

3 0.69316584 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

Author: Anima Anandkumar, Daniel Hsu, Majid Janzamin, Sham M. Kakade

Abstract: Overcomplete latent representations have been very popular for unsupervised feature learning in recent years. In this paper, we specify which overcomplete models can be identified given observable moments of a certain order. We consider probabilistic admixture or topic models in the overcomplete regime, where the number of latent topics can greatly exceed the size of the observed word vocabulary. While general overcomplete topic models are not identifiable, we establish generic identifiability under a constraint, referred to as topic persistence. Our sufficient conditions for identifiability involve a novel set of “higher order” expansion conditions on the topic-word matrix or the population structure of the model. This set of higher-order expansion conditions allow for overcomplete models, and require the existence of a perfect matching from latent topics to higher order observed words. We establish that random structured topic models are identifiable w.h.p. in the overcomplete regime. Our identifiability results allow for general (non-degenerate) distributions for modeling the topic proportions, and thus, we can handle arbitrarily correlated topics in our framework. Our identifiability results imply uniqueness of a class of tensor decompositions with structured sparsity which is contained in the class of Tucker decompositions, but is more general than the Candecomp/Parafac (CP) decomposition. Keywords: Overcomplete representation, admixture models, generic identifiability, tensor decomposition.

4 0.69192672 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization

Author: Martin Mevissen, Emanuele Ragnoli, Jia Yuan Yu

Abstract: We consider robust optimization for polynomial optimization problems where the uncertainty set is a set of candidate probability density functions. This set is a ball around a density function estimated from data samples, i.e., it is data-driven and random. Polynomial optimization problems are inherently hard due to nonconvex objectives and constraints. However, we show that by employing polynomial and histogram density estimates, we can introduce robustness with respect to distributional uncertainty sets without making the problem harder. We show that the optimum to the distributionally robust problem is the limit of a sequence of tractable semidefinite programming relaxations. We also give finite-sample consistency guarantees for the data-driven uncertainty sets. Finally, we apply our model and solution method in a water network optimization problem. 1

5 0.59821248 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints

Author: Yuhong Guo

Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1

6 0.59762388 201 nips-2013-Multi-Task Bayesian Optimization

7 0.59582579 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

8 0.59458202 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

9 0.59439743 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

10 0.59417069 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

11 0.59222782 173 nips-2013-Least Informative Dimensions

12 0.59201604 350 nips-2013-Wavelets on Graphs via Deep Learning

13 0.58998567 99 nips-2013-Dropout Training as Adaptive Regularization

14 0.58985484 318 nips-2013-Structured Learning via Logistic Regression

15 0.58967334 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation

16 0.58862859 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning

17 0.58807969 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

18 0.58778191 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

19 0.5877164 5 nips-2013-A Deep Architecture for Matching Short Texts

20 0.58721107 52 nips-2013-Bayesian inference as iterated random functions with applications to sequential inference in graphical models