nips nips2013 nips2013-185 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Troy Lee, Adi Shraibman
Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. [sent-4, score-0.727]
2 A central approach to this problem is to output a matrix of lowest possible complexity (e. [sent-6, score-0.439]
3 rank or trace norm) that agrees with the partially specified matrix. [sent-8, score-0.46]
4 The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e. [sent-9, score-0.29]
5 In practice, often the set of revealed entries is not chosen at random and these results do not apply. [sent-12, score-0.26]
6 We present a means to obtain performance guarantees with respect to any set of initial observations. [sent-14, score-0.153]
7 The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. [sent-15, score-0.484]
8 We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. [sent-16, score-0.463]
9 The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. [sent-17, score-0.388]
10 1 Introduction In the matrix completion problem we observe a subset of the entries of a target matrix Y , and our aim is to retrieve the rest of the matrix. [sent-18, score-1.017]
11 Obviously some restriction on the target matrix Y is unavoidable as otherwise it is impossible to retrieve even one missing entry; usually, it is assumed that Y is generated in a way so as to have low complexity according to a measure such as matrix rank. [sent-19, score-0.782]
12 A common scheme for the matrix completion problem is to select a matrix X that minimizes some combination of the complexity of X and the distance between X and Y on the observed part. [sent-20, score-0.794]
13 In particular, one can demand that X agrees with Y on the observed initial sample (i. [sent-21, score-0.31]
14 the distance between X and Y on the observed part is zero). [sent-23, score-0.096]
15 It outputs a matrix with minimal complexity that agrees with Y on the initial sample S. [sent-25, score-0.618]
16 The complexity measure can be rank, or a norm to serve as an efficiently computable proxy for the rank such as the trace norm or γ2 norm. [sent-26, score-1.142]
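As a concrete illustration of Alg1 (a minimal sketch, not code from the paper; it assumes the cvxpy library and uses the trace norm as the complexity measure, i.e. Alg1(Tr); the function name and interface are our own):

```python
# Minimal sketch of Alg1 with the trace norm as the complexity measure, Alg1(Tr).
# Assumes cvxpy (with its default conic solver); names are illustrative.
import cvxpy as cp
import numpy as np

def alg1_trace_norm(Y: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return a matrix of minimal trace norm that agrees with Y on the
    observed entries S = {(i, j) : mask[i, j] == True}."""
    m, n = Y.shape
    X = cp.Variable((m, n))
    M = mask.astype(float)
    constraints = [cp.multiply(M, X) == M * Y]          # X_ij = Y_ij on the sample S
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return X.value
```

An analogous SDP formulation (see the γ2 sketch further below) would give Alg1(γ2); the second phase described later (Alg2) is what turns this output into a quantitative guarantee.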
17 When we wish to mention which complexity measure is used we write it explicitly, e.g. Alg1(Tr). [sent-27, score-0.187]
18 Our framework is suitable for use with any norm satisfying a few simple conditions described in the sequel. [sent-30, score-0.383]
19 The performance of Alg1 under the assumption that the initial subset is picked at random is well understood [1, 2, 3, 4, 5, 6, 7, 8]. [sent-31, score-0.161]
20 One line of research [5, 6, 4] studies conditions under which Alg1(Tr) retrieves the matrix exactly. [sent-33, score-0.323]
21 (A footnote remarks that there are other papers studying exact matrix completion as well.) [sent-34, score-0.24]
22 They define what they call an incoherence property which quantifies how spread out the singular vectors of Y are. [sent-37, score-0.205]
23 The exact definition of the incoherence property varies in different results. [sent-38, score-0.155]
24 It is then proved that if there are enough samples relative to the rank of Y and its incoherence property, then Alg1(Tr) retrieves the matrix Y exactly with high probability, assuming the samples are chosen uniformly at random. [sent-39, score-0.555]
25 Note that in this line of research the trace norm is used as the complexity measure in the algorithm. [sent-40, score-0.696]
26 Candes and Recht [5] observed that it is impossible to reconstruct a matrix that has only one entry equal to 1 and zeros everywhere else, unless most of its entries are observed. [sent-42, score-0.518]
27 Thus, exact matrix completion must assume some special property of the target matrix Y . [sent-43, score-0.756]
28 In a second line of research, general results are proved regarding the performance of Alg1. [sent-44, score-0.129]
29 These results are weaker in that they do not prove exact recovery, but rather bounds on the distance between the output matrix X and Y . [sent-45, score-0.41]
30 But these results apply for every matrix Y , they can be generalized for non-uniform probability distributions, and also apply when the complexity measure is the γ2 norm. [sent-46, score-0.394]
31 These results take the following form: Theorem 1 ([2]) Let Y be an n × n real matrix, and P a probability distribution on pairs (i, j) ∈ [n]2 . [sent-47, score-0.116]
32 Choose a sample S of |S| > n log n entries according to P . [sent-48, score-0.228]
33 Then the output X of the algorithm with sample S satisfies Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ c γ2(Y)² √(n/|S|), where c is a universal constant. [sent-50, score-0.173]
34 What can we say about the output of the algorithm when the initial set of observations is not chosen at random? [sent-53, score-0.107]
35 In order to answer this question we need to understand what properties of a sample enable generalization. [sent-55, score-0.077]
36 A first step in this direction was taken in [9] where the initial subset was chosen deterministically as the set of edges of a good expander (more generally, a good sparsifier). [sent-56, score-0.202]
37 Deterministic guarantees were proved for the algorithm in this case, resembling the guarantees known for random sampling. [sent-57, score-0.238]
38 For example: Theorem 2 [9] Let S be the set of edges of a d-regular graph with second eigenvalue bound² λ. For every n × n real matrix Y, if X is the output of Alg1 with initial subset S, then (1/n²) Σ_{i,j} (X_ij − Y_ij)² ≤ c γ2(Y)² λ/d. [sent-58, score-0.587]
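For concreteness (our own gloss, not a quoted result; it only instantiates Theorem 2 with the standard eigenvalue bound for Ramanujan graphs, which also appear in this page's keyword list):

```latex
% Instantiating Theorem 2 with a d-regular Ramanujan graph, where \lambda \le 2\sqrt{d-1}:
\frac{1}{n^2}\sum_{i,j}(X_{ij}-Y_{ij})^2 \;\le\; c\,\gamma_2(Y)^2\,\frac{\lambda}{d}
\;\le\; \frac{2\,c\,\gamma_2(Y)^2}{\sqrt{d}},
% i.e. roughly d observations per row already give average error O(\gamma_2(Y)^2/\sqrt{d}).
```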
39 This theorem was also generalized to bound the error with respect to any probability distribution. [sent-63, score-0.161]
40 Instead of expanders, sparsifiers were used to select the entries to observe for this result. [sent-64, score-0.197]
41 There is an efficiently constructible set S ⊂ [n]² of size at most dn, such that for every n × n real target matrix Y, if X is the output of our algorithm with initial subset S, then Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ c γ2(Y)² / √d. [sent-66, score-0.607]
42 The results in [9] still do not answer the practical question of how to reconstruct a matrix from an arbitrary sample. [sent-67, score-0.299]
43 We extend the results of [9] in several ways: (footnote 2: the eigenvalues are eigenvalues of the adjacency matrix of the graph.) [sent-69, score-0.279]
44 We upper bound the generalization error of Alg1 given any set of initial observations. [sent-71, score-0.222]
45 This bound depends on properties of the set of observed entries. [sent-72, score-0.094]
46 We show there is a probability distribution outside of the observed entries such that the generalization error under this distribution is bounded in terms of the complexity of the observed entries, under a certain complexity measure. [sent-74, score-0.591]
47 The results hold not only for γ2 but also for the trace norm, and in fact any norm satisfying a few basic properties. [sent-76, score-0.577]
48 2 Preliminaries Here we introduce some of the matrix notation and norms that we will be using. [sent-77, score-0.249]
49 For an m-by-n matrix A with m ≥ n let σ1(A) ≥ · · · ≥ σn(A) denote the singular values of A. [sent-79, score-0.29]
50 The trace norm, denoted ‖A‖_tr, is the ℓ1 norm of the vector of singular values, and the Frobenius norm, denoted ‖A‖_F, is the ℓ2 norm of the vector of singular values. [sent-80, score-1.138]
51 As the rank of a matrix is equal to the number of non-zero singular values, it follows from the Cauchy-Schwarz inequality that ‖A‖_tr² / ‖A‖_F² ≤ rk(A). (1) [sent-81, score-0.556]
52 This inequality motivates the use of the trace norm as a proxy for rank in rank minimization problems. [sent-82, score-0.832]
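For completeness, a one-line derivation of (1) (ours, not quoted from the paper): A has rk(A) non-zero singular values, so Cauchy-Schwarz gives

```latex
\|A\|_{tr}^2 \;=\; \Big(\sum_{i=1}^{rk(A)} \sigma_i(A)\Big)^{\!2}
\;\le\; rk(A)\sum_{i=1}^{rk(A)} \sigma_i(A)^2 \;=\; rk(A)\,\|A\|_F^2 .
```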
53 A problem with the bound of (1) as a complexity measure is that it is not monotone—the bound can be larger on a submatrix of A than on A itself. [sent-83, score-0.354]
54 As taking the Hadamard product of a matrix with a rank one matrix does not increase its rank, a way to fix this problem is to consider instead: max_{‖u‖=‖v‖=1} ‖A ◦ vuᵀ‖_tr² / ‖A ◦ vuᵀ‖_F² ≤ rk(A). [sent-84, score-0.684]
55 When A is a sign matrix, this bound simplifies nicely—for then ‖A ◦ vuᵀ‖_F = ‖u‖ ‖v‖ = 1, and we are left with max_{‖u‖=‖v‖=1} ‖A ◦ vuᵀ‖_tr² ≤ rk(A). [sent-85, score-0.147]
56 This motivates the definition of the γ2 norm: γ2(A) = max_{‖u‖=‖v‖=1} ‖A ◦ vuᵀ‖_tr. [sent-86, score-0.197]
57 We will also make use of the dual norms of the trace and γ2 norms. [sent-89, score-0.489]
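As background (not text from the paper): γ2 coincides with the max norm and has a standard semidefinite-programming characterization, which is what makes Alg1(γ2) and the bounds below computable in practice. A minimal sketch, assuming cvxpy; the function name is ours:

```python
# Sketch: gamma_2 norm via its standard SDP characterization,
#   gamma_2(A) = min c  s.t.  [[W1, A], [A^T, W2]] is PSD, diag(W1) <= c, diag(W2) <= c.
import cvxpy as cp
import numpy as np

def gamma2(A: np.ndarray) -> float:
    m, n = A.shape
    Z = cp.Variable((m + n, m + n), PSD=True)   # plays the role of [[W1, A], [A^T, W2]]
    c = cp.Variable()
    constraints = [Z[:m, m:] == A,              # fix the off-diagonal block to A
                   cp.diag(Z) <= c]             # bounds diag(W1) and diag(W2) by c
    cp.Problem(cp.Minimize(c), constraints).solve()
    return float(c.value)
```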
58 Recall that in general for a norm Φ(A) the dual norm Φ* is defined as Φ*(A) = max_B ⟨A, B⟩ / Φ(B); notice that this means that ⟨A, B⟩ ≤ Φ*(A) Φ(B). [sent-90, score-0.733]
59 The dual of the trace norm is ‖·‖, the operator norm from ℓ2 to ℓ2; the dual of the γ2 norm looks as follows. [sent-91, score-1.345]
60 This is the minimum of the γ2 norm over all matrices which approximate the target matrix in some sense. [sent-95, score-0.672]
61 Then γ2^{0,∞}(S) = min_T { γ2(T) : T ◦ S ≥ S, T ◦ S̄ = 0 }. In words, γ2^{0,∞}(S) is the minimum γ2 norm of a matrix T which is 0 whenever S is zero, and at least 1 whenever S is 1. [sent-99, score-0.56]
62 This can be thought of as a “one-sided error” version of the more familiar γ2^∞ norm of a sign matrix, which is the minimum γ2 norm of a matrix which agrees in sign with the target matrix and has all entries of magnitude at least 1. [sent-100, score-1.686]
63 The γ2^∞ bound is also known to be equal to the margin complexity [11]. [sent-101, score-0.181]
64 We can always run Alg1 and get an output matrix X. [sent-103, score-0.316]
65 What we need in order to make intelligent use of X is a way to measure the distance between X and Y . [sent-104, score-0.13]
66 Our first observation is that although Y is not known, it is possible to bound the distance between X and Y . [sent-105, score-0.124]
67 This result is stated in the following theorem which generalizes Theorems (2) and (4) of [9] 3 : Theorem 7 Fix a set of entries S ⊂ [m] × [n]. [sent-106, score-0.266]
68 Let P be a probability distribution on pairs (i, j) ∈ [m] × [n], such that there exists a real matrix Q satisfying: 1. Q_ij = 0 for every (i, j) outside S, and 2. γ2*(P − Q) ≤ Λ. [sent-107, score-0.393]
69 Then for every m × n real target matrix Y, if X is the output of our algorithm with initial subset S, it holds that Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ 4Λ γ2(Y)². [sent-110, score-0.607]
70 Theorem 7 says that γ2*(P − Q) determines, at least to some extent, the expected distance between X and Y with respect to P. [sent-111, score-0.135]
71 This gives us a way to measure the quality of the output of Alg1 for any set S of initial observations. [sent-112, score-0.268]
72 Choose a probability distribution P on the entries of the matrix. [sent-114, score-0.228]
73 Find a real matrix Q such that Q_ij = 0 for (i, j) outside S, and γ2*(P − Q) is minimal. [sent-116, score-0.258]
74 We then know, using Theorem 7, that the expected square distance between X and Y can be bounded in terms of Λ and the complexity of Y . [sent-119, score-0.15]
75 For example if the set of initial observations is contained in a submatrix we cannot expect X to be close to Y outside this submatrix. [sent-121, score-0.228]
76 In such cases it makes sense to restrict P to the submatrix containing S. [sent-122, score-0.078]
77 One approach to find a distribution for which we can expect to be close on the unseen entries is to optimize over probability distributions P such that Theorem 7 gives the best bound. [sent-123, score-0.228]
78 See Section 4 for the corresponding result for the trace norm as well. [sent-126, score-0.509]
79 Output: a matrix X of smallest possible CC(X) under the condition that Xij = Yij for all (i, j) ∈ S. [sent-130, score-0.209]
80 We refer to this algorithm as Alg2, or Alg2(CC) if we wish to state the complexity measure that is used. [sent-133, score-0.187]
81 Globally, our algorithm for matrix completion therefore works in two phases. [sent-140, score-0.402]
82 We first use Alg1 to get an output matrix X, and then use Alg2 in order to find optimal guarantees regarding the distance between X and Y . [sent-141, score-0.474]
83 The generalization error bounds for this algorithm are proved in Section 4. [sent-142, score-0.127]
84 1 Using a general norm In our description of Alg2 above we have used the norm γ2 . [sent-144, score-0.63]
85 The same idea works for any norm Φ satisfying the property Φ(A ◦ A) ≤ Φ(A)2 . [sent-145, score-0.416]
86 Moreover, if the dual norm can be computed efficiently via a linear or semidefinite program, then the optimal distribution P for the bound can be found efficiently as well. [sent-146, score-0.479]
87 For example, for the trace norm the algorithm becomes: Given the sample S, run Alg1(‖·‖_tr) and get an output matrix X. [sent-147, score-1.008]
88 Then analogously to Theorem 7, we have Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ 4Λ ‖Y‖_tr². [sent-155, score-0.152]
89 Both of these results will follow from a more general theorem which we show in the next section. [sent-156, score-0.069]
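Since the dual of the trace norm is the operator norm, the search for the certificate (P, Q) in the trace-norm case is itself a tractable convex program. A minimal sketch (ours, not the authors' code; it assumes cvxpy with NumPy-style fancy indexing and follows the support conventions of Section 5, namely P a distribution on the unobserved entries and Q supported on S):

```python
# Sketch of the certificate step for the trace-norm variant: minimize the operator
# norm (dual of the trace norm) of P - Q, with P a probability distribution on the
# unobserved entries and Q supported on the observed set S.  Assumes cvxpy.
import cvxpy as cp
import numpy as np

def trace_norm_certificate(mask: np.ndarray):
    m, n = mask.shape
    r_obs, c_obs = np.where(mask)            # observed entries S
    r_un, c_un = np.where(~mask)             # unobserved entries
    P = cp.Variable((m, n), nonneg=True)
    Q = cp.Variable((m, n))
    constraints = [cp.sum(P) == 1,
                   P[r_obs, c_obs] == 0,     # P lives on the unobserved entries
                   Q[r_un, c_un] == 0]       # Q lives on S
    prob = cp.Problem(cp.Minimize(cp.sigma_max(P - Q)), constraints)
    prob.solve()
    return P.value, Q.value, prob.value      # prob.value plays the role of Lambda
```

With Λ = prob.value, the displayed guarantee reads Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ 4Λ ‖Y‖_tr².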
90 Let P be a probability distribution on pairs (i, j) ∈ [m] × [n], such that there exists a real matrix Q satisfying: 1. Q_ij = 0 for every (i, j) outside S, and 2. Φ*(P − Q) ≤ Λ. [sent-161, score-0.393]
91 Then for every m × n real target matrix Y, if X is the output of algorithm Alg1(Φ) with initial subset S, it holds that Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ 4 Φ(Y)² Λ. [sent-164, score-0.607]
92 Proof: Let R be the matrix where R_ij = (X_ij − Y_ij)². [sent-165, score-0.209]
93 Both the trace norm and γ2 norm satisfy the condition of the theorem as they are multiplicative under tensor product. [sent-172, score-0.069]
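The intermediate inequalities of this proof did not survive the extraction; presumably the chain runs as follows, using only the two conditions on Q and P − Q, the duality ⟨A, B⟩ ≤ Φ*(A) Φ(B), the assumption Φ(A ◦ A) ≤ Φ(A)², the triangle inequality, and Φ(X) ≤ Φ(Y) (X minimizes Φ among matrices agreeing with Y on S):

```latex
\sum_{i,j} P_{ij} R_{ij}
  = \langle P - Q,\, R\rangle + \langle Q,\, R\rangle
  = \langle P - Q,\, R\rangle                    % Q is supported on S, where R = 0
  \le \Phi^*(P-Q)\,\Phi(R)
  \le \Lambda\,\Phi\big((X-Y)\circ(X-Y)\big)
  \le \Lambda\,\Phi(X-Y)^2
  \le \Lambda\,\big(\Phi(X)+\Phi(Y)\big)^2
  \le 4\,\Lambda\,\Phi(Y)^2 .
```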
94 5 Analyzing the error bound We now look more closely at the minimal value of the parameter Λ from Theorem 7. [sent-173, score-0.106]
95 The optimal value of Λ depends only on the set of observed indices S. [sent-174, score-0.081]
96 Given samples S we want to find P, Q so as to minimize γ2*(P − Q) such that P is a probability distribution supported on S̄ and Q has support in S. [sent-176, score-0.072]
97 Taking the dual of this program we find 1/Λ = minimize_A γ2(A) subject to A ≥ S̄ and A ◦ S̄ = A. In words, this says that 1/Λ is equal to the minimum γ2 norm of a matrix that is zero on all entries in S and at least 1 on all entries in S̄; that is, 1/Λ = γ2^{0,∞}(S̄). [sent-179, score-1.193]
98 This says that the more complex the set of unobserved entries S̄ according to the measure γ2^{0,∞}, the smaller the value of Λ. [sent-181, score-0.336]
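The dual program above is again a semidefinite program, so the best Λ for a given set of observations can be computed directly. A minimal sketch (ours; assumes cvxpy with NumPy-style fancy indexing) that computes γ2^{0,∞}(S̄), and hence Λ, from an observation mask:

```python
# Sketch: 1/Lambda = gamma_2^{0,inf}(S_bar), i.e. the minimum gamma_2 norm of a matrix
# that is zero on the observed entries S and at least 1 on the unobserved entries,
# computed via the SDP characterization of gamma_2.  Assumes cvxpy.
import cvxpy as cp
import numpy as np

def lambda_from_mask(mask: np.ndarray) -> float:
    m, n = mask.shape
    r_obs, c_obs = np.where(mask)               # observed entries S
    r_un, c_un = np.where(~mask)                # unobserved entries
    Z = cp.Variable((m + n, m + n), PSD=True)   # [[W1, T], [T^T, W2]] >= 0
    T = Z[:m, m:]
    c = cp.Variable()
    constraints = [cp.diag(Z) <= c,             # gamma_2(T) <= c
                   T[r_obs, c_obs] == 0,        # zero on S
                   T[r_un, c_un] >= 1]          # at least 1 on the complement of S
    cp.Problem(cp.Minimize(c), constraints).solve()
    return 1.0 / float(c.value)                 # Lambda = 1 / gamma_2^{0,inf}(S_bar)
```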
99 Note that in particular, if we consider the sign matrix S − S̄, then γ2^{0,∞}(S̄) ≥ (γ2^∞(S − S̄) − 1)/2, i.e., it is lower bounded (up to this factor) by the margin complexity of S − S̄. [sent-182, score-0.415]
100 Nuclear norm penalization and optimal rates for noisy low rank matrix completion. [sent-230, score-0.671]
wordName wordTfidf (topN-words)
[('norm', 0.315), ('pij', 0.296), ('yij', 0.267), ('vut', 0.25), ('qij', 0.244), ('xij', 0.225), ('matrix', 0.209), ('entries', 0.197), ('trace', 0.194), ('completion', 0.193), ('agrees', 0.152), ('tr', 0.152), ('rank', 0.114), ('cc', 0.109), ('output', 0.107), ('dual', 0.103), ('rij', 0.1), ('initial', 0.094), ('incoherence', 0.091), ('ramanujan', 0.088), ('schechtman', 0.088), ('complexity', 0.087), ('sign', 0.086), ('retrieves', 0.081), ('target', 0.081), ('candes', 0.081), ('singular', 0.081), ('submatrix', 0.078), ('says', 0.072), ('combinatorica', 0.07), ('theorem', 0.069), ('satisfying', 0.068), ('generalization', 0.067), ('subset', 0.067), ('sparsi', 0.067), ('measure', 0.067), ('semide', 0.064), ('program', 0.064), ('revealed', 0.063), ('distance', 0.063), ('retrieve', 0.061), ('bound', 0.061), ('proved', 0.06), ('guarantees', 0.059), ('outside', 0.056), ('hadamard', 0.052), ('proxy', 0.05), ('real', 0.049), ('rk', 0.049), ('indices', 0.048), ('nition', 0.048), ('srebro', 0.047), ('obviously', 0.046), ('answer', 0.046), ('motivates', 0.045), ('minimal', 0.045), ('troy', 0.044), ('expanders', 0.044), ('shraibman', 0.044), ('reconstruct', 0.044), ('fix', 0.041), ('foygel', 0.041), ('mendelson', 0.041), ('tel', 0.041), ('koltchinskii', 0.041), ('expander', 0.041), ('minimize', 0.041), ('norms', 0.04), ('linial', 0.038), ('quantum', 0.036), ('lps', 0.036), ('keshavan', 0.036), ('pairs', 0.036), ('lowest', 0.036), ('minimum', 0.036), ('regarding', 0.036), ('impossible', 0.035), ('eigenvalues', 0.035), ('universal', 0.035), ('rennie', 0.033), ('penalization', 0.033), ('unavoidable', 0.033), ('observed', 0.033), ('property', 0.033), ('wish', 0.033), ('line', 0.033), ('margin', 0.033), ('entrywise', 0.032), ('phillips', 0.032), ('technological', 0.032), ('ix', 0.032), ('centre', 0.031), ('matrices', 0.031), ('probability', 0.031), ('sample', 0.031), ('exact', 0.031), ('nicely', 0.03), ('montanari', 0.03), ('tsybakov', 0.03), ('received', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 185 nips-2013-Matrix Completion From any Given Set of Observations
Author: Troy Lee, Adi Shraibman
Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1
2 0.30632511 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty
Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes ‖A − B‖2. For large m × n matrices, such that n ≫ m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O(1) computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1
3 0.23697039 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
Author: Akshay Krishnamurthy, Aarti Singh
Abstract: We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees. Our algorithms exploit adaptivity to identify entries that are highly informative for learning the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analyses. In the absence of noise, we show that one can exactly recover an n × n matrix of rank r from merely Ω(nr^{3/2} log(r)) matrix entries. We also show that one can recover an order T tensor using Ω(nr^{T−1/2} T^2 log(r)) entries. For noisy recovery, our algorithm consistently estimates a low rank matrix corrupted with noise using Ω(nr^{3/2} polylog(n)) entries. We complement our study with simulations that verify our theory and demonstrate the scalability of our algorithms. 1
4 0.21149209 11 nips-2013-A New Convex Relaxation for Tensor Completion
Author: Bernardino Romera-Paredes, Massimiliano Pontil
Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1
5 0.1984849 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
Author: Ke Hou, Zirui Zhou, Anthony Man-Cho So, Zhi-Quan Luo
Abstract: Motivated by various applications in machine learning, the problem of minimizing a convex smooth loss function with trace norm regularization has received much attention lately. Currently, a popular method for solving such problem is the proximal gradient method (PGM), which is known to have a sublinear rate of convergence. In this paper, we show that for a large class of loss functions, the convergence rate of the PGM is in fact linear. Our result is established without any strong convexity assumption on the loss function. A key ingredient in our proof is a new Lipschitzian error bound for the aforementioned trace norm–regularized problem, which may be of independent interest.
6 0.16728061 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
7 0.16628601 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
8 0.13966045 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
9 0.13686399 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
10 0.13371667 91 nips-2013-Dirty Statistical Models
11 0.11429074 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
12 0.10298202 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
13 0.10222603 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
14 0.097498231 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
15 0.096859679 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
16 0.096359812 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
17 0.094276935 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
18 0.089951277 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
19 0.087577716 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
20 0.081618465 188 nips-2013-Memory Limited, Streaming PCA
topicId topicWeight
[(0, 0.215), (1, 0.101), (2, 0.167), (3, 0.23), (4, -0.007), (5, -0.1), (6, -0.042), (7, -0.029), (8, -0.054), (9, 0.025), (10, 0.061), (11, 0.022), (12, 0.06), (13, 0.009), (14, -0.04), (15, -0.022), (16, -0.1), (17, 0.019), (18, -0.005), (19, 0.016), (20, -0.041), (21, -0.192), (22, 0.004), (23, -0.097), (24, -0.053), (25, -0.037), (26, -0.004), (27, -0.331), (28, -0.011), (29, -0.039), (30, -0.092), (31, -0.107), (32, -0.107), (33, -0.098), (34, 0.017), (35, 0.013), (36, 0.055), (37, 0.052), (38, -0.041), (39, 0.021), (40, -0.008), (41, -0.008), (42, -0.011), (43, -0.033), (44, 0.039), (45, -0.091), (46, -0.0), (47, -0.012), (48, 0.051), (49, -0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.97073013 185 nips-2013-Matrix Completion From any Given Set of Observations
Author: Troy Lee, Adi Shraibman
Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1
2 0.88847023 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty
Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes ‖A − B‖2. For large m × n matrices, such that n ≫ m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O(1) computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1
3 0.87481391 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
Author: Miao Xu, Rong Jin, Zhi-Hua Zhou
Abstract: In standard matrix completion theory, it is required to have at least O(n ln2 n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. In many real tasks, side information in addition to the observed entries is often available. In this work, we develop a novel theory of matrix completion that explicitly explore the side information to reduce the requirement on the number of observed entries. We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. 1
4 0.85347486 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
Author: Franz Kiraly, Louis Theran
Abstract: We propose a general framework for reconstructing and denoising single entries of incomplete and noisy entries. We describe: effective algorithms for deciding if an entry can be reconstructed and, if so, for reconstructing and denoising it; and a priori bounds on the error of each entry, individually. In the noiseless case our algorithm is exact. For rank-one matrices, the new algorithm is fast, admits a highly-parallel implementation, and produces an error minimizing estimate that is qualitatively close to our theoretical and the state-of-the-art Nuclear Norm and OptSpace methods. 1
5 0.81271523 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
Author: Adrien Todeschini, François Caron, Marie Chavent
Abstract: We propose a novel class of algorithms for low rank matrix completion. Our approach builds on novel penalty functions on the singular values of the low rank matrix. By exploiting a mixture model representation of this penalty, we show that a suitably chosen set of latent variables enables to derive an ExpectationMaximization algorithm to obtain a Maximum A Posteriori estimate of the completed low rank matrix. The resulting algorithm is an iterative soft-thresholded algorithm which iteratively adapts the shrinkage coefficients associated to the singular values. The algorithm is simple to implement and can scale to large matrices. We provide numerical comparisons between our approach and recent alternatives showing the interest of the proposed approach for low rank matrix completion. 1
6 0.77573335 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
7 0.77093542 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers
8 0.70219707 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
9 0.68084294 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
10 0.61350787 352 nips-2013-What do row and column marginals reveal about your dataset?
11 0.60988927 186 nips-2013-Matrix factorization with binary components
12 0.5968864 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
13 0.54248363 73 nips-2013-Convex Relaxations for Permutation Problems
14 0.53992385 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
15 0.53840572 91 nips-2013-Dirty Statistical Models
16 0.52731878 11 nips-2013-A New Convex Relaxation for Tensor Completion
17 0.52135038 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
18 0.5124836 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
19 0.49925652 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
20 0.49015683 247 nips-2013-Phase Retrieval using Alternating Minimization
topicId topicWeight
[(16, 0.01), (33, 0.201), (34, 0.058), (41, 0.011), (49, 0.019), (56, 0.182), (85, 0.052), (89, 0.028), (93, 0.02), (95, 0.34)]
simIndex simValue paperId paperTitle
1 0.913369 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference
Author: Guy van den Broeck, Adnan Darwiche
Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1
2 0.89200884 309 nips-2013-Statistical Active Learning Algorithms
Author: Maria-Florina Balcan, Vitaly Feldman
Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1
3 0.88497245 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
4 0.85752499 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
Author: Po-Ling Loh, Martin J. Wainwright
Abstract: We establish theoretical results concerning local optima of regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision ε_stat in log(1/ε_stat) iterations, the fastest possible rate for any first-order method. We provide simulations to illustrate the sharpness of our theoretical predictions. 1
5 0.82966161 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
Author: Ying Liu, Alan Willsky
Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k²n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn² + n² log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn² + n² log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1
same-paper 6 0.82639611 185 nips-2013-Matrix Completion From any Given Set of Observations
7 0.76738024 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs
8 0.73424608 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
9 0.72485149 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
10 0.7235297 149 nips-2013-Latent Structured Active Learning
11 0.7161696 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
12 0.70384878 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
13 0.69660729 184 nips-2013-Marginals-to-Models Reducibility
14 0.69173276 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
15 0.69127607 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
16 0.69035107 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
17 0.68968046 60 nips-2013-Buy-in-Bulk Active Learning
18 0.689493 28 nips-2013-Adaptive Step-Size for Policy Gradient Methods
19 0.68935239 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
20 0.68926555 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search