nips nips2013 nips2013-185 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Troy Lee, Adi Shraibman
Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. [sent-4, score-0.727]
2 A central approach to this problem is to output a matrix of lowest possible complexity (e. [sent-6, score-0.439]
3 rank or trace norm) that agrees with the partially specified matrix. [sent-8, score-0.46]
4 The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e. [sent-9, score-0.29]
5 In practice, often the set of revealed entries is not chosen at random and these results do not apply. [sent-12, score-0.26]
6 We present a means to obtain performance guarantees with respect to any set of initial observations. [sent-14, score-0.153]
7 The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. [sent-15, score-0.484]
8 We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. [sent-16, score-0.463]
9 The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. [sent-17, score-0.388]
10 1 Introduction In the matrix completion problem we observe a subset of the entries of a target matrix Y , and our aim is to retrieve the rest of the matrix. [sent-18, score-1.017]
11 Obviously some restriction on the target matrix Y is unavoidable as otherwise it is impossible to retrieve even one missing entry; usually, it is assumed that Y is generated in a way so as to have low complexity according to a measure such as matrix rank. [sent-19, score-0.782]
12 A common scheme for the matrix completion problem is to select a matrix X that minimizes some combination of the complexity of X and the distance between X and Y on the observed part. [sent-20, score-0.794]
13 In particular, one can demand that X agrees with Y on the observed initial sample (i. [sent-21, score-0.31]
14 the distance between X and Y on the observed part is zero). [sent-23, score-0.096]
15 It outputs a matrix with minimal complexity that agrees with Y on the initial sample S. [sent-25, score-0.618]
16 The complexity measure can be rank, or a norm to serve as an efficiently computable proxy for the rank such as the trace norm or γ2 norm. [sent-26, score-1.142]
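As a concrete illustration of Alg1 (a minimal sketch, not code from the paper; it assumes the cvxpy library and uses the trace norm as the complexity measure, i.e. Alg1(Tr); the function name and interface are our own):

```python
# Minimal sketch of Alg1 with the trace norm as the complexity measure, Alg1(Tr).
# Assumes cvxpy (with its default conic solver); names are illustrative.
import cvxpy as cp
import numpy as np

def alg1_trace_norm(Y: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return a matrix of minimal trace norm that agrees with Y on the
    observed entries S = {(i, j) : mask[i, j] == True}."""
    m, n = Y.shape
    X = cp.Variable((m, n))
    M = mask.astype(float)
    constraints = [cp.multiply(M, X) == M * Y]          # X_ij = Y_ij on the sample S
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return X.value
```

An analogous SDP formulation (see the γ2 sketch further below) would give Alg1(γ2); the second phase described later (Alg2) is what turns this output into a quantitative guarantee.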
17 When we wish to mention which complexity measure is used we write it explicitly, e.g. Alg1(Tr). [sent-27, score-0.187]
18 Our framework is suitable for use with any norm satisfying a few simple conditions described in the sequel. [sent-30, score-0.383]
19 The performance of Alg1 under the assumption that the initial subset is picked at random is well understood [1, 2, 3, 4, 5, 6, 7, 8]. [sent-31, score-0.161]
20 One line of research [5, 6, 4] studies conditions under which Alg1(Tr) retrieves the matrix exactly. [sent-33, score-0.323]
21 (A footnote remarks that there are other papers studying exact matrix completion as well.) [sent-34, score-0.24]
22 They define what they call an incoherence property which quantifies how spread out the singular vectors of Y are. [sent-37, score-0.205]
23 The exact definition of the incoherence property varies in different results. [sent-38, score-0.155]
24 It is then proved that if there are enough samples relative to the rank of Y and its incoherence property, then Alg1(Tr) retrieves the matrix Y exactly with high probability, assuming the samples are chosen uniformly at random. [sent-39, score-0.555]
25 Note that in this line of research the trace norm is used as the complexity measure in the algorithm. [sent-40, score-0.696]
26 Candes and Recht [5] observed that it is impossible to reconstruct a matrix that has only one entry equal to 1 and zeros everywhere else, unless most of its entries are observed. [sent-42, score-0.518]
27 Thus, exact matrix completion must assume some special property of the target matrix Y . [sent-43, score-0.756]
28 In a second line of research, general results are proved regarding the performance of Alg1. [sent-44, score-0.129]
29 These results are weaker in that they do not prove exact recovery, but rather bounds on the distance between the output matrix X and Y . [sent-45, score-0.41]
30 But these results apply for every matrix Y , they can be generalized for non-uniform probability distributions, and also apply when the complexity measure is the γ2 norm. [sent-46, score-0.394]
31 These results take the following form: Theorem 1 ([2]) Let Y be an n × n real matrix, and P a probability distribution on pairs (i, j) ∈ [n]2 . [sent-47, score-0.116]
32 Choose a sample S of |S| > n log n entries according to P . [sent-48, score-0.228]
33 Then the output X of the algorithm with sample S satisfies Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ c γ2(Y)² √(n/|S|), where c is a universal constant. [sent-50, score-0.173]
34 What can we say about the output of the algorithm when the initial set of observations is not chosen at random? [sent-53, score-0.107]
35 In order to answer this question we need to understand what properties of a sample enable generalization. [sent-55, score-0.077]
36 A first step in this direction was taken in [9] where the initial subset was chosen deterministically as the set of edges of a good expander (more generally, a good sparsifier). [sent-56, score-0.202]
37 Deterministic guarantees were proved for the algorithm in this case, resembling the guarantees known for random sampling. [sent-57, score-0.238]
38 For example: Theorem 2 [9] Let S be the set of edges of a d-regular graph with second eigenvalue bound² λ. For every n × n real matrix Y, if X is the output of Alg1 with initial subset S, then (1/n²) Σ_{i,j} (X_ij − Y_ij)² ≤ c γ2(Y)² λ/d. [sent-58, score-0.587]
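For concreteness (our own gloss, not a quoted result; it only instantiates Theorem 2 with the standard eigenvalue bound for Ramanujan graphs, which also appear in this page's keyword list):

```latex
% Instantiating Theorem 2 with a d-regular Ramanujan graph, where \lambda \le 2\sqrt{d-1}:
\frac{1}{n^2}\sum_{i,j}(X_{ij}-Y_{ij})^2 \;\le\; c\,\gamma_2(Y)^2\,\frac{\lambda}{d}
\;\le\; \frac{2\,c\,\gamma_2(Y)^2}{\sqrt{d}},
% i.e. roughly d observations per row already give average error O(\gamma_2(Y)^2/\sqrt{d}).
```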
39 This theorem was also generalized to bound the error with respect to any probability distribution. [sent-63, score-0.161]
40 Instead of expanders, sparsifiers were used to select the entries to observe for this result. [sent-64, score-0.197]
41 There is an efficiently constructible set S ⊂ [n]² of size at most dn, such that for every n × n real target matrix Y, if X is the output of our algorithm with initial subset S, then Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ c γ2(Y)² / √d. [sent-66, score-0.607]
42 The results in [9] still do not answer the practical question of how to reconstruct a matrix from an arbitrary sample. [sent-67, score-0.299]
43 We extend the results of [9] in several ways: (footnote 2: the eigenvalues are eigenvalues of the adjacency matrix of the graph.) [sent-69, score-0.279]
44 We upper bound the generalization error of Alg1 given any set of initial observations. [sent-71, score-0.222]
45 This bound depends on properties of the set of observed entries. [sent-72, score-0.094]
46 We show there is a probability distribution outside of the observed entries such that the generalization error under this distribution is bounded in terms of the complexity of the observed entries, under a certain complexity measure. [sent-74, score-0.591]
47 The results hold not only for γ2 but also for the trace norm, and in fact any norm satisfying a few basic properties. [sent-76, score-0.577]
48 2 Preliminaries Here we introduce some of the matrix notation and norms that we will be using. [sent-77, score-0.249]
49 For an m-by-n matrix A with m ≥ n let σ1(A) ≥ · · · ≥ σn(A) denote the singular values of A. [sent-79, score-0.29]
50 The trace norm, denoted ‖A‖_tr, is the ℓ1 norm of the vector of singular values, and the Frobenius norm, denoted ‖A‖_F, is the ℓ2 norm of the vector of singular values. [sent-80, score-1.138]
51 As the rank of a matrix is equal to the number of non-zero singular values, it follows from the Cauchy-Schwarz inequality that ‖A‖_tr² / ‖A‖_F² ≤ rk(A). (1) [sent-81, score-0.556]
52 This inequality motivates the use of the trace norm as a proxy for rank in rank minimization problems. [sent-82, score-0.832]
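For completeness, a one-line derivation of (1) (ours, not quoted from the paper): A has rk(A) non-zero singular values, so Cauchy-Schwarz gives

```latex
\|A\|_{tr}^2 \;=\; \Big(\sum_{i=1}^{rk(A)} \sigma_i(A)\Big)^{\!2}
\;\le\; rk(A)\sum_{i=1}^{rk(A)} \sigma_i(A)^2 \;=\; rk(A)\,\|A\|_F^2 .
```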
53 A problem with the bound of (1) as a complexity measure is that it is not monotone—the bound can be larger on a submatrix of A than on A itself. [sent-83, score-0.354]
54 As taking the Hadamard product of a matrix with a rank one matrix does not increase its rank, a way to fix this problem is to consider instead: max_{‖u‖=‖v‖=1} ‖A ◦ vuᵀ‖_tr² / ‖A ◦ vuᵀ‖_F² ≤ rk(A). [sent-84, score-0.684]
55 When A is a sign matrix, this bound simplifies nicely—for then ‖A ◦ vuᵀ‖_F = ‖u‖ ‖v‖ = 1, and we are left with max_{‖u‖=‖v‖=1} ‖A ◦ vuᵀ‖_tr² ≤ rk(A). [sent-85, score-0.147]
56 This motivates the definition of the γ2 norm: γ2(A) = max_{‖u‖=‖v‖=1} ‖A ◦ vuᵀ‖_tr. [sent-86, score-0.197]
57 We will also make use of the dual norms of the trace and γ2 norms. [sent-89, score-0.489]
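As background (not text from the paper): γ2 coincides with the max norm and has a standard semidefinite-programming characterization, which is what makes Alg1(γ2) and the bounds below computable in practice. A minimal sketch, assuming cvxpy; the function name is ours:

```python
# Sketch: gamma_2 norm via its standard SDP characterization,
#   gamma_2(A) = min c  s.t.  [[W1, A], [A^T, W2]] is PSD, diag(W1) <= c, diag(W2) <= c.
import cvxpy as cp
import numpy as np

def gamma2(A: np.ndarray) -> float:
    m, n = A.shape
    Z = cp.Variable((m + n, m + n), PSD=True)   # plays the role of [[W1, A], [A^T, W2]]
    c = cp.Variable()
    constraints = [Z[:m, m:] == A,              # fix the off-diagonal block to A
                   cp.diag(Z) <= c]             # bounds diag(W1) and diag(W2) by c
    cp.Problem(cp.Minimize(c), constraints).solve()
    return float(c.value)
```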
58 Recall that in general for a norm Φ(A) the dual norm Φ* is defined as Φ*(A) = max_B ⟨A, B⟩ / Φ(B); notice that this means that ⟨A, B⟩ ≤ Φ*(A) Φ(B). [sent-90, score-0.733]
59 The dual of the trace norm is ‖·‖, the operator norm from ℓ2 to ℓ2; the dual of the γ2 norm looks as follows. [sent-91, score-1.345]
60 This is the minimum of the γ2 norm over all matrices which approximate the target matrix in some sense. [sent-95, score-0.672]
61 Then γ2^{0,∞}(S) = min_T { γ2(T) : T ◦ S ≥ S, T ◦ S̄ = 0 }. In words, γ2^{0,∞}(S) is the minimum γ2 norm of a matrix T which is 0 whenever S is zero, and at least 1 whenever S is 1. [sent-99, score-0.56]
62 This can be thought of as a “one-sided error” version of the more familiar γ2^∞ norm of a sign matrix, which is the minimum γ2 norm of a matrix which agrees in sign with the target matrix and has all entries of magnitude at least 1. [sent-100, score-1.686]
63 The γ2^∞ bound is also known to be equal to the margin complexity [11]. [sent-101, score-0.181]
64 We can always run Alg1 and get an output matrix X. [sent-103, score-0.316]
65 What we need in order to make intelligent use of X is a way to measure the distance between X and Y . [sent-104, score-0.13]
66 Our first observation is that although Y is not known, it is possible to bound the distance between X and Y . [sent-105, score-0.124]
67 This result is stated in the following theorem which generalizes Theorems (2) and (4) of [9] 3 : Theorem 7 Fix a set of entries S ⊂ [m] × [n]. [sent-106, score-0.266]
68 Let P be a probability distribution on pairs (i, j) ∈ [m] × [n], such that there exists a real matrix Q satisfying: 1. Q_ij = 0 for every (i, j) outside S, and 2. γ2*(P − Q) ≤ Λ. [sent-107, score-0.393]
69 Then for every m × n real target matrix Y, if X is the output of our algorithm with initial subset S, it holds that Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ 4Λ γ2(Y)². [sent-110, score-0.607]
70 Theorem 7 says that γ2*(P − Q) determines, at least to some extent, the expected distance between X and Y with respect to P. [sent-111, score-0.135]
71 This gives us a way to measure the quality of the output of Alg1 for any set S of initial observations. [sent-112, score-0.268]
72 Choose a probability distribution P on the entries of the matrix. [sent-114, score-0.228]
73 Find a real matrix Q such that Q_ij = 0 for (i, j) outside S, and γ2*(P − Q) is minimal. [sent-116, score-0.258]
74 We then know, using Theorem 7, that the expected square distance between X and Y can be bounded in terms of Λ and the complexity of Y . [sent-119, score-0.15]
75 For example if the set of initial observations is contained in a submatrix we cannot expect X to be close to Y outside this submatrix. [sent-121, score-0.228]
76 In such cases it makes sense to restrict P to the submatrix containing S. [sent-122, score-0.078]
77 One approach to find a distribution for which we can expect to be close on the unseen entries is to optimize over probability distributions P such that Theorem 7 gives the best bound. [sent-123, score-0.228]
78 See Section 4 for the corresponding result for the trace norm as well. [sent-126, score-0.509]
79 Output: a matrix X of smallest possible CC(X) under the condition that Xij = Yij for all (i, j) ∈ S. [sent-130, score-0.209]
80 We refer to this algorithm as Alg2, or Alg2(CC) if we wish to state the complexity measure that is used. [sent-133, score-0.187]
81 Globally, our algorithm for matrix completion therefore works in two phases. [sent-140, score-0.402]
82 We first use Alg1 to get an output matrix X, and then use Alg2 in order to find optimal guarantees regarding the distance between X and Y . [sent-141, score-0.474]
83 The generalization error bounds for this algorithm are proved in Section 4. [sent-142, score-0.127]
84 1 Using a general norm In our description of Alg2 above we have used the norm γ2 . [sent-144, score-0.63]
85 The same idea works for any norm Φ satisfying the property Φ(A ◦ A) ≤ Φ(A)2 . [sent-145, score-0.416]
86 Moreover, if the dual norm can be computed efficiently via a linear or semidefinite program, then the optimal distribution P for the bound can be found efficiently as well. [sent-146, score-0.479]
87 For example, for the trace norm the algorithm becomes: Given the sample S, run Alg1(‖·‖_tr) and get an output matrix X. [sent-147, score-1.008]
88 Then analogously to Theorem 7, we have Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ 4Λ ‖Y‖_tr². [sent-155, score-0.152]
89 Both of these results will follow from a more general theorem which we show in the next section. [sent-156, score-0.069]
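Since the dual of the trace norm is the operator norm, the search for the certificate (P, Q) in the trace-norm case is itself a tractable convex program. A minimal sketch (ours, not the authors' code; it assumes cvxpy with NumPy-style fancy indexing and follows the support conventions of Section 5, namely P a distribution on the unobserved entries and Q supported on S):

```python
# Sketch of the certificate step for the trace-norm variant: minimize the operator
# norm (dual of the trace norm) of P - Q, with P a probability distribution on the
# unobserved entries and Q supported on the observed set S.  Assumes cvxpy.
import cvxpy as cp
import numpy as np

def trace_norm_certificate(mask: np.ndarray):
    m, n = mask.shape
    r_obs, c_obs = np.where(mask)            # observed entries S
    r_un, c_un = np.where(~mask)             # unobserved entries
    P = cp.Variable((m, n), nonneg=True)
    Q = cp.Variable((m, n))
    constraints = [cp.sum(P) == 1,
                   P[r_obs, c_obs] == 0,     # P lives on the unobserved entries
                   Q[r_un, c_un] == 0]       # Q lives on S
    prob = cp.Problem(cp.Minimize(cp.sigma_max(P - Q)), constraints)
    prob.solve()
    return P.value, Q.value, prob.value      # prob.value plays the role of Lambda
```

With Λ = prob.value, the displayed guarantee reads Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ 4Λ ‖Y‖_tr².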
90 Let P be a probability distribution on pairs (i, j) ∈ [m] × [n], such that there exists a real matrix Q satisfying: 1. Q_ij = 0 for every (i, j) outside S, and 2. Φ*(P − Q) ≤ Λ. [sent-161, score-0.393]
91 Then for every m × n real target matrix Y, if X is the output of algorithm Alg1(Φ) with initial subset S, it holds that Σ_{i,j} P_ij (X_ij − Y_ij)² ≤ 4 Φ(Y)² Λ. [sent-164, score-0.607]
92 Proof: Let R be the matrix where R_ij = (X_ij − Y_ij)². [sent-165, score-0.209]
93 Both the trace norm and γ2 norm satisfy the condition of the theorem as they are multiplicative under tensor product. [sent-172, score-0.069]
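The intermediate inequalities of this proof did not survive the extraction; presumably the chain runs as follows, using only the two conditions on Q and P − Q, the duality ⟨A, B⟩ ≤ Φ*(A) Φ(B), the assumption Φ(A ◦ A) ≤ Φ(A)², the triangle inequality, and Φ(X) ≤ Φ(Y) (X minimizes Φ among matrices agreeing with Y on S):

```latex
\sum_{i,j} P_{ij} R_{ij}
  = \langle P - Q,\, R\rangle + \langle Q,\, R\rangle
  = \langle P - Q,\, R\rangle                    % Q is supported on S, where R = 0
  \le \Phi^*(P-Q)\,\Phi(R)
  \le \Lambda\,\Phi\big((X-Y)\circ(X-Y)\big)
  \le \Lambda\,\Phi(X-Y)^2
  \le \Lambda\,\big(\Phi(X)+\Phi(Y)\big)^2
  \le 4\,\Lambda\,\Phi(Y)^2 .
```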
94 5 Analyzing the error bound We now look more closely at the minimal value of the parameter Λ from Theorem 7. [sent-173, score-0.106]
95 The optimal value of Λ depends only on the set of observed indices S. [sent-174, score-0.081]
96 Given samples S we want to find P, Q so as to minimize γ2*(P − Q) such that P is a probability distribution supported on S̄ and Q has support in S. [sent-176, score-0.072]
97 Taking the dual of this program we find 1/Λ = minimize_A γ2(A) subject to A ≥ S̄ and A ◦ S̄ = A. In words, this says that 1/Λ is equal to the minimum γ2 norm of a matrix that is zero on all entries in S and at least 1 on all entries in S̄; that is, 1/Λ = γ2^{0,∞}(S̄). [sent-179, score-1.193]
98 This says that the more complex the set of unobserved entries S̄ according to the measure γ2^{0,∞}, the smaller the value of Λ. [sent-181, score-0.336]
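The dual program above is again a semidefinite program, so the best Λ for a given set of observations can be computed directly. A minimal sketch (ours; assumes cvxpy with NumPy-style fancy indexing) that computes γ2^{0,∞}(S̄), and hence Λ, from an observation mask:

```python
# Sketch: 1/Lambda = gamma_2^{0,inf}(S_bar), i.e. the minimum gamma_2 norm of a matrix
# that is zero on the observed entries S and at least 1 on the unobserved entries,
# computed via the SDP characterization of gamma_2.  Assumes cvxpy.
import cvxpy as cp
import numpy as np

def lambda_from_mask(mask: np.ndarray) -> float:
    m, n = mask.shape
    r_obs, c_obs = np.where(mask)               # observed entries S
    r_un, c_un = np.where(~mask)                # unobserved entries
    Z = cp.Variable((m + n, m + n), PSD=True)   # [[W1, T], [T^T, W2]] >= 0
    T = Z[:m, m:]
    c = cp.Variable()
    constraints = [cp.diag(Z) <= c,             # gamma_2(T) <= c
                   T[r_obs, c_obs] == 0,        # zero on S
                   T[r_un, c_un] >= 1]          # at least 1 on the complement of S
    cp.Problem(cp.Minimize(c), constraints).solve()
    return 1.0 / float(c.value)                 # Lambda = 1 / gamma_2^{0,inf}(S_bar)
```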
99 Note that in particular, if we consider the sign matrix S − S̄, then γ2^{0,∞}(S̄) ≥ (γ2^∞(S − S̄) − 1)/2, i.e., it is lower bounded (up to this factor) by the margin complexity of S − S̄. [sent-182, score-0.415]
100 Nuclear norm penalization and optimal rates for noisy low rank matrix completion. [sent-230, score-0.671]
wordName wordTfidf (topN-words)
[('norm', 0.315), ('pij', 0.296), ('yij', 0.267), ('vut', 0.25), ('qij', 0.244), ('xij', 0.225), ('matrix', 0.209), ('entries', 0.197), ('trace', 0.194), ('completion', 0.193), ('agrees', 0.152), ('tr', 0.152), ('rank', 0.114), ('cc', 0.109), ('output', 0.107), ('dual', 0.103), ('rij', 0.1), ('initial', 0.094), ('incoherence', 0.091), ('ramanujan', 0.088), ('schechtman', 0.088), ('complexity', 0.087), ('sign', 0.086), ('retrieves', 0.081), ('target', 0.081), ('candes', 0.081), ('singular', 0.081), ('submatrix', 0.078), ('says', 0.072), ('combinatorica', 0.07), ('theorem', 0.069), ('satisfying', 0.068), ('generalization', 0.067), ('subset', 0.067), ('sparsi', 0.067), ('measure', 0.067), ('semide', 0.064), ('program', 0.064), ('revealed', 0.063), ('distance', 0.063), ('retrieve', 0.061), ('bound', 0.061), ('proved', 0.06), ('guarantees', 0.059), ('outside', 0.056), ('hadamard', 0.052), ('proxy', 0.05), ('real', 0.049), ('rk', 0.049), ('indices', 0.048), ('nition', 0.048), ('srebro', 0.047), ('obviously', 0.046), ('answer', 0.046), ('motivates', 0.045), ('minimal', 0.045), ('troy', 0.044), ('expanders', 0.044), ('shraibman', 0.044), ('reconstruct', 0.044), ('fix', 0.041), ('foygel', 0.041), ('mendelson', 0.041), ('tel', 0.041), ('koltchinskii', 0.041), ('expander', 0.041), ('minimize', 0.041), ('norms', 0.04), ('linial', 0.038), ('quantum', 0.036), ('lps', 0.036), ('keshavan', 0.036), ('pairs', 0.036), ('lowest', 0.036), ('minimum', 0.036), ('regarding', 0.036), ('impossible', 0.035), ('eigenvalues', 0.035), ('universal', 0.035), ('rennie', 0.033), ('penalization', 0.033), ('unavoidable', 0.033), ('observed', 0.033), ('property', 0.033), ('wish', 0.033), ('line', 0.033), ('margin', 0.033), ('entrywise', 0.032), ('phillips', 0.032), ('technological', 0.032), ('ix', 0.032), ('centre', 0.031), ('matrices', 0.031), ('probability', 0.031), ('sample', 0.031), ('exact', 0.031), ('nicely', 0.03), ('montanari', 0.03), ('tsybakov', 0.03), ('received', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 185 nips-2013-Matrix Completion From any Given Set of Observations
Author: Troy Lee, Adi Shraibman
Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1
2 0.30632511 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty
Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes ‖A − B‖2. For large m × n matrices, such that n ≫ m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O(1) computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1
3 0.23697039 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
Author: Akshay Krishnamurthy, Aarti Singh
Abstract: We study low rank matrix and tensor completion and propose novel algorithms that employ adaptive sampling schemes to obtain strong performance guarantees. Our algorithms exploit adaptivity to identify entries that are highly informative for learning the column space of the matrix (tensor) and consequently, our results hold even when the row space is highly coherent, in contrast with previous analyses. In the absence of noise, we show that one can exactly recover an n × n matrix of rank r from merely Ω(nr^{3/2} log(r)) matrix entries. We also show that one can recover an order T tensor using Ω(nr^{T−1/2} T^2 log(r)) entries. For noisy recovery, our algorithm consistently estimates a low rank matrix corrupted with noise using Ω(nr^{3/2} polylog(n)) entries. We complement our study with simulations that verify our theory and demonstrate the scalability of our algorithms. 1
4 0.21149209 11 nips-2013-A New Convex Relaxation for Tensor Completion
Author: Bernardino Romera-Paredes, Massimiliano Pontil
Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1
5 0.1984849 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
Author: Ke Hou, Zirui Zhou, Anthony Man-Cho So, Zhi-Quan Luo
Abstract: Motivated by various applications in machine learning, the problem of minimizing a convex smooth loss function with trace norm regularization has received much attention lately. Currently, a popular method for solving such problem is the proximal gradient method (PGM), which is known to have a sublinear rate of convergence. In this paper, we show that for a large class of loss functions, the convergence rate of the PGM is in fact linear. Our result is established without any strong convexity assumption on the loss function. A key ingredient in our proof is a new Lipschitzian error bound for the aforementioned trace norm–regularized problem, which may be of independent interest.
6 0.16728061 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
7 0.16628601 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
8 0.13966045 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
9 0.13686399 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
10 0.13371667 91 nips-2013-Dirty Statistical Models
11 0.11429074 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
12 0.10298202 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
13 0.10222603 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
14 0.097498231 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
15 0.096859679 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
16 0.096359812 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
17 0.094276935 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
18 0.089951277 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
19 0.087577716 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
20 0.081618465 188 nips-2013-Memory Limited, Streaming PCA
topicId topicWeight
[(0, 0.215), (1, 0.101), (2, 0.167), (3, 0.23), (4, -0.007), (5, -0.1), (6, -0.042), (7, -0.029), (8, -0.054), (9, 0.025), (10, 0.061), (11, 0.022), (12, 0.06), (13, 0.009), (14, -0.04), (15, -0.022), (16, -0.1), (17, 0.019), (18, -0.005), (19, 0.016), (20, -0.041), (21, -0.192), (22, 0.004), (23, -0.097), (24, -0.053), (25, -0.037), (26, -0.004), (27, -0.331), (28, -0.011), (29, -0.039), (30, -0.092), (31, -0.107), (32, -0.107), (33, -0.098), (34, 0.017), (35, 0.013), (36, 0.055), (37, 0.052), (38, -0.041), (39, 0.021), (40, -0.008), (41, -0.008), (42, -0.011), (43, -0.033), (44, 0.039), (45, -0.091), (46, -0.0), (47, -0.012), (48, 0.051), (49, -0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.97073013 185 nips-2013-Matrix Completion From any Given Set of Observations
Author: Troy Lee, Adi Shraibman
Abstract: In the matrix completion problem the aim is to recover an unknown real matrix from a subset of its entries. This problem comes up in many application areas, and has received a great deal of attention in the context of the netflix prize. A central approach to this problem is to output a matrix of lowest possible complexity (e.g. rank or trace norm) that agrees with the partially specified matrix. The performance of this approach under the assumption that the revealed entries are sampled randomly has received considerable attention (e.g. [1, 2, 3, 4, 5, 6, 7, 8]). In practice, often the set of revealed entries is not chosen at random and these results do not apply. We are therefore left with no guarantees on the performance of the algorithm we are using. We present a means to obtain performance guarantees with respect to any set of initial observations. The first step remains the same: find a matrix of lowest possible complexity that agrees with the partially specified matrix. We give a new way to interpret the output of this algorithm by next finding a probability distribution over the non-revealed entries with respect to which a bound on the generalization error can be proven. The more complex the set of revealed entries according to a certain measure, the better the bound on the generalization error. 1
2 0.88847023 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
Author: Dimitris Achlioptas, Zohar S. Karnin, Edo Liberty
Abstract: We consider the problem of selecting non-zero entries of a matrix A in order to produce a sparse sketch of it, B, that minimizes ‖A − B‖2. For large m × n matrices, such that n ≫ m (for example, representing n observations over m attributes) we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding A. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with O(1) computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model. 1
3 0.87481391 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
Author: Miao Xu, Rong Jin, Zhi-Hua Zhou
Abstract: In standard matrix completion theory, it is required to have at least O(n ln2 n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. In many real tasks, side information in addition to the observed entries is often available. In this work, we develop a novel theory of matrix completion that explicitly explore the side information to reduce the requirement on the number of observed entries. We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. 1
4 0.85347486 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
Author: Franz Kiraly, Louis Theran
Abstract: We propose a general framework for reconstructing and denoising single entries of incomplete and noisy entries. We describe: effective algorithms for deciding if an entry can be reconstructed and, if so, for reconstructing and denoising it; and a priori bounds on the error of each entry, individually. In the noiseless case our algorithm is exact. For rank-one matrices, the new algorithm is fast, admits a highly-parallel implementation, and produces an error minimizing estimate that is qualitatively close to our theoretical and the state-of-the-art Nuclear Norm and OptSpace methods. 1
5 0.81271523 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
Author: Adrien Todeschini, François Caron, Marie Chavent
Abstract: We propose a novel class of algorithms for low rank matrix completion. Our approach builds on novel penalty functions on the singular values of the low rank matrix. By exploiting a mixture model representation of this penalty, we show that a suitably chosen set of latent variables enables to derive an ExpectationMaximization algorithm to obtain a Maximum A Posteriori estimate of the completed low rank matrix. The resulting algorithm is an iterative soft-thresholded algorithm which iteratively adapts the shrinkage coefficients associated to the singular values. The algorithm is simple to implement and can scale to large matrices. We provide numerical comparisons between our approach and recent alternatives showing the interest of the proposed approach for low rank matrix completion. 1
6 0.77573335 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
7 0.77093542 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers
8 0.70219707 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
9 0.68084294 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
10 0.61350787 352 nips-2013-What do row and column marginals reveal about your dataset?
11 0.60988927 186 nips-2013-Matrix factorization with binary components
12 0.5968864 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
13 0.54248363 73 nips-2013-Convex Relaxations for Permutation Problems
14 0.53992385 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
15 0.53840572 91 nips-2013-Dirty Statistical Models
16 0.52731878 11 nips-2013-A New Convex Relaxation for Tensor Completion
17 0.52135038 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
18 0.5124836 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
19 0.49925652 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
20 0.49015683 247 nips-2013-Phase Retrieval using Alternating Minimization
topicId topicWeight
[(16, 0.01), (33, 0.201), (34, 0.058), (41, 0.011), (49, 0.019), (56, 0.182), (85, 0.052), (89, 0.028), (93, 0.02), (95, 0.34)]
simIndex simValue paperId paperTitle
1 0.913369 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference
Author: Guy van den Broeck, Adnan Darwiche
Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1
2 0.89200884 309 nips-2013-Statistical Active Learning Algorithms
Author: Maria-Florina Balcan, Vitaly Feldman
Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1
3 0.88497245 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
4 0.85752499 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
Author: Po-Ling Loh, Martin J. Wainwright
Abstract: We establish theoretical results concerning local optima of regularized M-estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision ε_stat in log(1/ε_stat) iterations, the fastest possible rate for any first-order method. We provide simulations to illustrate the sharpness of our theoretical predictions. 1
5 0.82966161 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
Author: Ying Liu, Alan Willsky
Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k²n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn² + n² log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn² + n² log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1
same-paper 6 0.82639611 185 nips-2013-Matrix Completion From any Given Set of Observations
7 0.76738024 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs
8 0.73424608 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
9 0.72485149 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
10 0.7235297 149 nips-2013-Latent Structured Active Learning
11 0.7161696 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
12 0.70384878 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
13 0.69660729 184 nips-2013-Marginals-to-Models Reducibility
14 0.69173276 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
15 0.69127607 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
16 0.69035107 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
17 0.68968046 60 nips-2013-Buy-in-Bulk Active Learning
18 0.689493 28 nips-2013-Adaptive Step-Size for Policy Gradient Methods
19 0.68935239 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
20 0.68926555 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search