nips nips2010 nips2010-110 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. [sent-8, score-0.289]
2 In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). [sent-9, score-0.703]
3 Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. [sent-12, score-0.116]
4 However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. [sent-13, score-0.391]
5 We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. [sent-14, score-0.269]
6 In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. [sent-15, score-0.269]
7 1 Introduction. In this paper we study the general affine rank minimization problem (ARMP): min rank(X) s.t. A(X) = b, where A : R^{m×n} → R^d is an affine transformation and b ∈ R^d. [sent-16, score-0.231]
8 The affine rank minimization problem above is of considerable practical interest, and many important machine learning problems such as matrix completion, low-dimensional metric embedding, and low-rank kernel learning can be viewed as instances of the above problem. [sent-18, score-0.303]
9 Recht et al. [24] gave the first nontrivial results for the problem, obtaining guaranteed rank minimization for affine transformations A that satisfy a restricted isometry property (RIP). [sent-22, score-0.403]
10 Define the isometry constant δ_k of A to be the smallest number such that for all X ∈ R^{m×n} of rank at most k, (1 − δ_k)·∥X∥_F^2 ≤ ∥A(X)∥_2^2 ≤ (1 + δ_k)·∥X∥_F^2. [sent-23, score-0.321]
11 Recht et al. [24] show that for ARMP with isometry constant δ_5k < 1/10, the minimum rank solution can be recovered by the minimum trace-norm solution. [sent-29, score-0.359]
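For intuition, here is a minimal numerical probe of this definition (illustrative code of ours, not from the paper; since δ_k is a supremum over all rank-k matrices, random trials only give a rough lower bound on it):

```python
import numpy as np

def rip_ratio_range(A, m, n, k, trials=200, rng=None):
    """Sample random rank-k matrices X and record ||A(X)||_2^2 / ||X||_F^2.
    The spread of these ratios around 1 only lower-bounds delta_k."""
    rng = rng or np.random.default_rng(0)
    ratios = []
    for _ in range(trials):
        X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
        ratios.append(float(np.sum(A(X) ** 2) / np.sum(X ** 2)))
    return min(ratios), max(ratios)

# Example: a random Gaussian measurement operator with d measurements.
m, n, k, d = 30, 30, 2, 400
M = np.random.default_rng(1).standard_normal((d, m * n)) / np.sqrt(d)
print(rip_ratio_range(lambda X: M @ X.ravel(), m, n, k))
```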
12 We present a simple analysis showing that SVP recovers the minimum rank solution for noisy affine constraints that satisfy RIP and prove the following guarantees. [sent-31, score-0.312]
13 They also require stronger isometry assumptions, δ_3k < 1/√30, than our analysis. [sent-34, score-0.13]
14 Theorem 1.1: Suppose the isometry constant of A satisfies δ_2k < 1/3 and let b = A(X∗) for a rank-k matrix X∗. [sent-36, score-0.202]
15 Furthermore, SVP outputs a matrix X of rank at most k such that ∥A(X) − b∥_2^2 ≤ ϵ and ∥X − X∗∥_F^2 ≤ ϵ/(1 − δ_2k) in at most ⌈ log(∥b∥_2^2 / 2ϵ) / log((1 − δ_2k)/2δ_2k) ⌉ iterations. [sent-38, score-0.282]
16 Theorem 1.2 (Main): Suppose the isometry constant of A satisfies δ_2k < 1/3 and let b = A(X∗) + e for a rank-k matrix X∗ and an error vector e ∈ R^d. [sent-40, score-0.393]
17 Then, SVP with step-size η_t = 1/(1 + δ_2k) outputs a matrix X of rank at most k such that ∥A(X) − b∥_2^2 ≤ C∥e∥_2^2 + ϵ and ∥X − X∗∥_F^2 ≤ (C∥e∥_2^2 + ϵ)/(1 − δ_2k), for any ϵ ≥ 0, in at most ⌈ log(∥b∥_2^2 / 2(C∥e∥_2^2 + ϵ)) / log(1/D) ⌉ iterations, for universal constants C and D. [sent-41, score-0.297]
18 We next consider an important application of ARMP: the low-rank matrix completion problem (MCP)— given a small number of entries from an unknown low-rank matrix, the task is to complete the missing entries. [sent-48, score-0.268]
19 [14] gave the first theoretical guarantees for the problem obtaining exact recovery from an almost optimal number of uniformly sampled entries. [sent-51, score-0.094]
20 While RIP does not hold for MCP, we show that a similar property holds for incoherent matrices [6]. [sent-52, score-0.126]
21 Given our refined RIP and a hypothesis bounding the incoherence of the iterates arising in SVP, an analysis similar to that of Theorem 1. [sent-53, score-0.147]
22 We provide strong empirical evidence for our hypothesis and show that both of our algorithms recover a low-rank matrix from an almost optimal number of uniformly sampled entries. [sent-55, score-0.121]
23 In summary, our main contributions are: • Motivated by [11], we propose a projected gradient based algorithm, SVP, for ARMP and show that our method recovers the optimal rank solution when the affine constraints satisfy RIP. [sent-56, score-0.312]
24 To the best of our knowledge, our isometry constant requirements are the least stringent: we only require δ_2k < 1/3, as opposed to δ_5k < 1/10 by Recht et al. [sent-57, score-0.148]
25 For instance, on the Movie-lens dataset [1] and rank k = 3, SVP-Newton achieves an RMSE of 0. [sent-62, score-0.191]
26 • As observed in [23], most trace-norm based methods perform poorly for matrix completion when entries are sampled from more realistic power-law distributions. [sent-65, score-0.295]
27 We then specialize our algorithm for the problem of matrix completion and prove a more restricted isometry property for the same. [sent-70, score-0.413]
28 However, the Euclidean projection onto C(k) can be computed efficiently using singular value decomposition (SVD). [sent-76, score-0.105]
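For concreteness, a minimal sketch of this projection, denoted P_k below (our naming and code, assuming numpy):

```python
import numpy as np

def project_rank_k(X, k):
    """Euclidean (Frobenius) projection of X onto the set of matrices of rank
    at most k: keep only the top-k singular triplets."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```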
29 In SVP, a candidate solution to ARMP is computed iteratively by starting from the all-zero matrix and adapting the classical projected gradient descent update as follows (note that ∇ψ(X) = A^T(A(X) − b)): X^{t+1} ← P_k(X^t − η_t ∇ψ(X^t)) = P_k(X^t − η_t A^T(A(X^t) − b)). (1) [sent-81, score-0.108]
30 Note that the iterates X^t are always low-rank, facilitating faster computation of the SVD. [sent-83, score-0.077]
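Putting this together, a hedged sketch of the SVP iteration described above (it reuses project_rank_k from the previous snippet; the function names, stopping rule, and iteration cap are our choices, not the authors' code):

```python
import numpy as np

def svp(A, At, b, shape, k, eta, n_iter=200, tol=1e-6):
    """SVP sketch: X^{t+1} = P_k(X^t - eta * A^T(A(X^t) - b)).
    A  : function mapping an (m, n) matrix to a length-d measurement vector
    At : its adjoint, mapping a length-d vector back to an (m, n) matrix
    """
    X = np.zeros(shape)                      # start from the all-zero matrix
    for _ in range(n_iter):
        residual = A(X) - b                  # A(X^t) - b
        if np.linalg.norm(residual) <= tol:  # ||A(X^t) - b||_2 small enough
            break
        X = project_rank_k(X - eta * At(residual), k)
    return X
```

Per Theorem 1.2, a natural choice of step-size in the noisy case is eta = 1/(1 + δ_2k).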
31 Then, ψ(X^{t+1}) ≤ ψ(X∗) + (2δ_2k/(1 − δ_2k))·∥A(X∗ − X^t)∥_2^2, where δ_2k is the rank-2k isometry constant of A. [sent-95, score-0.321]
32 Further, using RIP for the rank at most 2k matrix X^τ − X∗, we get: ∥X^τ − X∗∥_F^2 ≤ ψ(X^τ)/(1 − δ_2k) ≤ ϵ/(1 − δ_2k). [sent-107, score-0.282]
33 2 Matrix Completion We first describe the low-rank matrix completion problem formally. [sent-113, score-0.245]
34 Then, the low-rank matrix completion problem (MCP) can be formulated as follows: min rank(X) s.t. P_Ω(X) = P_Ω(X∗). [sent-116, score-0.245]
35 X^{t+1} ← P_k(X^t − (1/((1 + δ)p))·P_Ω(X^t − X∗)). (2) Although matrix completion is a special case of ARMP, the affine constraints that define MCP, P_Ω, do not satisfy RIP in general. [sent-121, score-0.245]
36 However, we show that the matrix completion affine constraints satisfy RIP for low-rank incoherent matrices. [sent-126, score-0.383]
37 Definition (Incoherence): A matrix X ∈ R^{m×n} with singular value decomposition X = UΣV^T is µ-incoherent if max_{i,j} |U_ij| ≤ √µ/√m and max_{i,j} |V_ij| ≤ √µ/√n. [sent-128, score-0.152]
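For reference, a small helper computing the smallest µ for which the rank-k factors of X satisfy this definition (our code; Figure 1(d) tracks the closely related quantity √n·max|U_ij| on the left factors only):

```python
import numpy as np

def incoherence(X, k):
    """Smallest mu such that the top-k singular vectors of X satisfy
    max_ij |U_ij| <= sqrt(mu/m) and max_ij |V_ij| <= sqrt(mu/n)."""
    m, n = X.shape
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    mu_left = m * np.max(np.abs(U[:, :k])) ** 2
    mu_right = n * np.max(np.abs(Vt[:k, :])) ** 2
    return max(mu_left, mu_right)
```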
38 Hence, a random sampling of the matrix should provide enough global information to satisfy RIP. [sent-133, score-0.117]
39 Using the above definition, we prove the following refined restricted isometry property. [sent-134, score-0.168]
40 Roughly, our proof combines a Chernoff bound estimate for ∥P_Ω(X)∥_F^2 with a union bound over low-rank incoherent matrices. [sent-137, score-0.091]
41 1 can be used to show that SVP achieves exact recovery for low-rank incoherent matrices from uniformly sampled entries. [sent-142, score-0.202]
42 As supported by empirical evidence, we hypothesize that the iterates X t arising in SVP remain incoherent when the underlying matrix X ∗ is incoherent. [sent-143, score-0.24]
43 Figure 1 (d) plots the maximum incoherence max_t µ(X^t) = √n·max_{t,i,j} |U^t_ij|, where U^t are the left singular vectors of the intermediate iterates X^t computed by SVP. [sent-144, score-0.239]
44 The figure clearly shows that the incoherence µ(X t ) of the iterates is bounded by a constant independent of the matrix size n and density p throughout the execution of SVP. [sent-145, score-0.221]
45 Figure 2 (c) plots the threshold sampling density p beyond which matrix completion for randomly generated matrices is solved exactly by SVP for fixed k and varying matrix sizes n. [sent-146, score-0.415]
46 2 and supported by empirical evidence (Figures 2 (c), (d)) we hypothesize that SVP achieves exact recovery from an almost optimal number of samples for incoherent matrices. [sent-149, score-0.118]
47 Then, there exists a constant C such that for a µ-incoherent matrix X∗ of rank at most k and Ω sampled from the Bernoulli model with density p = Ω_{µ,k}((log n)/m), SVP with step-size η_t = 1/((1 + δ)p) converges to X∗ with high probability. [sent-152, score-0.314]
48 Moreover, SVP outputs a matrix X of rank at most k such that ∥P_Ω(X) − P_Ω(X∗)∥_F^2 ≤ ϵ after O_{µ,k}(⌈log(1/ϵ)⌉) iterations. [sent-153, score-0.282]
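A sketch of update (2) specialized to matrix completion, with the gradient supported on the observed entries and the step-size 1/((1+δ)p) used in the theorem above; it reuses project_rank_k from the earlier snippet, and the default δ below is purely illustrative:

```python
import numpy as np

def svp_complete(M_obs, mask, k, p, delta=1.0 / 3, n_iter=200, tol=1e-6):
    """SVP for matrix completion (update (2)):
    X^{t+1} = P_k(X^t - (1/((1+delta)p)) * P_Omega(X^t - X*)).
    M_obs : matrix holding the observed entries (zeros elsewhere)
    mask  : boolean matrix, True where an entry is observed
    p     : sampling density |Omega| / (m * n)
    """
    X = np.zeros_like(M_obs, dtype=float)
    eta = 1.0 / ((1.0 + delta) * p)
    for _ in range(n_iter):
        grad = np.where(mask, X - M_obs, 0.0)   # P_Omega(X^t) - P_Omega(X*)
        if np.linalg.norm(grad) <= tol:
            break
        X = project_rank_k(X - eta * grad, k)
    return X
```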
49 1 RIP for Matrix Completion on Incoherent Matrices We now prove the restricted isometry property of Theorem 2. [sent-156, score-0.168]
50 Lemma 2.4: Let X ∈ R^{m×n} be a µ-incoherent matrix of rank at most k. [sent-168, score-0.263]
51 While the above lemma shows Equation (3) for a fixed rank-k, µ-incoherent X (i.e., [sent-173, score-0.225]
52 Lemma 2.4), we need to show Equation (3) for all such rank-k incoherent matrices. [sent-176, score-0.282]
53 To handle this problem, we discretize the space of low-rank incoherent matrices so as to be able to use the above lemma and a union bound. [sent-177, score-0.16]
54 We now show the existence of a small set of matrices S(µ, ϵ) ⊆ Rm×n such that every low-rank µ-incoherent matrix is close to an appropriately regular matrix from the set S(µ, ϵ). [sent-178, score-0.179]
55 For any µ-incoherent X ∈ R^{m×n} of rank k with ∥X∥_2 = 1, there exists Y ∈ S(µ, ϵ) s.t. [sent-181, score-0.191]
56 By the above lemmas and a union bound, for any Y ∈ S′(µ, ϵ), Pr[ |∥P_Ω(Y)∥_F^2 − p∥Y∥_F^2| ≥ δp∥Y∥_F^2 ] ≤ 2(mnk/ϵ)^{3(m+n)k}·exp(−δ^2 pmn/(16µ^2 k)) ≤ exp(C1·nk·log n)·exp(−δ^2 pmn/(16µ^2 k)), where C1 ≥ 0 is a constant independent of m, n, k. [sent-194, score-0.085]
57 As the statement of the theorem is invariant under scaling, it is enough to show the statement for all µ-incoherent matrices X of rank at most k and ∥X∥_2 = 1. [sent-196, score-0.25]
58 Recall that each iteration of SVP (Equation (1)) takes a step along the gradient of the objective function and then projects the iterate to the set of low rank matrices using SVD. [sent-206, score-0.242]
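The extracted text does not spell the Newton step out, so the following is only one plausible reading of it, flagged as our assumption rather than the paper's definition: after the SVD projection, keep the singular vectors fixed and re-fit a small k×k middle matrix by least squares on the observed entries.

```python
import numpy as np

def newton_refit(U, Vt, M_obs, mask):
    """Hypothetical 'Newton-style' refit (our reading, not necessarily the
    authors' exact step): with U (m x k) and Vt (k x n) fixed, choose the
    k x k matrix S minimizing ||P_Omega(U S Vt) - P_Omega(X*)||_F."""
    k = U.shape[1]
    rows, cols = np.nonzero(mask)
    # Entry (i, j) of U S Vt is linear in S: sum_{a,b} U[i,a] * S[a,b] * Vt[b,j].
    design = np.einsum('qa,bq->qab', U[rows], Vt[:, cols]).reshape(len(rows), k * k)
    s_vec, *_ = np.linalg.lstsq(design, M_obs[rows, cols], rcond=None)
    return U @ s_vec.reshape(k, k) @ Vt
```

Whatever its exact form, such a refit solves a least-squares problem with one row per observed entry and k^2 unknowns, which is consistent with the cost remark that follows.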
59 This makes SVP-Newton computationally expensive for problems with large rank, particularly for situations with a large number of constraints as is the case for matrix completion. [sent-220, score-0.098]
60 3 Related Work and Computational Issues The general rank minimization problem with affine constraints is NP-hard and is also NP-hard to approximate [22]. [sent-225, score-0.257]
61 Most methods for ARMP either relax the rank constraint to a convex function such as the trace-norm [8], [9], or assume a factorization and optimize the resulting non-convex problem by alternating minimization [4, 3, 15]. [sent-226, score-0.231]
62 The guarantees of [24] were later extended to noisy measurements and isometry constants up to δ_3k < 1/(4√3) by Fazel et al. [sent-228, score-0.148]
63 Recently, Lee and Bresler [17] proposed an algorithm (ADMiRA) motivated by the orthogonal matching pursuit line of work in compressed sensing and show that for affine constraints with isometry constant δ4k ≤ 0. [sent-231, score-0.172]
64 However, their method is not very efficient for large datasets and when the rank of the optimal solution is relatively large. [sent-233, score-0.191]
65 Candes and Recht [6], Candes and Tao [7] show that if X ∗ is µ-incoherent and the known entries are sampled uniformly at random with |Ω| ≥ C(µ) k 2 n log2 n, finding the minimum trace-norm solution recovers the minimum rank solution. [sent-236, score-0.339]
66 Keshavan et al. obtained similar results independently for exact recovery from uniformly sampled Ω with |Ω| ≥ C(µ, k)·n·log n. [sent-238, score-0.095]
67 Minimizing the trace-norm of a matrix subject to affine constraints can be cast as a semi-definite program (SDP). [sent-239, score-0.098]
68 Also, noisy measurements pose considerable computational challenges for trace-norm optimization as the rank of the intermediate iterates can become very large (see Figure 3(b)). [sent-247, score-0.262]
69 Figure 1: (a) Time taken by SVP and SVT for random instances of the Affine Rank Minimization Problem (ARMP) with optimal rank k = 5. [sent-266, score-0.191]
70 (c) Empirical estimates of the sampling density threshold required for exact matrix completion by SVP (here C = 1. [sent-268, score-0.293]
71 (d) Maximum incoherence maxt µ(X t ) over the iterates of SVP for varying densities p and sizes n. [sent-271, score-0.143]
72 (c) Running time (on log scale) of various methods for matrix completion with sampling density p = . [sent-276, score-0.331]
73 4 Experimental Results In this section, we empirically evaluate our methods for the affine rank minimization problem and low-rank matrix completion. [sent-283, score-0.303]
74 For ARMP we compare our method against the trace-norm based singular value thresholding (SVT) method [5]. [sent-285, score-0.081]
75 We use our own implementation of SVT for ARMP and ALS, while for matrix completion we use the code provided by the respective authors for SVT, ADMiRA and OPT. [sent-290, score-0.245]
76 We generate random matrices X ∈ Rn×n of different sizes n and fixed rank k = 5. [sent-295, score-0.226]
77 Next we evaluate our method for the problem of matrix reconstruction from random measurements. [sent-298, score-0.091]
78 The MIT logo we use is a 38 × 73 image and has rank four. [sent-301, score-0.224]
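As a hedged end-to-end illustration of this experiment (the Gaussian measurement model, sizes, and step-size below are our stand-ins, not the exact setup of the paper), reusing the svp sketch from above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, d = 38, 73, 4, 2000      # stand-in sizes (the MIT logo is 38 x 73, rank 4)

X_true = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-4 ground truth
M = rng.standard_normal((d, m * n)) / np.sqrt(d)                    # Gaussian measurements

def A(X):
    return M @ X.ravel()             # affine map R^{m x n} -> R^d

def At(y):
    return (M.T @ y).reshape(m, n)   # its adjoint

b = A(X_true)
X_hat = svp(A, At, b, (m, n), k, eta=0.75, n_iter=500)  # eta ~ 1/(1 + delta_2k)
print(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
```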
79 Matrix Completion: Synthetic Datasets (Uniform Sampling). We now evaluate our method against various matrix completion methods for random low-rank matrices. [sent-305, score-0.264]
80 Figure 3: (a): RMSE incurred by various methods (ICMC, ALS, SVT, SVP, SVP-NewtonD) for matrix completion with different rank (k) solutions on Movie-Lens Dataset. [sent-332, score-0.472]
81 (b): Time (on log scale) required by various methods for matrix completion with p = . [sent-333, score-0.283]
82 (c): RMSE incurred by various methods for matrix completion with p = 0. [sent-336, score-0.281]
83 We generate a random rank-k matrix X ∈ R^{n×n} and sample each of its entries independently with probability p (Bernoulli model). [sent-340, score-0.095]
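A sketch of this synthetic setup, including the RMSE-on-sampled-entries criterion used throughout the experiments (sizes are illustrative and smaller than in the paper; reuses svp_complete from above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 500, 2, 0.15                                   # illustrative values

X_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # random rank-k matrix
mask = rng.random((n, n)) < p                            # Bernoulli(p) observation pattern
M_obs = np.where(mask, X_true, 0.0)

X_hat = svp_complete(M_obs, mask, k, p)

# RMSE on the sampled entries, the quantity tracked in the experiments.
rmse = np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2))
print(rmse)
```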
84 Figure 2 (a) compares the time required by various methods (in log-scale) to obtain a root mean square error (RMSE) of 10−3 on the sampled entries for fixed k = 2. [sent-341, score-0.085]
85 Figure 2 (c) compares the time required by various methods to obtain a root mean square error (RMSE) of 10−3 on the sampled entries for fixed n = 1000 and increasing k. [sent-346, score-0.085]
86 One of the reasons for this is that, due to noise, the rank of the intermediate iterates arising in SVT can be fairly large. [sent-355, score-0.284]
87 As before, we generate a random rank-k = 10 matrix X ∈ R^{n×n} and sample the entries of X using a graph generated from the Chung-Lu-Vu model with power-law distributed degrees (see [23] for details). [sent-357, score-0.095]
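We do not reproduce the exact Chung-Lu-Vu construction here; the snippet below is only a rough stand-in that captures the power-law flavour of the sampling (the Pareto weights, exponent, and scaling are all our assumptions):

```python
import numpy as np

def powerlaw_mask(m, n, p, alpha=2.5, rng=None):
    """Observe entry (i, j) with probability proportional to w_i * v_j, where
    the row/column weights w, v are heavy-tailed; scaled so the expected
    density is roughly p. A rough proxy for power-law degree sampling."""
    rng = rng or np.random.default_rng(0)
    w = rng.pareto(alpha, m) + 1.0
    v = rng.pareto(alpha, n) + 1.0
    probs = np.outer(w, v)
    probs *= p * m * n / probs.sum()
    return rng.random((m, n)) < np.minimum(probs, 1.0)
```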
88 Figure 3 (c) plots the RMSE obtained by various methods for varying n and fixed sampling density p = 0. [sent-358, score-0.082]
89 Since rank cannot be fixed in SVT, we try various values for the parameter τ to obtain the desired rank solution. [sent-368, score-0.401]
90 0.89, but required a rank-17 solution and was significantly slower in convergence because many intermediate iterates had large rank (up to around 150). [sent-374, score-0.453]
91 We attribute the relatively poor performance of SVP and SVT as compared with ALS and SVP-Newton to the fact that the ratings matrix is not sampled uniformly, thus violating the crucial assumption of uniformly distributed samples. [sent-375, score-0.136]
92 A singular value thresholding algorithm for matrix completion. [sent-400, score-0.153]
93 A rank minimization heuristic with application to minimum order system approximation. [sent-416, score-0.25]
94 Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. [sent-422, score-0.303]
95 Compressed sensing and robust recovery of low rank matrices. [sent-429, score-0.234]
96 Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. [sent-437, score-0.178]
97 Convergence of fixed point continuation algorithms for matrix rank minimization, 2009. [sent-440, score-0.263]
98 Guaranteed minimum rank approximation from linear observations by nuclear norm minimization with an ellipsoidal constraint, 2009. [sent-468, score-0.269]
99 Fixed point and Bregman iterative methods for matrix rank minimization. [sent-478, score-0.263]
100 Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, 2007. [sent-497, score-0.091]
wordName wordTfidf (topN-words)
[('svp', 0.75), ('svt', 0.31), ('armp', 0.243), ('rank', 0.191), ('completion', 0.173), ('rip', 0.169), ('isometry', 0.13), ('rmse', 0.121), ('af', 0.112), ('als', 0.107), ('incoherent', 0.091), ('mcp', 0.088), ('recht', 0.087), ('newtond', 0.077), ('matrix', 0.072), ('incoherence', 0.07), ('admira', 0.068), ('singular', 0.065), ('iterates', 0.055), ('rarmp', 0.055), ('rm', 0.046), ('uk', 0.045), ('bresler', 0.044), ('raghu', 0.044), ('svd', 0.042), ('candes', 0.041), ('minimization', 0.04), ('inderjit', 0.039), ('keshavan', 0.039), ('recovers', 0.038), ('vk', 0.036), ('fazel', 0.035), ('matrices', 0.035), ('lemma', 0.034), ('doi', 0.033), ('icmc', 0.033), ('logo', 0.033), ('pmn', 0.033), ('svk', 0.033), ('pk', 0.031), ('opt', 0.029), ('sampled', 0.027), ('recovery', 0.027), ('prateek', 0.027), ('constraints', 0.026), ('meka', 0.025), ('emmanuel', 0.025), ('projection', 0.025), ('squares', 0.025), ('austin', 0.024), ('density', 0.024), ('theorem', 0.024), ('sampling', 0.024), ('url', 0.023), ('entries', 0.023), ('arising', 0.022), ('arpack', 0.022), ('facilitating', 0.022), ('kiryung', 0.022), ('propack', 0.022), ('uniformly', 0.022), ('goldfarb', 0.022), ('restricted', 0.021), ('satisfy', 0.021), ('lee', 0.02), ('projected', 0.02), ('various', 0.019), ('secs', 0.019), ('toh', 0.019), ('log', 0.019), ('cand', 0.019), ('minimum', 0.019), ('reconstruction', 0.019), ('nuclear', 0.019), ('cai', 0.018), ('guarantees', 0.018), ('hindi', 0.018), ('yehuda', 0.018), ('maxt', 0.018), ('et', 0.018), ('lemmas', 0.018), ('bernoulli', 0.018), ('conjecture', 0.017), ('jain', 0.017), ('prove', 0.017), ('bangalore', 0.017), ('incurred', 0.017), ('intermediate', 0.016), ('compares', 0.016), ('gradient', 0.016), ('thresholding', 0.016), ('chernoff', 0.016), ('uij', 0.016), ('sensing', 0.016), ('fix', 0.016), ('decomposition', 0.015), ('appendix', 0.015), ('plots', 0.015), ('ratings', 0.015), ('iterations', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
2 0.11108918 162 nips-2010-Link Discovery using Graph Feature Tracking
Author: Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, Nicolas Vayatis
Abstract: We consider the problem of discovering links of an evolving undirected graph given a series of past snapshots of that graph. The graph is observed through the time sequence of its adjacency matrix and only the presence of edges is observed. The absence of an edge on a certain snapshot cannot be distinguished from a missing entry in the adjacency matrix. Additional information can be provided by examining the dynamics of the graph through a set of topological features, such as the degrees of the vertices. We develop a novel methodology by building on both static matrix completion methods and the estimation of the future state of relevant graph features. Our procedure relies on the formulation of an optimization problem which can be approximately solved by a fast alternating linearized algorithm whose properties are examined. We show experiments with both simulated and real data which reveal the interest of our methodology. 1
3 0.10552412 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
Author: Andrew Goldberg, Ben Recht, Junming Xu, Robert Nowak, Xiaojin Zhu
Abstract: We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum. 1
4 0.10498556 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrixform similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive- aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
5 0.1037305 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
Author: Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
Abstract: Many statistical M -estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer. We analyze the convergence rates of first-order gradient methods for solving such problems within a high-dimensional framework that allows the data dimension d to grow with (and possibly exceed) the sample size n. This high-dimensional structure precludes the usual global assumptions— namely, strong convexity and smoothness conditions—that underlie classical optimization analysis. We define appropriately restricted versions of these conditions, and show that they are satisfied with high probability for various statistical models. Under these conditions, our theory guarantees that Nesterov’s first-order method [12] has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical Euclidean distance between the true unknown parameter θ∗ and the optimal solution θ. This globally linear rate is substantially faster than previous analyses of global convergence for specific methods that yielded only sublinear rates. Our analysis applies to a wide range of M -estimators and statistical models, including sparse linear regression using Lasso ( 1 regularized regression), group Lasso, block sparsity, and low-rank matrix recovery using nuclear norm regularization. Overall, this result reveals an interesting connection between statistical precision and computational efficiency in high-dimensional estimation. 1
6 0.095075846 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
7 0.085589655 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
8 0.071981229 231 nips-2010-Robust PCA via Outlier Pursuit
9 0.069395304 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
10 0.057049733 221 nips-2010-Random Projections for $k$-means Clustering
11 0.054001264 265 nips-2010-The LASSO risk: asymptotic results and real world examples
12 0.048463568 124 nips-2010-Inductive Regularized Learning of Kernel Functions
13 0.046174906 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
14 0.044919726 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
15 0.042221289 148 nips-2010-Learning Networks of Stochastic Differential Equations
16 0.037320405 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
17 0.036106039 63 nips-2010-Distributed Dual Averaging In Networks
18 0.034369949 134 nips-2010-LSTD with Random Projections
19 0.034217294 9 nips-2010-A New Probabilistic Model for Rank Aggregation
20 0.033302311 202 nips-2010-Parallelized Stochastic Gradient Descent
topicId topicWeight
[(0, 0.102), (1, 0.018), (2, 0.073), (3, 0.032), (4, 0.036), (5, -0.05), (6, 0.031), (7, 0.017), (8, -0.079), (9, -0.019), (10, 0.05), (11, -0.04), (12, 0.095), (13, 0.049), (14, 0.087), (15, -0.02), (16, 0.009), (17, -0.068), (18, 0.02), (19, 0.146), (20, 0.078), (21, 0.099), (22, -0.029), (23, -0.172), (24, 0.094), (25, -0.021), (26, -0.019), (27, -0.072), (28, 0.003), (29, -0.041), (30, -0.082), (31, -0.017), (32, 0.054), (33, -0.001), (34, -0.006), (35, 0.003), (36, 0.056), (37, 0.03), (38, -0.041), (39, 0.024), (40, 0.034), (41, -0.076), (42, -0.012), (43, 0.15), (44, 0.005), (45, 0.034), (46, -0.025), (47, -0.023), (48, 0.061), (49, -0.074)]
simIndex simValue paperId paperTitle
same-paper 1 0.92908478 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
2 0.80915833 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa
Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.
3 0.7341612 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
Author: Nathan Srebro, Ruslan Salakhutdinov
Abstract: We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling. We show that the weighted trace-norm regularization indeed yields significant gains on the highly non-uniformly sampled Netflix dataset.
4 0.70822281 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrixform similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive- aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
5 0.60227382 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
Author: Andrew Goldberg, Ben Recht, Junming Xu, Robert Nowak, Xiaojin Zhu
Abstract: We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum. 1
6 0.57816285 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
7 0.5719189 162 nips-2010-Link Discovery using Graph Feature Tracking
8 0.54363829 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
9 0.50930834 231 nips-2010-Robust PCA via Outlier Pursuit
10 0.47499663 9 nips-2010-A New Probabilistic Model for Rank Aggregation
11 0.465817 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
12 0.4365606 221 nips-2010-Random Projections for $k$-means Clustering
13 0.43231791 45 nips-2010-CUR from a Sparse Optimization Viewpoint
14 0.37728715 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
15 0.36301321 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
16 0.32277912 63 nips-2010-Distributed Dual Averaging In Networks
17 0.3135305 202 nips-2010-Parallelized Stochastic Gradient Descent
18 0.30606654 265 nips-2010-The LASSO risk: asymptotic results and real world examples
19 0.30199659 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods
20 0.29676899 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
topicId topicWeight
[(13, 0.069), (27, 0.03), (30, 0.052), (35, 0.016), (42, 0.351), (45, 0.127), (50, 0.035), (52, 0.063), (60, 0.034), (77, 0.052), (78, 0.013), (79, 0.012), (90, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.6720019 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
2 0.56591713 202 nips-2010-Parallelized Stochastic Gradient Descent
Author: Martin Zinkevich, Markus Weimer, Lihong Li, Alex J. Smola
Abstract: With the increase in available data parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms [5, 7] our variant comes with parallel acceleration guarantees and it poses no overly tight latency constraints, which might only be available in the multicore setting. Our analysis introduces a novel proof technique — contractive mappings to quantify the speed of convergence of parameter distributions to their asymptotic limits. As a side effect this answers the question of how quickly stochastic gradient descent algorithms reach the asymptotically normal regime [1, 8]. 1
3 0.56244981 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement
Author: Jeffrey Johns, Christopher Painter-wakefield, Ronald Parr
Abstract: Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-theshelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a “greedy” homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization. 1
4 0.44261762 117 nips-2010-Identifying graph-structured activation patterns in networks
Author: James Sharpnack, Aarti Singh
Abstract: We consider the problem of identifying an activation pattern in a complex, largescale network that is embedded in very noisy measurements. This problem is relevant to several applications, such as identifying traces of a biochemical spread by a sensor network, expression levels of genes, and anomalous activity or congestion in the Internet. Extracting such patterns is a challenging task specially if the network is large (pattern is very high-dimensional) and the noise is so excessive that it masks the activity at any single node. However, typically there are statistical dependencies in the network activation process that can be leveraged to fuse the measurements of multiple nodes and enable reliable extraction of highdimensional noisy patterns. In this paper, we analyze an estimator based on the graph Laplacian eigenbasis, and establish the limits of mean square error recovery of noisy patterns arising from a probabilistic (Gaussian or Ising) model based on an arbitrary graph structure. We consider both deterministic and probabilistic network evolution models, and our results indicate that by leveraging the network interaction structure, it is possible to consistently recover high-dimensional patterns even when the noise variance increases with network size. 1
5 0.43829069 265 nips-2010-The LASSO risk: asymptotic results and real world examples
Author: Mohsen Bayati, José Pereira, Andrea Montanari
Abstract: We consider the problem of learning a coefficient vector x0 ∈ RN from noisy linear observation y = Ax0 + w ∈ Rn . In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator x. In this case, a popular approach consists in solving an ℓ1 -penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN). For sequences of matrices A of increasing dimensions, with independent gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm, that is inspired from graphical models ideas. Through simulations on real data matrices (gene expression data and hospital medical records) we observe that these results can be relevant in a broad array of practical applications.
6 0.43678507 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
7 0.435366 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
8 0.43394274 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
9 0.43373603 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
10 0.43043178 148 nips-2010-Learning Networks of Stochastic Differential Equations
11 0.43005362 222 nips-2010-Random Walk Approach to Regret Minimization
13 0.4276239 221 nips-2010-Random Projections for $k$-means Clustering
14 0.4273192 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data
15 0.42729473 261 nips-2010-Supervised Clustering
16 0.42548281 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices
17 0.42533296 5 nips-2010-A Dirty Model for Multi-task Learning
18 0.42490098 280 nips-2010-Unsupervised Kernel Dimension Reduction
19 0.42449829 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
20 0.4244267 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks