nips nips2010 nips2010-110 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract. Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. [sent-8, score-0.289]
2 In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). [sent-9, score-0.703]
3 Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. [sent-12, score-0.116]
4 However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. [sent-13, score-0.391]
5 We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. [sent-14, score-0.269]
6 In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. [sent-15, score-0.269]
7 1 Introduction. In this paper we study the general affine rank minimization problem (ARMP): min rank(X) s.t. A(X) = b, where A : R^{m×n} → R^d is an affine transformation and b ∈ R^d. [sent-16, score-0.231]
8 The affine rank minimization problem above is of considerable practical interest, and many important machine learning problems such as matrix completion, low-dimensional metric embedding, and low-rank kernel learning can be viewed as instances of the above problem. [sent-18, score-0.303]
9 Recht et al. [24] gave the first nontrivial results for the problem, obtaining guaranteed rank minimization for affine transformations A that satisfy a restricted isometry property (RIP). [sent-22, score-0.403]
10 Define the isometry constant δ_k of A to be the smallest number such that for all X ∈ R^{m×n} of rank at most k, (1 − δ_k)·∥X∥_F^2 ≤ ∥A(X)∥_2^2 ≤ (1 + δ_k)·∥X∥_F^2. [sent-23, score-0.321]
11 Recht et al. [24] show that for ARMP with isometry constant δ_5k < 1/10, the minimum rank solution can be recovered by the minimum trace-norm solution. [sent-29, score-0.359]
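For intuition, here is a minimal numerical probe of this definition (illustrative code of ours, not from the paper; since δ_k is a supremum over all rank-k matrices, random trials only give a rough lower bound on it):

```python
import numpy as np

def rip_ratio_range(A, m, n, k, trials=200, rng=None):
    """Sample random rank-k matrices X and record ||A(X)||_2^2 / ||X||_F^2.
    The spread of these ratios around 1 only lower-bounds delta_k."""
    rng = rng or np.random.default_rng(0)
    ratios = []
    for _ in range(trials):
        X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
        ratios.append(float(np.sum(A(X) ** 2) / np.sum(X ** 2)))
    return min(ratios), max(ratios)

# Example: a random Gaussian measurement operator with d measurements.
m, n, k, d = 30, 30, 2, 400
M = np.random.default_rng(1).standard_normal((d, m * n)) / np.sqrt(d)
print(rip_ratio_range(lambda X: M @ X.ravel(), m, n, k))
```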
12 We present a simple analysis showing that SVP recovers the minimum rank solution for noisy affine constraints that satisfy RIP and prove the following guarantees. [sent-31, score-0.312]
13 They also require stronger isometry assumptions, δ_3k < 1/√30, than our analysis. [sent-34, score-0.13]
14 Theorem 1.1: Suppose the isometry constant of A satisfies δ_2k < 1/3 and let b = A(X∗) for a rank-k matrix X∗. [sent-36, score-0.202]
15 Furthermore, SVP outputs a matrix X of rank at most k such that ∥A(X) − b∥_2^2 ≤ ϵ and ∥X − X∗∥_F^2 ≤ ϵ/(1 − δ_2k) in at most ⌈ log(∥b∥_2^2 / 2ϵ) / log((1 − δ_2k)/2δ_2k) ⌉ iterations. [sent-38, score-0.282]
16 Theorem 1.2 (Main): Suppose the isometry constant of A satisfies δ_2k < 1/3 and let b = A(X∗) + e for a rank-k matrix X∗ and an error vector e ∈ R^d. [sent-40, score-0.393]
17 Then, SVP with step-size η_t = 1/(1 + δ_2k) outputs a matrix X of rank at most k such that ∥A(X) − b∥_2^2 ≤ C∥e∥_2^2 + ϵ and ∥X − X∗∥_F^2 ≤ (C∥e∥_2^2 + ϵ)/(1 − δ_2k), for any ϵ ≥ 0, in at most ⌈ log(∥b∥_2^2 / 2(C∥e∥_2^2 + ϵ)) / log(1/D) ⌉ iterations, for universal constants C and D. [sent-41, score-0.297]
18 We next consider an important application of ARMP: the low-rank matrix completion problem (MCP)— given a small number of entries from an unknown low-rank matrix, the task is to complete the missing entries. [sent-48, score-0.268]
19 [14] gave the first theoretical guarantees for the problem obtaining exact recovery from an almost optimal number of uniformly sampled entries. [sent-51, score-0.094]
20 While RIP does not hold for MCP, we show that a similar property holds for incoherent matrices [6]. [sent-52, score-0.126]
21 Given our refined RIP and a hypothesis bounding the incoherence of the iterates arising in SVP, an analysis similar to that of Theorem 1. [sent-53, score-0.147]
22 We provide strong empirical evidence for our hypothesis and show that both of our algorithms recover a low-rank matrix from an almost optimal number of uniformly sampled entries. [sent-55, score-0.121]
23 In summary, our main contributions are: • Motivated by [11], we propose a projected gradient based algorithm, SVP, for ARMP and show that our method recovers the optimal rank solution when the affine constraints satisfy RIP. [sent-56, score-0.312]
24 To the best of our knowledge, our isometry constant requirements are the least stringent: we only require δ_2k < 1/3, as opposed to δ_5k < 1/10 by Recht et al. [sent-57, score-0.148]
25 For instance, on the Movie-lens dataset [1] and rank k = 3, SVP-Newton achieves an RMSE of 0. [sent-62, score-0.191]
26 • As observed in [23], most trace-norm based methods perform poorly for matrix completion when entries are sampled from more realistic power-law distributions. [sent-65, score-0.295]
27 We then specialize our algorithm for the problem of matrix completion and prove a more restricted isometry property for the same. [sent-70, score-0.413]
28 However, the Euclidean projection onto C(k) can be computed efficiently using singular value decomposition (SVD). [sent-76, score-0.105]
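For concreteness, a minimal sketch of this projection, denoted P_k below (our naming and code, assuming numpy):

```python
import numpy as np

def project_rank_k(X, k):
    """Euclidean (Frobenius) projection of X onto the set of matrices of rank
    at most k: keep only the top-k singular triplets."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```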
29 In SVP, a candidate solution to ARMP is computed iteratively by starting from the all-zero matrix and adapting the classical projected gradient descent update as follows (note that ∇ψ(X) = A^T(A(X) − b)): X^{t+1} ← P_k(X^t − η_t ∇ψ(X^t)) = P_k(X^t − η_t A^T(A(X^t) − b)). (1) [sent-81, score-0.108]
30 Note that the iterates X^t are always low-rank, facilitating faster computation of the SVD. [sent-83, score-0.077]
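Putting this together, a hedged sketch of the SVP iteration described above (it reuses project_rank_k from the previous snippet; the function names, stopping rule, and iteration cap are our choices, not the authors' code):

```python
import numpy as np

def svp(A, At, b, shape, k, eta, n_iter=200, tol=1e-6):
    """SVP sketch: X^{t+1} = P_k(X^t - eta * A^T(A(X^t) - b)).
    A  : function mapping an (m, n) matrix to a length-d measurement vector
    At : its adjoint, mapping a length-d vector back to an (m, n) matrix
    """
    X = np.zeros(shape)                      # start from the all-zero matrix
    for _ in range(n_iter):
        residual = A(X) - b                  # A(X^t) - b
        if np.linalg.norm(residual) <= tol:  # ||A(X^t) - b||_2 small enough
            break
        X = project_rank_k(X - eta * At(residual), k)
    return X
```

Per Theorem 1.2, a natural choice of step-size in the noisy case is eta = 1/(1 + δ_2k).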
31 Then, ψ(X^{t+1}) ≤ ψ(X∗) + (2δ_2k/(1 − δ_2k))·∥A(X∗ − X^t)∥_2^2, where δ_2k is the rank-2k isometry constant of A. [sent-95, score-0.321]
32 Further, using RIP for the rank at most 2k matrix X^τ − X∗, we get: ∥X^τ − X∗∥_F^2 ≤ ψ(X^τ)/(1 − δ_2k) ≤ ϵ/(1 − δ_2k). [sent-107, score-0.282]
33 2 Matrix Completion We first describe the low-rank matrix completion problem formally. [sent-113, score-0.245]
34 Then, the low-rank matrix completion problem (MCP) can be formulated as follows: min rank(X) s.t. P_Ω(X) = P_Ω(X∗). [sent-116, score-0.245]
35 X^{t+1} ← P_k(X^t − (1/((1 + δ)p))·P_Ω(X^t − X∗)). (2) Although matrix completion is a special case of ARMP, the affine constraints that define MCP, P_Ω, do not satisfy RIP in general. [sent-121, score-0.245]
36 However, we show that the matrix completion affine constraints satisfy RIP for low-rank incoherent matrices. [sent-126, score-0.383]
37 Definition (Incoherence): A matrix X ∈ R^{m×n} with singular value decomposition X = UΣV^T is µ-incoherent if max_{i,j} |U_ij| ≤ √µ/√m and max_{i,j} |V_ij| ≤ √µ/√n. [sent-128, score-0.152]
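For reference, a small helper computing the smallest µ for which the rank-k factors of X satisfy this definition (our code; Figure 1(d) tracks the closely related quantity √n·max|U_ij| on the left factors only):

```python
import numpy as np

def incoherence(X, k):
    """Smallest mu such that the top-k singular vectors of X satisfy
    max_ij |U_ij| <= sqrt(mu/m) and max_ij |V_ij| <= sqrt(mu/n)."""
    m, n = X.shape
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    mu_left = m * np.max(np.abs(U[:, :k])) ** 2
    mu_right = n * np.max(np.abs(Vt[:k, :])) ** 2
    return max(mu_left, mu_right)
```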
38 Hence, a random sampling of the matrix should provide enough global information to satisfy RIP. [sent-133, score-0.117]
39 Using the above definition, we prove the following refined restricted isometry property. [sent-134, score-0.168]
40 Roughly, our proof combines a Chernoff bound estimate for ∥P_Ω(X)∥_F^2 with a union bound over low-rank incoherent matrices. [sent-137, score-0.091]
41 1 can be used to show that SVP achieves exact recovery for low-rank incoherent matrices from uniformly sampled entries. [sent-142, score-0.202]
42 As supported by empirical evidence, we hypothesize that the iterates X t arising in SVP remain incoherent when the underlying matrix X ∗ is incoherent. [sent-143, score-0.24]
43 Figure 1 (d) plots the maximum incoherence max_t µ(X^t) = √n·max_{t,i,j} |U^t_ij|, where U^t are the left singular vectors of the intermediate iterates X^t computed by SVP. [sent-144, score-0.239]
44 The figure clearly shows that the incoherence µ(X t ) of the iterates is bounded by a constant independent of the matrix size n and density p throughout the execution of SVP. [sent-145, score-0.221]
45 Figure 2 (c) plots the threshold sampling density p beyond which matrix completion for randomly generated matrices is solved exactly by SVP for fixed k and varying matrix sizes n. [sent-146, score-0.415]
46 2 and supported by empirical evidence (Figures 2 (c), (d)) we hypothesize that SVP achieves exact recovery from an almost optimal number of samples for incoherent matrices. [sent-149, score-0.118]
47 Then, there exists a constant C such that for a µ-incoherent matrix X∗ of rank at most k and Ω sampled from the Bernoulli model with density p = Ω_{µ,k}((log n)/m), SVP with step-size η_t = 1/((1 + δ)p) converges to X∗ with high probability. [sent-152, score-0.314]
48 Moreover, SVP outputs a matrix X of rank at most k such that ∥P_Ω(X) − P_Ω(X∗)∥_F^2 ≤ ϵ after O_{µ,k}(⌈log(1/ϵ)⌉) iterations. [sent-153, score-0.282]
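A sketch of update (2) specialized to matrix completion, with the gradient supported on the observed entries and the step-size 1/((1+δ)p) used in the theorem above; it reuses project_rank_k from the earlier snippet, and the default δ below is purely illustrative:

```python
import numpy as np

def svp_complete(M_obs, mask, k, p, delta=1.0 / 3, n_iter=200, tol=1e-6):
    """SVP for matrix completion (update (2)):
    X^{t+1} = P_k(X^t - (1/((1+delta)p)) * P_Omega(X^t - X*)).
    M_obs : matrix holding the observed entries (zeros elsewhere)
    mask  : boolean matrix, True where an entry is observed
    p     : sampling density |Omega| / (m * n)
    """
    X = np.zeros_like(M_obs, dtype=float)
    eta = 1.0 / ((1.0 + delta) * p)
    for _ in range(n_iter):
        grad = np.where(mask, X - M_obs, 0.0)   # P_Omega(X^t) - P_Omega(X*)
        if np.linalg.norm(grad) <= tol:
            break
        X = project_rank_k(X - eta * grad, k)
    return X
```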
49 1 RIP for Matrix Completion on Incoherent Matrices We now prove the restricted isometry property of Theorem 2. [sent-156, score-0.168]
50 Lemma 2.4: Let X ∈ R^{m×n} be a µ-incoherent matrix of rank at most k. [sent-168, score-0.263]
51 While the above lemma shows Equation (3) for a fixed rank-k, µ-incoherent X (i.e., [sent-173, score-0.225]
52 Lemma 2.4), we need to show Equation (3) for all such rank-k incoherent matrices. [sent-176, score-0.282]
53 To handle this problem, we discretize the space of low-rank incoherent matrices so as to be able to use the above lemma and a union bound. [sent-177, score-0.16]
54 We now show the existence of a small set of matrices S(µ, ϵ) ⊆ Rm×n such that every low-rank µ-incoherent matrix is close to an appropriately regular matrix from the set S(µ, ϵ). [sent-178, score-0.179]
55 For any µ-incoherent X ∈ R^{m×n} of rank k with ∥X∥_2 = 1, there exists Y ∈ S(µ, ϵ) s.t. [sent-181, score-0.191]
56 By the above lemmas and a union bound, for any Y ∈ S′(µ, ϵ), Pr[ |∥P_Ω(Y)∥_F^2 − p∥Y∥_F^2| ≥ δp∥Y∥_F^2 ] ≤ 2(mnk/ϵ)^{3(m+n)k}·exp(−δ^2 pmn/(16µ^2 k)) ≤ exp(C1·nk·log n)·exp(−δ^2 pmn/(16µ^2 k)), where C1 ≥ 0 is a constant independent of m, n, k. [sent-194, score-0.085]
57 As the statement of the theorem is invariant under scaling, it is enough to show the statement for all µ-incoherent matrices X of rank at most k and ∥X∥_2 = 1. [sent-196, score-0.25]
58 Recall that each iteration of SVP (Equation (1)) takes a step along the gradient of the objective function and then projects the iterate to the set of low rank matrices using SVD. [sent-206, score-0.242]
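The extracted text does not spell the Newton step out, so the following is only one plausible reading of it, flagged as our assumption rather than the paper's definition: after the SVD projection, keep the singular vectors fixed and re-fit a small k×k middle matrix by least squares on the observed entries.

```python
import numpy as np

def newton_refit(U, Vt, M_obs, mask):
    """Hypothetical 'Newton-style' refit (our reading, not necessarily the
    authors' exact step): with U (m x k) and Vt (k x n) fixed, choose the
    k x k matrix S minimizing ||P_Omega(U S Vt) - P_Omega(X*)||_F."""
    k = U.shape[1]
    rows, cols = np.nonzero(mask)
    # Entry (i, j) of U S Vt is linear in S: sum_{a,b} U[i,a] * S[a,b] * Vt[b,j].
    design = np.einsum('qa,bq->qab', U[rows], Vt[:, cols]).reshape(len(rows), k * k)
    s_vec, *_ = np.linalg.lstsq(design, M_obs[rows, cols], rcond=None)
    return U @ s_vec.reshape(k, k) @ Vt
```

Whatever its exact form, such a refit solves a least-squares problem with one row per observed entry and k^2 unknowns, which is consistent with the cost remark that follows.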
59 This makes SVP-Newton computationally expensive for problems with large rank, particularly for situations with a large number of constraints as is the case for matrix completion. [sent-220, score-0.098]
60 3 Related Work and Computational Issues The general rank minimization problem with affine constraints is NP-hard and is also NP-hard to approximate [22]. [sent-225, score-0.257]
61 Most methods for ARMP either relax the rank constraint to a convex function such as the trace-norm [8], [9], or assume a factorization and optimize the resulting non-convex problem by alternating minimization [4, 3, 15]. [sent-226, score-0.231]
62 The guarantees of [24] were later extended to noisy measurements and isometry constants up to δ_3k < 1/(4√3) by Fazel et al. [sent-228, score-0.148]
63 Recently, Lee and Bresler [17] proposed an algorithm (ADMiRA) motivated by the orthogonal matching pursuit line of work in compressed sensing and show that for affine constraints with isometry constant δ4k ≤ 0. [sent-231, score-0.172]
64 However, their method is not very efficient for large datasets and when the rank of the optimal solution is relatively large. [sent-233, score-0.191]
65 Candes and Recht [6], Candes and Tao [7] show that if X ∗ is µ-incoherent and the known entries are sampled uniformly at random with |Ω| ≥ C(µ) k 2 n log2 n, finding the minimum trace-norm solution recovers the minimum rank solution. [sent-236, score-0.339]
66 Keshavan et al. obtained similar results independently for exact recovery from uniformly sampled Ω with |Ω| ≥ C(µ, k)·n·log n. [sent-238, score-0.095]
67 Minimizing the trace-norm of a matrix subject to affine constraints can be cast as a semi-definite program (SDP). [sent-239, score-0.098]
68 Also, noisy measurements pose considerable computational challenges for trace-norm optimization as the rank of the intermediate iterates can become very large (see Figure 3(b)). [sent-247, score-0.262]
69 Figure 1: (a) Time taken by SVP and SVT for random instances of the Affine Rank Minimization Problem (ARMP) with optimal rank k = 5. [sent-266, score-0.191]
70 (c) Empirical estimates of the sampling density threshold required for exact matrix completion by SVP (here C = 1. [sent-268, score-0.293]
71 (d) Maximum incoherence maxt µ(X t ) over the iterates of SVP for varying densities p and sizes n. [sent-271, score-0.143]
72 (c) Running time (on log scale) of various methods for matrix completion with sampling density p = . [sent-276, score-0.331]
73 4 Experimental Results In this section, we empirically evaluate our methods for the affine rank minimization problem and low-rank matrix completion. [sent-283, score-0.303]
74 For ARMP we compare our method against the trace-norm based singular value thresholding (SVT) method [5]. [sent-285, score-0.081]
75 We use our own implementation of SVT for ARMP and ALS, while for matrix completion we use the code provided by the respective authors for SVT, ADMiRA and OPT. [sent-290, score-0.245]
76 We generate random matrices X ∈ Rn×n of different sizes n and fixed rank k = 5. [sent-295, score-0.226]
77 Next we evaluate our method for the problem of matrix reconstruction from random measurements. [sent-298, score-0.091]
78 The MIT logo we use is a 38 × 73 image and has rank four. [sent-301, score-0.224]
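As a hedged end-to-end illustration of this experiment (the Gaussian measurement model, sizes, and step-size below are our stand-ins, not the exact setup of the paper), reusing the svp sketch from above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, d = 38, 73, 4, 2000      # stand-in sizes (the MIT logo is 38 x 73, rank 4)

X_true = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-4 ground truth
M = rng.standard_normal((d, m * n)) / np.sqrt(d)                    # Gaussian measurements

def A(X):
    return M @ X.ravel()             # affine map R^{m x n} -> R^d

def At(y):
    return (M.T @ y).reshape(m, n)   # its adjoint

b = A(X_true)
X_hat = svp(A, At, b, (m, n), k, eta=0.75, n_iter=500)  # eta ~ 1/(1 + delta_2k)
print(np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true))
```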
79 Matrix Completion: Synthetic Datasets (Uniform Sampling). We now evaluate our method against various matrix completion methods for random low-rank matrices. [sent-305, score-0.264]
80 Figure 3: (a): RMSE incurred by various methods (ICMC, ALS, SVT, SVP, SVP-NewtonD) for matrix completion with different rank (k) solutions on Movie-Lens Dataset. [sent-332, score-0.472]
81 (b): Time (on log scale) required by various methods for matrix completion with p = . [sent-333, score-0.283]
82 (c): RMSE incurred by various methods for matrix completion with p = 0. [sent-336, score-0.281]
83 We generate a random rank-k matrix X ∈ R^{n×n} and sample each of its entries independently with probability p (Bernoulli model). [sent-340, score-0.095]
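A sketch of this synthetic setup, including the RMSE-on-sampled-entries criterion used throughout the experiments (sizes are illustrative and smaller than in the paper; reuses svp_complete from above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 500, 2, 0.15                                   # illustrative values

X_true = rng.standard_normal((n, k)) @ rng.standard_normal((k, n))  # random rank-k matrix
mask = rng.random((n, n)) < p                            # Bernoulli(p) observation pattern
M_obs = np.where(mask, X_true, 0.0)

X_hat = svp_complete(M_obs, mask, k, p)

# RMSE on the sampled entries, the quantity tracked in the experiments.
rmse = np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2))
print(rmse)
```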
84 Figure 2 (a) compares the time required by various methods (in log-scale) to obtain a root mean square error (RMSE) of 10−3 on the sampled entries for fixed k = 2. [sent-341, score-0.085]
85 Figure 2 (c) compares the time required by various methods to obtain a root mean square error (RMSE) of 10−3 on the sampled entries for fixed n = 1000 and increasing k. [sent-346, score-0.085]
86 One of the reasons for this is that, due to noise, the rank of the intermediate iterates arising in SVT can be fairly large. [sent-355, score-0.284]
87 As before, we generate a random rank-k = 10 matrix X ∈ R^{n×n} and sample the entries of X using a graph generated from the Chung-Lu-Vu model with power-law distributed degrees (see [23] for details). [sent-357, score-0.095]
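We do not reproduce the exact Chung-Lu-Vu construction here; the snippet below is only a rough stand-in that captures the power-law flavour of the sampling (the Pareto weights, exponent, and scaling are all our assumptions):

```python
import numpy as np

def powerlaw_mask(m, n, p, alpha=2.5, rng=None):
    """Observe entry (i, j) with probability proportional to w_i * v_j, where
    the row/column weights w, v are heavy-tailed; scaled so the expected
    density is roughly p. A rough proxy for power-law degree sampling."""
    rng = rng or np.random.default_rng(0)
    w = rng.pareto(alpha, m) + 1.0
    v = rng.pareto(alpha, n) + 1.0
    probs = np.outer(w, v)
    probs *= p * m * n / probs.sum()
    return rng.random((m, n)) < np.minimum(probs, 1.0)
```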
88 Figure 3 (c) plots the RMSE obtained by various methods for varying n and fixed sampling density p = 0. [sent-358, score-0.082]
89 Since rank cannot be fixed in SVT, we try various values for the parameter τ to obtain the desired rank solution. [sent-368, score-0.401]
90 0.89, but required a rank-17 solution and was significantly slower in convergence because many intermediate iterates had large rank (up to around 150). [sent-374, score-0.453]
91 We attribute the relatively poor performance of SVP and SVT as compared with ALS and SVP-Newton to the fact that the ratings matrix is not sampled uniformly, thus violating the crucial assumption of uniformly distributed samples. [sent-375, score-0.136]
92 A singular value thresholding algorithm for matrix completion. [sent-400, score-0.153]
93 A rank minimization heuristic with application to minimum order system approximation. [sent-416, score-0.25]
94 Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. [sent-422, score-0.303]
95 Compressed sensing and robust recovery of low rank matrices. [sent-429, score-0.234]
96 Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. [sent-437, score-0.178]
97 Convergence of fixed point continuation algorithms for matrix rank minimization, 2009. [sent-440, score-0.263]
98 Guaranteed minimum rank approximation from linear observations by nuclear norm minimization with an ellipsoidal constraint, 2009. [sent-468, score-0.269]
99 Fixed point and Bregman iterative methods for matrix rank minimization. [sent-478, score-0.263]
100 Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, 2007. [sent-497, score-0.091]
wordName wordTfidf (topN-words)
[('svp', 0.75), ('svt', 0.31), ('armp', 0.243), ('rank', 0.191), ('completion', 0.173), ('rip', 0.169), ('isometry', 0.13), ('rmse', 0.121), ('af', 0.112), ('als', 0.107), ('incoherent', 0.091), ('mcp', 0.088), ('recht', 0.087), ('newtond', 0.077), ('matrix', 0.072), ('incoherence', 0.07), ('admira', 0.068), ('singular', 0.065), ('iterates', 0.055), ('rarmp', 0.055), ('rm', 0.046), ('uk', 0.045), ('bresler', 0.044), ('raghu', 0.044), ('svd', 0.042), ('candes', 0.041), ('minimization', 0.04), ('inderjit', 0.039), ('keshavan', 0.039), ('recovers', 0.038), ('vk', 0.036), ('fazel', 0.035), ('matrices', 0.035), ('lemma', 0.034), ('doi', 0.033), ('icmc', 0.033), ('logo', 0.033), ('pmn', 0.033), ('svk', 0.033), ('pk', 0.031), ('opt', 0.029), ('sampled', 0.027), ('recovery', 0.027), ('prateek', 0.027), ('constraints', 0.026), ('meka', 0.025), ('emmanuel', 0.025), ('projection', 0.025), ('squares', 0.025), ('austin', 0.024), ('density', 0.024), ('theorem', 0.024), ('sampling', 0.024), ('url', 0.023), ('entries', 0.023), ('arising', 0.022), ('arpack', 0.022), ('facilitating', 0.022), ('kiryung', 0.022), ('propack', 0.022), ('uniformly', 0.022), ('goldfarb', 0.022), ('restricted', 0.021), ('satisfy', 0.021), ('lee', 0.02), ('projected', 0.02), ('various', 0.019), ('secs', 0.019), ('toh', 0.019), ('log', 0.019), ('cand', 0.019), ('minimum', 0.019), ('reconstruction', 0.019), ('nuclear', 0.019), ('cai', 0.018), ('guarantees', 0.018), ('hindi', 0.018), ('yehuda', 0.018), ('maxt', 0.018), ('et', 0.018), ('lemmas', 0.018), ('bernoulli', 0.018), ('conjecture', 0.017), ('jain', 0.017), ('prove', 0.017), ('bangalore', 0.017), ('incurred', 0.017), ('intermediate', 0.016), ('compares', 0.016), ('gradient', 0.016), ('thresholding', 0.016), ('chernoff', 0.016), ('uij', 0.016), ('sensing', 0.016), ('fix', 0.016), ('decomposition', 0.015), ('appendix', 0.015), ('plots', 0.015), ('ratings', 0.015), ('iterations', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
2 0.11108918 162 nips-2010-Link Discovery using Graph Feature Tracking
Author: Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, Nicolas Vayatis
Abstract: We consider the problem of discovering links of an evolving undirected graph given a series of past snapshots of that graph. The graph is observed through the time sequence of its adjacency matrix and only the presence of edges is observed. The absence of an edge on a certain snapshot cannot be distinguished from a missing entry in the adjacency matrix. Additional information can be provided by examining the dynamics of the graph through a set of topological features, such as the degrees of the vertices. We develop a novel methodology by building on both static matrix completion methods and the estimation of the future state of relevant graph features. Our procedure relies on the formulation of an optimization problem which can be approximately solved by a fast alternating linearized algorithm whose properties are examined. We show experiments with both simulated and real data which reveal the interest of our methodology. 1
3 0.10552412 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
Author: Andrew Goldberg, Ben Recht, Junming Xu, Robert Nowak, Xiaojin Zhu
Abstract: We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum. 1
4 0.10498556 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrixform similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive- aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
5 0.1037305 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
Author: Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
Abstract: Many statistical M -estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer. We analyze the convergence rates of first-order gradient methods for solving such problems within a high-dimensional framework that allows the data dimension d to grow with (and possibly exceed) the sample size n. This high-dimensional structure precludes the usual global assumptions— namely, strong convexity and smoothness conditions—that underlie classical optimization analysis. We define appropriately restricted versions of these conditions, and show that they are satisfied with high probability for various statistical models. Under these conditions, our theory guarantees that Nesterov’s first-order method [12] has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical Euclidean distance between the true unknown parameter θ∗ and the optimal solution θ. This globally linear rate is substantially faster than previous analyses of global convergence for specific methods that yielded only sublinear rates. Our analysis applies to a wide range of M -estimators and statistical models, including sparse linear regression using Lasso ( 1 regularized regression), group Lasso, block sparsity, and low-rank matrix recovery using nuclear norm regularization. Overall, this result reveals an interesting connection between statistical precision and computational efficiency in high-dimensional estimation. 1
6 0.095075846 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
7 0.085589655 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
8 0.071981229 231 nips-2010-Robust PCA via Outlier Pursuit
9 0.069395304 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
10 0.057049733 221 nips-2010-Random Projections for $k$-means Clustering
11 0.054001264 265 nips-2010-The LASSO risk: asymptotic results and real world examples
12 0.048463568 124 nips-2010-Inductive Regularized Learning of Kernel Functions
13 0.046174906 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
14 0.044919726 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
15 0.042221289 148 nips-2010-Learning Networks of Stochastic Differential Equations
16 0.037320405 146 nips-2010-Learning Multiple Tasks using Manifold Regularization
17 0.036106039 63 nips-2010-Distributed Dual Averaging In Networks
18 0.034369949 134 nips-2010-LSTD with Random Projections
19 0.034217294 9 nips-2010-A New Probabilistic Model for Rank Aggregation
20 0.033302311 202 nips-2010-Parallelized Stochastic Gradient Descent
topicId topicWeight
[(0, 0.102), (1, 0.018), (2, 0.073), (3, 0.032), (4, 0.036), (5, -0.05), (6, 0.031), (7, 0.017), (8, -0.079), (9, -0.019), (10, 0.05), (11, -0.04), (12, 0.095), (13, 0.049), (14, 0.087), (15, -0.02), (16, 0.009), (17, -0.068), (18, 0.02), (19, 0.146), (20, 0.078), (21, 0.099), (22, -0.029), (23, -0.172), (24, 0.094), (25, -0.021), (26, -0.019), (27, -0.072), (28, 0.003), (29, -0.041), (30, -0.082), (31, -0.017), (32, 0.054), (33, -0.001), (34, -0.006), (35, 0.003), (36, 0.056), (37, 0.03), (38, -0.041), (39, 0.024), (40, 0.034), (41, -0.076), (42, -0.012), (43, 0.15), (44, 0.005), (45, 0.034), (46, -0.025), (47, -0.023), (48, 0.061), (49, -0.074)]
simIndex simValue paperId paperTitle
same-paper 1 0.92908478 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
2 0.80915833 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa
Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.
3 0.7341612 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
Author: Nathan Srebro, Ruslan Salakhutdinov
Abstract: We show that matrix completion with trace-norm regularization can be significantly hurt when entries of the matrix are sampled non-uniformly, but that a properly weighted version of the trace-norm regularizer works well with non-uniform sampling. We show that the weighted trace-norm regularization indeed yields significant gains on the highly non-uniformly sampled Netflix dataset.
4 0.70822281 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix forms, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O ((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrixform similarity measure over pairs of documents represented as high dimensional vectors. LORETA improves the mean average precision over a passive- aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task. 1
5 0.60227382 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
Author: Andrew Goldberg, Ben Recht, Junming Xu, Robert Nowak, Xiaojin Zhu
Abstract: We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum. 1
6 0.57816285 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
7 0.5719189 162 nips-2010-Link Discovery using Graph Feature Tracking
8 0.54363829 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
9 0.50930834 231 nips-2010-Robust PCA via Outlier Pursuit
10 0.47499663 9 nips-2010-A New Probabilistic Model for Rank Aggregation
11 0.465817 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
12 0.4365606 221 nips-2010-Random Projections for $k$-means Clustering
13 0.43231791 45 nips-2010-CUR from a Sparse Optimization Viewpoint
14 0.37728715 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
15 0.36301321 106 nips-2010-Global Analytic Solution for Variational Bayesian Matrix Factorization
16 0.32277912 63 nips-2010-Distributed Dual Averaging In Networks
17 0.3135305 202 nips-2010-Parallelized Stochastic Gradient Descent
18 0.30606654 265 nips-2010-The LASSO risk: asymptotic results and real world examples
19 0.30199659 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods
20 0.29676899 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
topicId topicWeight
[(13, 0.069), (27, 0.03), (30, 0.052), (35, 0.016), (42, 0.351), (45, 0.127), (50, 0.035), (52, 0.063), (60, 0.034), (77, 0.052), (78, 0.013), (79, 0.012), (90, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.6720019 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
2 0.56591713 202 nips-2010-Parallelized Stochastic Gradient Descent
Author: Martin Zinkevich, Markus Weimer, Lihong Li, Alex J. Smola
Abstract: With the increase in available data parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms [5, 7] our variant comes with parallel acceleration guarantees and it poses no overly tight latency constraints, which might only be available in the multicore setting. Our analysis introduces a novel proof technique — contractive mappings to quantify the speed of convergence of parameter distributions to their asymptotic limits. As a side effect this answers the question of how quickly stochastic gradient descent algorithms reach the asymptotically normal regime [1, 8]. 1
3 0.56244981 160 nips-2010-Linear Complementarity for Regularized Policy Evaluation and Improvement
Author: Jeffrey Johns, Christopher Painter-wakefield, Ronald Parr
Abstract: Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-theshelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a “greedy” homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization. 1
4 0.44261762 117 nips-2010-Identifying graph-structured activation patterns in networks
Author: James Sharpnack, Aarti Singh
Abstract: We consider the problem of identifying an activation pattern in a complex, largescale network that is embedded in very noisy measurements. This problem is relevant to several applications, such as identifying traces of a biochemical spread by a sensor network, expression levels of genes, and anomalous activity or congestion in the Internet. Extracting such patterns is a challenging task specially if the network is large (pattern is very high-dimensional) and the noise is so excessive that it masks the activity at any single node. However, typically there are statistical dependencies in the network activation process that can be leveraged to fuse the measurements of multiple nodes and enable reliable extraction of highdimensional noisy patterns. In this paper, we analyze an estimator based on the graph Laplacian eigenbasis, and establish the limits of mean square error recovery of noisy patterns arising from a probabilistic (Gaussian or Ising) model based on an arbitrary graph structure. We consider both deterministic and probabilistic network evolution models, and our results indicate that by leveraging the network interaction structure, it is possible to consistently recover high-dimensional patterns even when the noise variance increases with network size. 1
5 0.43829069 265 nips-2010-The LASSO risk: asymptotic results and real world examples
Author: Mohsen Bayati, José Pereira, Andrea Montanari
Abstract: We consider the problem of learning a coefficient vector x0 ∈ RN from noisy linear observation y = Ax0 + w ∈ Rn . In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator x. In this case, a popular approach consists in solving an ℓ1 -penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN). For sequences of matrices A of increasing dimensions, with independent gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm, that is inspired from graphical models ideas. Through simulations on real data matrices (gene expression data and hospital medical records) we observe that these results can be relevant in a broad array of practical applications.
6 0.43678507 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
7 0.435366 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
8 0.43394274 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
9 0.43373603 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
10 0.43043178 148 nips-2010-Learning Networks of Stochastic Differential Equations
11 0.43005362 222 nips-2010-Random Walk Approach to Regret Minimization
13 0.4276239 221 nips-2010-Random Projections for $k$-means Clustering
14 0.4273192 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data
15 0.42729473 261 nips-2010-Supervised Clustering
16 0.42548281 122 nips-2010-Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices
17 0.42533296 5 nips-2010-A Dirty Model for Multi-task Learning
18 0.42490098 280 nips-2010-Unsupervised Kernel Dimension Reduction
19 0.42449829 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
20 0.4244267 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks