jmlr jmlr2012 jmlr2012-99 knowledge-graph by maker-knowledge-mining

99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise


Source: pdf

Author: Sahand Negahban, Martin J. Wainwright

Abstract: We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the “spikiness” and “low-rankness” of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying within ℓq -“balls” of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal. Keywords: matrix completion, collaborative filtering, convex optimization

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. [sent-8, score-0.645]

2 Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying within ℓq -“balls” of bounded spikiness. [sent-10, score-0.634]

3 The focus of this paper is low-rank matrix completion based on noisy observations. [sent-16, score-0.328]

4 In this paper, we analyze a method for approximate low-rank matrix recovery using an M-estimator that is a combination of a data term, and a weighted nuclear norm as a regularizer. [sent-26, score-0.444]

5 The nuclear norm is the sum of the singular values of a matrix (Horn and Johnson, 1985), and has been studied in a body of past work, both on matrix completion and more general problems of low-rank matrix estimation (e. [sent-27, score-0.797]

6 Here we limit our detailed discussion to those papers that study various aspects of the matrix completion problem. [sent-37, score-0.281]

7 Among other results, Rohde and Tsybakov (2011) establish prediction error bounds for matrix completion, a different metric than the matrix recovery problem of interest here. [sent-44, score-0.343]

8 In recent work, Salakhutdinov and Srebro (2010) provided various motivations for the use of weighted nuclear norms, in particular showing that the standard nuclear norm relaxation can behave very poorly when the sampling is non-uniform. [sent-45, score-0.458]

9 The analysis of this paper applies to both uniform and nonuniform sampling, as well as a form of reweighted nuclear norm as suggested by these authors, one which includes the ordinary nuclear norm as a special case. [sent-46, score-0.444]

10 As has been noted before (Candès and Plan, 2010), a significant theoretical challenge is that conditions that have proven very useful for sparse linear regression—among them the restricted isometry property—are not satisfied for the matrix completion problem. [sent-49, score-0.308]

11 For this reason, it is natural to seek an alternative and less restrictive property that might be satisfied in the matrix completion setting. [sent-50, score-0.308]

12 If it did hold, then it would be possible to derive non-asymptotic error bounds (in Frobenius norm) for matrix completion based on noisy observations. [sent-57, score-0.382]

13 Our first result (Theorem 1) proves that the matrix completion loss function satisfies restricted strong convexity with high probability over the set C. [sent-61, score-0.341]

14 Our second result (Theorem 2) exploits this fact to derive a non-asymptotic error bound for matrix recovery in the weighted Frobenius norm, one applicable to general matrices. [sent-62, score-0.322]

15 We then specialize this result to the problem of estimating exactly low-rank matrices (with a small number of non-zero singular values), as well as near low-rank matrices characterized by relatively swift decay of their singular values. [sent-63, score-0.37]

16 To the best of our knowledge, our results on near low-rank matrices are the first for approximate matrix recovery in the noisy setting, and as we discuss at more length in Section 3. [sent-64, score-0.334]

17 Indeed, our final result (Theorem 3) uses information-theoretic techniques to establish that up to logarithmic factors, no algorithm can obtain faster rates than our method over the ℓq -balls of matrices with bounded spikiness treated in this paper. [sent-66, score-0.364]

18 observations of the form yi = Θ∗_{j(i)k(i)} + (ν/√(dr dc)) ξi. (1) Here the quantities (ν/√(dr dc)) ξi correspond to additive observation noise with variance appropriately scaled according to the matrix dimensions. [sent-78, score-0.848]

19 , dc } representing probability distributions over the rows and columns of a dr × dc matrix. [sent-97, score-0.983]

20 We consider the weighted sampling model in which we make a noisy observation of entry (j, k) with probability R_j C_k/(dr dc), meaning that the row index j(i) (respectively column index k(i)) is chosen according to the probability distribution R/dr (respectively C/dc). [sent-98, score-0.472]

21 However, apart from the constraints ∑_{a=1}^{dr} R_aa = dr and ∑_{b=1}^{dc} C_bb = dc, we do not require that the row and column weights remain bounded as dr and dc tend to infinity. [sent-101, score-1.438]

22 , n, define the matrix X^(i) = √(dr dc) εi e_{a(i)} e_{b(i)}^T, where εi ∈ {−1, +1} is a random sign, and consider the observation model yi = ⟨X^(i), Θ∗⟩ + ν ξi, for i = 1, . [sent-107, score-0.848]

23 The model (2) can be obtained from the original model (1) by rescaling all terms by the factor √(dr dc), and introducing the random signs εi. [sent-111, score-0.719]
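
To make the rescaled observation model (2) concrete, here is a minimal simulation sketch. It is illustrative only: the function name, the use of Gaussian noise, and the way the weight vectors R and C are passed in are assumptions made here, not details taken from the paper.

```python
import numpy as np

def sample_observations(Theta, R, C, n, nu, seed=0):
    """Draw n noisy entrywise observations under the rescaled model (2).

    Theta : (dr, dc) target matrix
    R, C  : row/column weight vectors summing to dr and dc respectively
    nu    : noise scale
    Returns the observations y together with the sampled indices and signs.
    """
    rng = np.random.default_rng(seed)
    dr, dc = Theta.shape
    rows = rng.choice(dr, size=n, p=R / dr)      # row index j(i) drawn from R/dr
    cols = rng.choice(dc, size=n, p=C / dc)      # column index k(i) drawn from C/dc
    signs = rng.choice([-1.0, 1.0], size=n)      # random signs eps_i
    xi = rng.standard_normal(n)                  # noise; the paper allows sub-exponential xi
    # <X^(i), Theta> = sqrt(dr*dc) * eps_i * Theta[j(i), k(i)]
    y = np.sqrt(dr * dc) * signs * Theta[rows, cols] + nu * xi
    return y, rows, cols, signs
```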

24 (3) For sample sizes of interest for matrix completion (n ≪ dr dc ) , one cannot expect such a bound to hold uniformly over all matrices Θ ∈ Rdr ×dc , even when rank constraints are imposed. [sent-125, score-1.29]

25 Indeed, for a sample size n ≪ dr dc, we have a vanishing probability of observing the entry Θ∗_{11}, so that Xn(Θ∗) = 0 with high probability. [sent-127, score-0.745]

26 In particular, let us define (for any non-zero matrix Θ) the weighted spikiness ratio αsp(Θ) := √(dr dc) |||Θ|||_{ω(∞)} / |||Θ|||_{ω(F)}, where |||Θ|||_{ω(∞)} := ‖√R Θ √C‖_∞ is the weighted elementwise ℓ∞-norm. [sent-133, score-1.185]

27 Note that this ratio is invariant to the scaling of Θ, and satisfies the inequalities 1 ≤ αsp(Θ) ≤ √(dr dc). [sent-134, score-0.758]

28 We have αsp(Θ) = 1 for any non-zero matrix whose entries are all equal, whereas the opposite extreme αsp(Θ) = √(dr dc) is achieved by the “maximally spiky” matrix that is zero everywhere except for a single position. [sent-135, score-0.911]

29 In order to provide a tractable measure of how close Θ is to a low-rank matrix, we define (for any non-zero matrix) the ratio βra(Θ) := |||Θ|||_{ω(1)} / |||Θ|||_{ω(F)}, which satisfies the inequalities 1 ≤ βra(Θ) ≤ √(min{dr, dc}). [sent-136, score-0.303]

30 By definition of the (weighted) nuclear and Frobenius norms, note that βra(Θ) is simply the ratio of the ℓ1 to ℓ2 norms of the singular values of the weighted matrix √R Θ √C. [sent-137, score-0.403]
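
Both ratios are easy to evaluate numerically for a given matrix. The sketch below is a plain illustration (the helper name and interface are made up here); R and C are the diagonal weight vectors from the sampling model.

```python
import numpy as np

def weighted_ratios(Theta, R, C):
    """Compute the spikiness ratio alpha_sp(Theta) and low-rankness ratio beta_ra(Theta)."""
    dr, dc = Theta.shape
    W = np.sqrt(R)[:, None] * Theta * np.sqrt(C)[None, :]   # the weighted matrix sqrt(R) Theta sqrt(C)
    fro = np.linalg.norm(W, 'fro')                           # |||Theta|||_{omega(F)}
    sup = np.abs(W).max()                                    # |||Theta|||_{omega(inf)}
    nuc = np.linalg.norm(W, 'nuc')                           # |||Theta|||_{omega(1)}
    return np.sqrt(dr * dc) * sup / fro, nuc / fro

# Sanity checks under uniform weights (R = C = all ones): an all-constant matrix gives
# alpha_sp = 1, while a matrix with a single non-zero entry gives alpha_sp = sqrt(dr*dc).
```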

31 1 Restricted Strong Convexity for Matrix Sampling. Introducing the convenient shorthand d = (dr + dc)/2, let us define the constraint set C(n; c0) := {∆ ∈ R^{dr×dc}, ∆ ≠ 0 | αsp(∆) βra(∆) ≤ (1/(c0 L)) √(n/(d log d))}, (4) where c0 is a universal constant. [sent-149, score-0.376]

32 Note that as the sample size n increases, this set allows for matrices with larger values of the spikiness and/or rank measures, αsp (∆) and βra (∆) respectively. [sent-150, score-0.403]

33 Roughly speaking, this bound guarantees that the observation operator captures a substantial component of any matrix ∆ ∈ C(n; c0) that is not overly spiky, that is, for which the factor 128Lαsp(∆)/√n is small. [sent-152, score-0.312]

34 As discussed previously, the worst-case value of the “spikiness” measure is αsp(∆) = √(dr dc), achieved for a matrix that is zero everywhere except a single position. [sent-158, score-0.815]

35 In this most degenerate of cases, the combination of the constraint αsp(∆)/√n < 1 and the membership condition ∆ ∈ C(n; c0) implies that even for a rank one matrix (so that βra(∆) = 1), we need sample size n ≫ d² for Theorem 1 to provide a non-trivial result, as is to be expected. [sent-159, score-0.393]

36 2 Consequences for Noisy Matrix Completion We now turn to some consequences of Theorem 1 for matrix completion in the noisy setting. [sent-161, score-0.328]

37 Based on this intuition, it is natural to consider the estimator Θ̂ ∈ arg min_{|||Θ|||_{ω(∞)} ≤ α∗/√(dr dc)} { (1/(2n)) ‖y − Xn(Θ)‖²_2 + λn |||Θ|||_{ω(1)} }, (7) where α∗ ≥ 1 is a measure of spikiness, and the regularization parameter λn > 0 serves to control the nuclear norm of the solution. [sent-167, score-0.677]
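
One convenient way to prototype the estimator (7) is through a generic convex solver. The cvxpy sketch below is only an illustration of the optimization problem, not the authors' implementation; the function name, the data interface, and the explicit entry-by-entry construction of Xn(Θ) are assumptions made here for clarity.

```python
import numpy as np
import cvxpy as cp

def weighted_nuclear_norm_estimator(y, rows, cols, signs, R, C, alpha_star, lam):
    """Sketch of the estimator (7): squared loss plus weighted nuclear norm,
    under the weighted elementwise sup-norm (spikiness) constraint."""
    dr, dc = len(R), len(C)
    S = np.sqrt(np.outer(R, C))                       # entrywise weights sqrt(R_j * C_k)
    Theta = cp.Variable((dr, dc))
    W = cp.multiply(S, Theta)                         # sqrt(R) Theta sqrt(C)
    # Xn(Theta)_i = sqrt(dr*dc) * eps_i * Theta[j(i), k(i)]  (slow but explicit)
    picks = cp.hstack([Theta[j, k] for j, k in zip(rows, cols)])
    Xn_Theta = np.sqrt(dr * dc) * cp.multiply(signs, picks)
    objective = cp.sum_squares(y - Xn_Theta) / (2 * len(y)) + lam * cp.normNuc(W)
    constraints = [cp.max(cp.abs(W)) <= alpha_star / np.sqrt(dr * dc)]
    problem = cp.Problem(cp.Minimize(objective), constraints)
    problem.solve()
    return Theta.value
```

For matrices of realistic size a first-order method would be preferable; the point of the sketch is only that (7) is a convex program with one constraint and one regularizer.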

38 , dr, the error ∆ = Θ̂ − Θ∗ satisfies |||∆|||²_{ω(F)} ≤ c1 α∗ λn { √r |||∆|||_{ω(F)} + ∑_{j=r+1}^{dr} σ_j(√R Θ∗ √C) } + c1 (α∗ L)²/n. (9) [sent-178, score-0.455]

39 Apart from the trailing O(n⁻¹) term, the bound (9) shows a natural splitting into two terms. [sent-179, score-0.555]

40 When Θ∗ is exactly low-rank, then it is obvious that we should choose r = rank(Θ∗), so that the approximation error vanishes—more specifically, so that ∑_{j=r+1}^{dr} σ_j(√R Θ∗ √C) = 0. [sent-184, score-0.455]

41 , zero-mean and sub-exponential, and Θ∗ has rank at most r, Frobenius norm at most 1, and spikiness at most αsp (Θ∗ ) ≤ α∗ . [sent-188, score-0.395]

42 If we solve the SDP (7) with λn = 4ν √(d log d / n), then there is a numerical constant c′_1 such that |||Θ̂ − Θ∗|||²_{ω(F)} ≤ c′_1 (ν² ∨ L²) (α∗)² (r d log d)/n + c1 (α∗ L)²/n (10) with probability greater than 1 − c2 exp(−c3 log d). [sent-189, score-0.288]

43 Note that this rate has a natural interpretation: since a rank r matrix of dimension dr × dc has roughly r(dr + dc ) free parameters, we require a sample size of this order (up to logarithmic factors) so as to obtain a controlled error bound. [sent-190, score-1.198]
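
The parameter-counting intuition can be checked with a couple of lines of arithmetic; the numbers below are purely illustrative.

```python
import math

dr, dc, r = 1000, 1000, 10
d = (dr + dc) / 2
free_params = r * (dr + dc) - r ** 2      # parameters of a rank-r factorization, up to an r x r gauge
nominal_n = r * d * math.log(d)           # the r d log d scaling appearing in the rate (10)
print(free_params, dr * dc, round(nominal_n))
# ~2.0e4 free parameters versus 1.0e6 entries; roughly 6.9e4 samples suffice up to log factors
```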

44 (11) For q = 0, this set corresponds to the set of matrices with rank at most r = ρ0 , whereas for values q ∈ (0, 1], it consists of matrices whose (weighted) singular values decay at a relatively fast rate. [sent-202, score-0.361]

45 By applying Theorem 2 to this matrix family, we obtain the following corollary: Corollary 2 (Estimation of near low-rank matrices) Suppose that the noise {ξi} is zero-mean and sub-exponential, and consider a matrix Θ∗ ∈ Bq(ρq) with spikiness at most αsp(Θ∗) ≤ α∗, and Frobenius norm at most one. [sent-203, score-0.555]

46 Although we have stated our results in terms of bounds on the weighted squared Frobenius norm |||Θ|||²_{ω(F)} = |||√R Θ √C|||²_F, our assumed lower bound on the entries of R and C implies that |||Θ|||²_{ω(F)} ≥ |||Θ|||²_F / L². [sent-210, score-0.301]

47 Since we require only a lower bound on the row/column sampling frequencies, our Frobenius norm bounds would not degrade if some rows and/or columns were heavily sampled. [sent-213, score-0.285]

48 In particular, we applied the nuclear norm SDP to simulated data, using Gaussian observation noise with variance ν2 = 0. [sent-216, score-0.289]

49 Figure 1 shows the results in the case of exactly low-rank matrices (q = 0), with the matrix rank given by r = ⌈log2 (d)⌉. [sent-220, score-0.286]

50 , 2010b), a parameter counting argument indicates that roughly n ≈ r(dr + dc) samples are required to estimate a dr × dc matrix with rank r. [sent-235, score-1.173]

51 Accordingly, in this section, we provide a direct and constructive argument to lower bound the minimax rates in Frobenius norm over classes of matrices that are near low-rank and not overly spiky. [sent-267, score-0.341]

52 More precisely, consider the matrix classes B(ρq) = {Θ ∈ R^{d×d} | ∑_{j=1}^{d} σ_j(Θ)^q ≤ ρq, αsp(Θ) ≤ √(32 log d)}, corresponding to square d × d matrices that are near low-rank (belonging to the ℓq-balls previously defined (11)), and have a logarithmic spikiness ratio. [sent-269, score-0.519]

53 The term of primary interest in this bound is the first one—namely, the ρq (ν²d/n)^{1−q/2} term; it is the dominant term in the bound whenever the ℓq-radius satisfies the bound ρq ≤ d (ν²d/n)^{q/2}. [sent-301, score-0.3]

54 Note that if the noise standard deviation ν tends to zero while the sample size n, matrix size p and rank r all remain fixed, then this bound guarantees that the Frobenius error tends to zero. [sent-318, score-0.324]

55 In the setting of square matrices with δ ≥ r log d / d, our result yields an upper bound tighter by a factor of order √d than those presented in CP. [sent-324, score-0.282]

56 With this notation, their analysis guarantees error bounds of the form d |||∆|||_F with high probability, which is a weaker guarantee than our bound √(rd log d / n) whenever |||∆|||_F ≥ c r log d and n = Ω(d log d). [sent-326, score-0.442]

57 2 Comparison of Matrix Conditions. We now turn to a comparison of the various matrix incoherence assumptions invoked in the analysis of CP and KMO, and comparison to our spikiness condition. [sent-341, score-0.383]

58 Here U ∈ Rd×r and V ∈ Rd×r are matrices of the left and right singular vectors respectively, satisfying U T U = V T V = Ir×r , whereas Σ ∈ Rr×r is a diagonal matrix of the singular values. [sent-344, score-0.342]

59 However, if approximate recovery is the goal—as is necessarily the case in the more realistic setting of noisy observations—then it is clear that a minimal set of sufficient conditions should also involve the singular values, as is the case for our spikiness measure αsp (Θ∗ ). [sent-357, score-0.402]

60 Moreover, as long as t = O (1/d), we are guaranteed that our spikiness measure satisfies the bound αsp (Θ∗ ) = O (1). [sent-363, score-0.313]

61 Based on this change of variables, let us define a modified version of the constraint set (4) as follows: C′(n; c0) = {Γ ∈ R^{d×d}, Γ ≠ 0 | α′_sp(Γ) β′_ra(Γ) ≤ (1/c0) √(n/(d log d))}. (17) [sent-382, score-0.547]

62 In this new notation, the lower bound (5) from Theorem 1 can be re-stated as ‖X′n(Γ)‖_2 / √n ≥ (1/8) |||Γ|||_F { 1 − 128Lα′_sp(Γ)/√n } for all Γ ∈ C′(n; c0). [sent-383, score-0.303]

63 Then there exists a matrix decomposition ∆ = ∆′ + ∆′′ of the error ∆ such that (a) the matrix ∆′ satisfies the constraint rank(∆′) ≤ 2r, and (b) given the choice (8), the nuclear norm of ∆′′ is bounded as |||∆′′|||_1 ≤ 3|||∆′|||_1 + 4 ∑_{j=r+1}^{dr} σ_j(Γ∗). (20) [sent-392, score-0.869]

64 Note that the bound (20), combined with the triangle inequality, implies that |||∆|||_1 ≤ |||∆′|||_1 + |||∆′′|||_1 ≤ 4|||∆′|||_1 + 4 ∑_{j=r+1}^{dr} σ_j(Γ∗) ≤ 8√r |||∆|||_F + 4 ∑_{j=r+1}^{dr} σ_j(Γ∗), (21) where the second inequality uses the fact that rank(∆′) ≤ 2r. [sent-393, score-1.041]

65 In this case, by the definition (17), we have / |||∆|||2 ≤ c0 L F dr dc ∆ ∞ ≤ 2c0 Lα∗ |||∆|||1 since ∆ ∞ ≤ Γ∗ ∞+ Γ ∞ ≤ 2α∗ √ . [sent-398, score-0.719]

66 dr dc |||∆|||2 ≤ 2c0 L α∗ F |||∆|||1 d log d n d log d , n Now applying the bound (21), we obtain dr d log d √ 8 r|||∆|||F + 4 ∑ σ j (Γ∗ ) . [sent-399, score-1.457]

67 On one hand, if 128Lα′ (∆) √ sp n > 1/2, then we have √ 256L dr dc ∆ √ |||∆|||F ≤ n 1679 ∞ ≤ 512 L α∗ √ . [sent-404, score-0.922]

68 n (23) N EGAHBAN AND WAINWRIGHT On the other hand, if 128Lα′ (∆) √ sp n ≤ 1/2, then from the bound (18), we have Xn ′ (∆) √ n 2 ≥ |||∆|||F 16 (24) with high probability. [sent-405, score-0.303]

69 512 n i=1 From this point onwards, the proof is identical (apart from constants) to Theorem 1 in Negahban and Wainwright (2011), and we obtain that there is a numerical constant c1 such that |||∆|||2 ≤ c1 α∗ λn F √ r|||∆|||F + dr ∑ σ j (Γ∗ ) . [sent-408, score-0.482]

70 Since α∗ ≥ 1, these claims can be summarized in the form |||∆|||2 ≤ c1 max λn , F d log d n √ r|||∆|||F + dr ∑ σ j (Γ∗ ) . [sent-412, score-0.516]

71 3 Proof of Corollary 1. When Θ∗ (and hence √R Θ∗ √C) has rank r < dr, then we have ∑_{j=r+1}^{dr} σ_j(√R Θ∗ √C) = 0. [sent-416, score-0.549]

72 Defining the random matrix Y (i) := ξi R−1/2 X (i) C−1/2 , we first note that ξi is sub-exponential with parameter 1, √ and |R−1/2 X (i)C−1/2 | has a single entry with magnitude at most L dr dc , which implies that Y (i) ψ1 ≤ Lν dr dc ≤ 2ν Ld. [sent-420, score-1.56]

73 Moreover, we have dr dc e eT e eT R j(i) Ck(i) k(i) j(i) j(i) k(i) dr dc e eT = ν2 E R j(i) Ck(i) k(i) k(i) E[(Y (i) )T Y (i) ] = ν2 E = ν2 dr Idc ×dc . [sent-422, score-1.893]
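
The identity computed here, E[(Y^(i))^T Y^(i)] = ν² dr I_{dc×dc}, can be verified with a quick Monte Carlo experiment. The sketch below is an illustration only (Gaussian ξ for simplicity, made-up helper name); it accumulates the single non-zero contribution of each draw.

```python
import numpy as np

def check_second_moment(R, C, nu=1.0, trials=200_000, seed=0):
    """Estimate the diagonal of E[(Y^(i))^T Y^(i)] and compare it to nu^2 * dr."""
    rng = np.random.default_rng(seed)
    dr, dc = len(R), len(C)
    acc = np.zeros(dc)
    for _ in range(trials):
        j = rng.choice(dr, p=R / dr)                 # row index j(i) ~ R/dr
        k = rng.choice(dc, p=C / dc)                 # column index k(i) ~ C/dc
        xi = rng.standard_normal()                   # unit-variance noise (Gaussian here)
        # Y^(i) = nu * xi * R^{-1/2} X^(i) C^{-1/2} has a single non-zero entry at (j, k)
        val = nu * xi * np.sqrt(dr * dc / (R[j] * C[k]))
        acc[k] += val ** 2                           # (Y^T Y) adds val^2 to the (k, k) entry
    return np.max(np.abs(acc / trials - nu ** 2 * dr))   # should be close to zero
```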

74 so that |||E[(Y (i) )T Y (i) ]|||2 ≤ 2ν2 d, recalling that 2d = dr + dc ≥ dr . [sent-423, score-1.199]

75 The same bound applies to |||E[Y (i) (Y (i) )T ]|||2 , so that applying Lemma 7 with t = nδ, we conclude that P ||| Since √ 1 n nδ ∑ ξi R−1/2 X (i)C−1/2 |||2 ≥ δ ≤ (dr × dc ) max exp(−nδ2 /(16ν2 d), exp(− 4ν Ld ) . [sent-424, score-0.364]

76 n i=1 dr dc ≤ dr + dc = 2d, if we set δ2 = c2 ν2 d log d for a sufficiently large constant c1 , the re1 n sult follows. [sent-425, score-1.499]

77 This choice ensures that dr ∑ j=r+1 σ j (Γ∗ ) = τ dr σ j (Γ∗ ) σ j (Γ∗ ) ≤ τ ∑ ∑ τ τ j=r+1 j=r+1 dr q ≤ τ1−q ρq . [sent-431, score-1.365]

78 F 2ν2 Combined with the bound (26), we obtain the bound P[V = V ] = EXn P[V = V | Xn ] ≥ 1− ( M 2 n )−1 ∑ℓ=k 2ν2 |||Θk − Θℓ |||2 + log 2 F , log M (27) The remainder of the proof hinges on the following technical lemma, which we prove in Appendix A. [sent-456, score-0.349]

79 Note that packing set from Lemma 2 satisfies |||Θk − Θℓ |||F ≤ 2δ for all k = ℓ, and hence from Fano bound (27), we obtain 2 2 nδ2 + log 2 ν P[V = V ] ≥ 1 − rd 128 − log 4 2 2 nδ2 + log 2 ν rd 256 2 512 nδ2 + 256 log 2 ν ≥ 1− ≥ 1− If we now choose δ2 = ν2 rd 2048 n , rd . [sent-476, score-0.794]

80 then P[V = V ] ≥ 1 − rd 4 + 256 log 2 1 ≥ , rd 2 where the final inequality again uses the bound rd ≥ 1024 log 2. [sent-477, score-0.568]

81 In the special case q = 0, the proof is complete, since the elements Θℓ all have rank r = R0 , and √ satisfy the bound αsp (Θℓ ) ≤ 32 log d. [sent-478, score-0.282]

82 Finally, we note that rd ≥ min ρq n d n q 1− 2 , d2 , n so that we conclude that the minimax error is lower bounded by 1 min ρq 4096 ν2 d n q 1− 2 , ν2 d 2 n for dr sufficiently large. [sent-481, score-0.56]

83 Proof of Theorem 1 We now turn to the proof that the sampling operator in weighted matrix completion satisfies restricted strong convexity over the set C, as stated in Theorem 1. [sent-484, score-0.524]

84 In order to lighten notation, we prove the theorem in the case dr = dc . [sent-485, score-0.746]

85 In terms of rates, this is a worst-case assumption, effectively amounting to replacing both dr and dc by the worst-case max{dr , dc }. [sent-486, score-0.983]

86 However, since our rates are driven by d = 1 (dr + dc ) and we have the inequalities 2 1 1 max{dr , dc } ≤ (dr + dc ) ≤ max{dr , dc }, 2 2 this change has only an effect on the constant factors. [sent-487, score-1.056]

87 The proof can be extended to the general setting dr = dc by appropriate modifications if these constant factors are of interest. [sent-488, score-0.746]

88 We now show that in order to establish a tail bound on E (Xn ′ ), it suffices to bound the probability of some simpler events E (Xn ′ ; D), defined below. [sent-494, score-0.292]

89 Combining these two lemmas with the upper bound (33) with δ = D/8, we obtain D D 48L D + +√ + 8 8 n 2 3D 48L ≤ +√ 4 n Zn (D) ≤ 2 nD with probability at least 1 − 4 exp − 8192 , thereby establishing the tail bound (32) and completing the proof of Theorem 1. [sent-534, score-0.391]

90 Discussion In this paper, we have established error bounds for the problem of weighted matrix completion based on partial and noisy observations. [sent-536, score-0.441]

91 (2009) so as to obtain bounds for matrix completion with general losses. [sent-548, score-0.335]

92 F rd 1687 N EGAHBAN AND WAINWRIGHT Since there are less than (M ′ )2 pairs of matrices in total, setting t = 1 yields P |||Θℓ − Θk |||2 rd 7 F ≥ 1 ≥ 1 − 2 exp − + 2 log M ′ ≥ , ℓ,k=1,. [sent-588, score-0.419]

93 Known results on sub-Gaussian matrices (Vershynin, 2012) yield δ 2δ √ √ P √ |||U|||2 ≤ √ r+ d rd rd 3 1 √ √ ≥ 1 − 2 exp − ( r + d)2 ≥ 4 4 for d ≥ 10. [sent-610, score-0.358]

94 Since any violating matrix must fall into some set Sℓ , the union bound implies that P[E (Xn ′ )] ≤ ∞ ∑ P[E (Xn′ ; αℓ µ)] ℓ=1 ∞ ≤ c1 ∑ exp − c2 nα2ℓ µ2 ℓ=1 ∞ ≤ c1 ≤ c1 ∑ exp ℓ=1 − 2c2 log(α) ℓ nµ2 exp(−c′ nµ2 ) 2 . [sent-633, score-0.3]

95 6 in Pisier, 1989), we have log N(δ) ≤ ≤ 3 E δ G, ∆ sup |||∆|||1 ≤ρ(D) 3ρ(D) E |||G|||2 , δ where the second inequality follows from the duality between the nuclear and operator norms. [sent-667, score-0.336]

96 From known results on the operator norms of Gaussian random matrices (Davidson and Szarek, 2001), √ we have the upper bound E[|||G|||2 ] ≤ 2 d, so that log N(δ) ≤ 6ρ(D) √ d, δ thereby establishing the bound (39). [sent-668, score-0.461]
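
The Davidson–Szarek bound E[|||G|||_2] ≤ 2√d used here is easy to probe by simulation; the snippet below (illustrative sizes only) estimates the expected operator norm of a d × d standard Gaussian matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 200, 50
norms = [np.linalg.norm(rng.standard_normal((d, d)), 2) for _ in range(reps)]
print(np.mean(norms), 2 * np.sqrt(d))   # the empirical mean sits just below 2*sqrt(d)
```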

97 Proof of Lemma 6 We prove this lemma by applying a form of Ahlswede-Winter matrix bound (2002), as stated in Appendix F, to the matrix Y (i) := εi X (i) . [sent-711, score-0.327]

98 Ahlswede-Winter Matrix Bound Here we state a Bernstein version of the Ahlswede-Winter (2002) tail bound for the operator norm of a sum of random matrices. [sent-721, score-0.304]

99 Let Y (i) be independent dr × dc zero-mean random matrices such that |||Y (i) |||2 ≤ M, and define 2 := max |||E[(Y (i) )T Y (i) ]||| , σi |||E[Y (i) (Y (i) )T ]|||2 } as well as σ2 := ∑n σ2 . [sent-723, score-0.815]

100 Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. [sent-879, score-0.318]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dr', 0.455), ('dc', 0.264), ('xn', 0.241), ('spikiness', 0.213), ('sp', 0.203), ('frobenius', 0.196), ('egahban', 0.189), ('completion', 0.185), ('eighted', 0.162), ('estricted', 0.162), ('ompletion', 0.162), ('trong', 0.145), ('onvexity', 0.145), ('wainwright', 0.14), ('nuclear', 0.134), ('negahban', 0.126), ('atrix', 0.119), ('rsc', 0.118), ('mse', 0.107), ('rd', 0.105), ('bound', 0.1), ('matrix', 0.096), ('matrices', 0.096), ('rank', 0.094), ('rdr', 0.091), ('norm', 0.088), ('keshavan', 0.082), ('ra', 0.08), ('singular', 0.075), ('incoherence', 0.074), ('recovery', 0.067), ('ledoux', 0.067), ('cand', 0.066), ('rescaled', 0.065), ('tail', 0.062), ('log', 0.061), ('srebro', 0.059), ('weighted', 0.059), ('sup', 0.056), ('operator', 0.054), ('bounds', 0.054), ('exp', 0.052), ('universal', 0.051), ('recht', 0.049), ('cp', 0.049), ('exn', 0.047), ('kmo', 0.047), ('rohde', 0.047), ('noisy', 0.047), ('corollaries', 0.045), ('sampling', 0.043), ('corollary', 0.042), ('collaborative', 0.04), ('ratio', 0.039), ('talagrand', 0.037), ('versus', 0.035), ('plan', 0.035), ('lemma', 0.035), ('noiseless', 0.035), ('noise', 0.034), ('sdp', 0.033), ('convexity', 0.033), ('observation', 0.033), ('unitary', 0.031), ('salakhutdinov', 0.031), ('inequality', 0.031), ('sahand', 0.03), ('event', 0.03), ('establish', 0.03), ('ces', 0.03), ('fano', 0.03), ('packing', 0.03), ('ld', 0.03), ('overly', 0.029), ('ck', 0.028), ('panel', 0.028), ('near', 0.028), ('concentration', 0.028), ('restricted', 0.027), ('vershynin', 0.027), ('past', 0.027), ('restrictive', 0.027), ('theorem', 0.027), ('proof', 0.027), ('entry', 0.026), ('plots', 0.025), ('logarithmic', 0.025), ('movies', 0.025), ('ltering', 0.025), ('upper', 0.025), ('recalling', 0.025), ('establishing', 0.025), ('banach', 0.024), ('deza', 0.024), ('entrywise', 0.024), ('mazumber', 0.024), ('spiky', 0.024), ('szarek', 0.024), ('uu', 0.023), ('ix', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

Author: Sahand Negahban, Martin J. Wainwright

Abstract: We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the “spikiness” and “low-rankness” of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying with ℓq -“balls” of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal. Keywords: matrix completion, collaborative filtering, convex optimization

2 0.14539506 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

Author: Garvesh Raskutti, Martin J. Wainwright, Bin Yu

Abstract: Sparse additive models are families of d-variate functions with the additive decomposition f ∗ = ∑ j∈S f j∗ , where S is an unknown subset of cardinality s ≪ d. In this paper, we consider the case where each univariate component function f j∗ lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function f ∗ based on kernels combined with ℓ1 -type convex regularization. Working within a high-dimensional framework that allows both the dimension d and sparsity s to increase with n, we derive convergence rates in the L2 (P) and L2 (Pn ) norms over the class Fd,s,H of sparse additive models with each univariate function f j∗ in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the L2 (P) error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the d-variate function class is assumed to be globally bounded, then much √ faster estimation rates are possible for any sparsity s = Ω( n), showing that global boundedness is a significant restriction in the high-dimensional setting. Keywords: sparsity, kernel, non-parametric, convex, minimax

3 0.13441356 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning

Author: Aviv Tamar, Dotan Di Castro, Ron Meir

Abstract: In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model free algorithm with a partial model. The resulting hybrid algorithm switches between a model based and a model free mode, depending on the current state and the agent’s knowledge. Our method relies on a novel definition for a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem. Keywords: reinforcement learning, temporal difference, stochastic approximation, markov decision processes, hybrid model based model free algorithms

4 0.12966828 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization

Author: Karthik Mohan, Maryam Fazel

Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLSp shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property

5 0.091402829 97 jmlr-2012-Regularization Techniques for Learning with Matrices

Author: Sham M. Kakade, Shai Shalev-Shwartz, Ambuj Tewari

Abstract: There is growing body of learning problems for which it is natural to organize the parameters into a matrix. As a result, it becomes easy to impose sophisticated prior knowledge by appropriately regularizing the parameters under some matrix norm. This work describes and analyzes a systematic method for constructing such matrix-based regularization techniques. In particular, we focus on how the underlying statistical properties of a given problem can help us decide which regularization function is appropriate. Our methodology is based on a known duality phenomenon: a function is strongly convex with respect to some norm if and only if its conjugate function is strongly smooth with respect to the dual norm. This result has already been found to be a key component in deriving and analyzing several learning algorithms. We demonstrate the potential of this framework by deriving novel generalization and regret bounds for multi-task learning, multi-class learning, and multiple kernel learning. Keywords: regularization, strong convexity, regret bounds, generalization bounds, multi-task learning, multi-class learning, multiple kernel learning

6 0.085422039 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage

7 0.071486622 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

8 0.0707874 59 jmlr-2012-Linear Regression With Random Projections

9 0.06951049 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

10 0.06873031 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

11 0.065244749 93 jmlr-2012-Quantum Set Intersection and its Application to Associative Memory

12 0.061633173 111 jmlr-2012-Structured Sparsity and Generalization

13 0.059416831 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices

14 0.052671857 68 jmlr-2012-Minimax Manifold Estimation

15 0.052102201 80 jmlr-2012-On Ranking and Generalization Bounds

16 0.051663607 82 jmlr-2012-On the Necessity of Irrelevant Variables

17 0.051490612 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

18 0.050102394 81 jmlr-2012-On the Convergence Rate oflp-Norm Multiple Kernel Learning

19 0.049715575 103 jmlr-2012-Sampling Methods for the Nyström Method

20 0.049229171 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.242), (1, 0.094), (2, -0.093), (3, -0.019), (4, -0.044), (5, -0.082), (6, -0.088), (7, -0.163), (8, 0.005), (9, -0.203), (10, -0.165), (11, 0.068), (12, -0.043), (13, 0.103), (14, 0.059), (15, -0.092), (16, 0.204), (17, -0.177), (18, -0.012), (19, -0.106), (20, -0.214), (21, -0.184), (22, 0.099), (23, -0.041), (24, -0.13), (25, 0.022), (26, -0.196), (27, -0.01), (28, -0.062), (29, -0.017), (30, -0.01), (31, -0.073), (32, -0.043), (33, 0.101), (34, -0.166), (35, 0.042), (36, -0.006), (37, -0.016), (38, 0.029), (39, -0.089), (40, -0.04), (41, 0.037), (42, 0.047), (43, 0.117), (44, 0.007), (45, 0.085), (46, -0.029), (47, -0.082), (48, 0.04), (49, 0.073)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94549757 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

Author: Sahand Negahban, Martin J. Wainwright

Abstract: We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the “spikiness” and “low-rankness” of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying with ℓq -“balls” of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal. Keywords: matrix completion, collaborative filtering, convex optimization

2 0.69354689 52 jmlr-2012-Iterative Reweighted Algorithms for Matrix Rank Minimization

Author: Karthik Mohan, Maryam Fazel

Abstract: The problem of minimizing the rank of a matrix subject to affine constraints has applications in several areas including machine learning, and is known to be NP-hard. A tractable relaxation for this problem is nuclear norm (or trace norm) minimization, which is guaranteed to find the minimum rank matrix under suitable assumptions. In this paper, we propose a family of Iterative Reweighted Least Squares algorithms IRLS-p (with 0 ≤ p ≤ 1), as a computationally efficient way to improve over the performance of nuclear norm minimization. The algorithms can be viewed as (locally) minimizing certain smooth approximations to the rank function. When p = 1, we give theoretical guarantees similar to those for nuclear norm minimization, that is, recovery of low-rank matrices under certain assumptions on the operator defining the constraints. For p < 1, IRLSp shows better empirical performance in terms of recovering low-rank matrices than nuclear norm minimization. We provide an efficient implementation for IRLS-p, and also present a related family of algorithms, sIRLS-p. These algorithms exhibit competitive run times and improved recovery when compared to existing algorithms for random instances of the matrix completion problem, as well as on the MovieLens movie recommendation data set. Keywords: matrix rank minimization, matrix completion, iterative algorithms, null-space property

3 0.57112986 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning

Author: Aviv Tamar, Dotan Di Castro, Ron Meir

Abstract: In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model free algorithm with a partial model. The resulting hybrid algorithm switches between a model based and a model free mode, depending on the current state and the agent’s knowledge. Our method relies on a novel definition for a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem. Keywords: reinforcement learning, temporal difference, stochastic approximation, markov decision processes, hybrid model based model free algorithms

4 0.55713576 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

Author: Garvesh Raskutti, Martin J. Wainwright, Bin Yu

Abstract: Sparse additive models are families of d-variate functions with the additive decomposition f ∗ = ∑ j∈S f j∗ , where S is an unknown subset of cardinality s ≪ d. In this paper, we consider the case where each univariate component function f j∗ lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function f ∗ based on kernels combined with ℓ1 -type convex regularization. Working within a high-dimensional framework that allows both the dimension d and sparsity s to increase with n, we derive convergence rates in the L2 (P) and L2 (Pn ) norms over the class Fd,s,H of sparse additive models with each univariate function f j∗ in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the L2 (P) error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the d-variate function class is assumed to be globally bounded, then much √ faster estimation rates are possible for any sparsity s = Ω( n), showing that global boundedness is a significant restriction in the high-dimensional setting. Keywords: sparsity, kernel, non-parametric, convex, minimax

5 0.45190787 43 jmlr-2012-Fast Approximation of Matrix Coherence and Statistical Leverage

Author: Petros Drineas, Malik Magdon-Ismail, Michael W. Mahoney, David P. Woodruff

Abstract: The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nystr¨ m-based low-rank o matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd 2 ) time required by the na¨ve algorithm that involves computing an orthogonal basis for the ı range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments. Keywords: matrix coherence, statistical leverage, randomized algorithm

6 0.44472146 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

7 0.43697816 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

8 0.43246001 93 jmlr-2012-Quantum Set Intersection and its Application to Associative Memory

9 0.39198145 59 jmlr-2012-Linear Regression With Random Projections

10 0.35854301 83 jmlr-2012-Online Learning in the Embedded Manifold of Low-rank Matrices

11 0.35663781 97 jmlr-2012-Regularization Techniques for Learning with Matrices

12 0.32930979 2 jmlr-2012-A Comparison of the Lasso and Marginal Regression

13 0.31535313 111 jmlr-2012-Structured Sparsity and Generalization

14 0.3100571 81 jmlr-2012-On the Convergence Rate oflp-Norm Multiple Kernel Learning

15 0.30008745 46 jmlr-2012-Finite-Sample Analysis of Least-Squares Policy Iteration

16 0.29278517 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions

17 0.28147173 68 jmlr-2012-Minimax Manifold Estimation

18 0.2814554 103 jmlr-2012-Sampling Methods for the Nyström Method

19 0.2779426 108 jmlr-2012-Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing

20 0.27734008 39 jmlr-2012-Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.021), (21, 0.036), (26, 0.04), (29, 0.023), (35, 0.011), (69, 0.013), (75, 0.032), (79, 0.012), (92, 0.616), (96, 0.096)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99224091 59 jmlr-2012-Linear Regression With Random Projections

Author: Odalric-Ambrym Maillard, Rémi Munos

Abstract: We investigate a method for regression that makes use of a randomly generated subspace GP ⊂ F (of finite dimension P) of a given large (possibly infinite) dimensional function space F , for example, L2 ([0, 1]d ; R). GP is defined as the span of P random features that are linear combinations of a basis functions of F weighted by random Gaussian i.i.d. coefficients. We show practical motivation for the use of this approach, detail the link that this random projections method share with RKHS and Gaussian objects theory and prove, both in deterministic and random design, approximation error bounds when searching for the best regression function in GP rather than in F , and derive excess risk bounds for a specific regression algorithm (least squares regression in GP ). This paper stresses the motivation to study such methods, thus the analysis developed is kept simple for explanations purpose and leaves room for future developments. Keywords: regression, random matrices, dimension reduction

same-paper 2 0.99131262 99 jmlr-2012-Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise

Author: Sahand Negahban, Martin J. Wainwright

Abstract: We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the “spikiness” and “low-rankness” of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying with ℓq -“balls” of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal. Keywords: matrix completion, collaborative filtering, convex optimization

3 0.98217118 51 jmlr-2012-Integrating a Partial Model into Model Free Reinforcement Learning

Author: Aviv Tamar, Dotan Di Castro, Ron Meir

Abstract: In reinforcement learning an agent uses online feedback from the environment in order to adaptively select an effective policy. Model free approaches address this task by directly mapping environmental states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel procedure which augments a model free algorithm with a partial model. The resulting hybrid algorithm switches between a model based and a model free mode, depending on the current state and the agent’s knowledge. Our method relies on a novel definition for a partially known model, and an estimator that incorporates such knowledge in order to reduce uncertainty in stochastic approximation iterations. We prove that such an approach leads to improved policy evaluation whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach on policy gradient and Q-learning algorithms, and its usefulness in solving a call admission control problem. Keywords: reinforcement learning, temporal difference, stochastic approximation, markov decision processes, hybrid model based model free algorithms

4 0.9771111 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

Author: Alain Hauser, Peter Bühlmann

Abstract: The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study. Keywords: causal inference, interventions, graphical model, Markov equivalence, greedy equivalence search

5 0.91197526 46 jmlr-2012-Finite-Sample Analysis of Least-Squares Policy Iteration

Author: Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos

Abstract: In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm. We first consider the problem of policy evaluation in reinforcement learning, that is, learning the value function of a fixed policy, using the least-squares temporal-difference (LSTD) learning method, and report finite-sample analysis for this algorithm. To do so, we first derive a bound on the performance of the LSTD solution evaluated at the states generated by the Markov chain and used by the algorithm to learn an estimate of the value function. This result is general in the sense that no assumption is made on the existence of a stationary distribution for the Markov chain. We then derive generalization bounds in the case when the Markov chain possesses a stationary distribution and is β-mixing. Finally, we analyze how the error at each policy evaluation step is propagated through the iterations of a policy iteration method, and derive a performance bound for the LSPI algorithm. Keywords: Markov decision processes, reinforcement learning, least-squares temporal-difference, least-squares policy iteration, generalization bounds, finite-sample analysis

6 0.89010698 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

7 0.86641264 34 jmlr-2012-Dynamic Policy Programming

8 0.82239354 68 jmlr-2012-Minimax Manifold Estimation

9 0.8175503 111 jmlr-2012-Structured Sparsity and Generalization

10 0.79564983 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions

11 0.79080337 82 jmlr-2012-On the Necessity of Irrelevant Variables

12 0.7726987 73 jmlr-2012-Multi-task Regression using Minimal Penalties

13 0.76874471 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions

14 0.76299155 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO

15 0.75638062 96 jmlr-2012-Refinement of Operator-valued Reproducing Kernels

16 0.75302529 13 jmlr-2012-Active Learning via Perfect Selective Classification

17 0.75235909 80 jmlr-2012-On Ranking and Generalization Bounds

18 0.75090539 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

19 0.74362808 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

20 0.74174535 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning