nips nips2011 nips2011-73 knowledge-graph by maker-knowledge-mining

73 nips-2011-Divide-and-Conquer Matrix Factorization


Source: pdf

Author: Lester W. Mackey, Michael I. Jordan, Ameet Talwalkar

Abstract: This work introduces Divide-Factor-Combine (DFC), a parallel divide-and-conquer framework for noisy matrix factorization. DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the subproblem solutions using techniques from randomized matrix approximation. Our experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Jordan (Department of Electrical Engineering and Computer Science, UC Berkeley; Department of Statistics, UC Berkeley). Abstract: This work introduces Divide-Factor-Combine (DFC), a parallel divide-and-conquer framework for noisy matrix factorization. [sent-2, score-0.245]

2 DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the subproblem solutions using techniques from randomized matrix approximation. [sent-3, score-0.788]

3 Our experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. [sent-4, score-0.123]

4 Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm. [sent-5, score-0.307]

5 1 Introduction The goal in matrix factorization is to recover a low-rank matrix from irrelevant noise and corruption. [sent-6, score-0.27]

6 We focus on two instances of the problem: noisy matrix completion, i.e., [sent-7, score-0.171]

7 recovering a low-rank matrix from a small subset of noisy entries, and noisy robust matrix factorization [2, 3, 4], i.e., [sent-9, score-0.463]

8 recovering a low-rank matrix from corruption by noise and outliers of arbitrary magnitude. [sent-11, score-0.141]

9 These two classes of matrix factorization problems have attracted significant interest in the research community. [sent-13, score-0.186]

10 In particular, convex formulations of noisy matrix factorization have been shown to admit strong theoretical recovery guarantees [1, 2, 3, 20], and a variety of algorithms (e.g., [sent-14, score-0.448]

11 [15, 16, 23]) have been developed for solving both matrix completion and robust matrix factorization via convex relaxation. [sent-16, score-0.343]

12 To improve scalability and leverage the growing availability of parallel computing architectures, we propose a divide-and-conquer framework for large-scale matrix factorization. [sent-18, score-0.182]

13 The inherent parallelism of DFC allows for near-linear to superlinear speedups in practice, while our theory provides high-probability recovery guarantees for DFC comparable to those enjoyed by its base algorithm. [sent-20, score-0.352]

14 In Section 2, we define the setting of noisy matrix factorization and introduce the components of the DFC framework. [sent-22, score-0.273]

15 To illustrate the significant speed-up and robustness of DFC and to highlight the effectiveness of DFC ensembling, we present experimental results on collaborative filtering, video background modeling, and simulated data in Section 3. [sent-23, score-0.123]

16 There, we establish high-probability noisy recovery guarantees for DFC that rest upon a novel analysis of randomized matrix approximation and a new recovery result for noisy matrix completion. [sent-25, score-0.681]

17 We define M+ = VM ΣM^{-1} UM^T as the Moore–Penrose pseudoinverse of M and PM = MM+ as the orthogonal projection onto the column space of M. [sent-28, score-0.097]

18 We let ||·||_2, ||·||_F, and ||·||_* respectively denote the spectral, Frobenius, and nuclear norms of a matrix and let ||·|| represent the ℓ2 norm of a vector. [sent-29, score-0.084]

19 2 The Divide-Factor-Combine Framework In this section, we present our divide-and-conquer framework for scalable noisy matrix factorization. [sent-30, score-0.171]

20 1 Noisy Matrix Factorization (MF) In the setting of noisy matrix factorization, we observe a subset of the entries of a matrix M = L0 + S0 + Z0 ∈ R^{m×n}, where L0 has rank r ≪ m, n, S0 represents a sparse matrix of outliers of arbitrary magnitude, and Z0 is a dense noise matrix. [sent-33, score-0.477]

21 We let Ω represent the locations of the observed entries and PΩ be the orthogonal projection onto the space of m × n matrices with support Ω, so that (PΩ (M))ij = Mij , if (i, j) ∈ Ω and (PΩ (M))ij = 0 otherwise. [sent-34, score-0.138]

22 Our goal is to recover the low-rank matrix L0 from PΩ(M) with error proportional to the noise level ∆ = ||Z0||_F. [sent-35, score-0.084]
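
For concreteness, the following minimal numpy sketch instantiates the observation model just described; the dimensions, noise level, and sampling rate are illustrative assumptions rather than values from the paper.

import numpy as np

# Sketch of the noisy MF observation model M = L0 + S0 + Z0 and the mask P_Omega.
rng = np.random.default_rng(0)
m, n, r = 200, 300, 10
L0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r target
S0 = np.zeros((m, n))                      # sparse outliers (identically zero for plain MC)
Z0 = 0.01 * rng.standard_normal((m, n))    # dense noise
M = L0 + S0 + Z0

Omega = rng.random((m, n)) < 0.2           # observed support
P_Omega_M = np.where(Omega, M, 0.0)        # (P_Omega(M))_ij = M_ij if (i, j) in Omega else 0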

23 We will focus on two specific instances of this general problem: • Noisy Matrix Completion (MC): s |Ω| entries of M are revealed uniformly without replacement, along with their locations. [sent-36, score-0.111]

24 • Noisy Robust Matrix Factorization (RMF): S0 is identically zero save for s outlier entries of arbitrary magnitude with unknown locations distributed uniformly without replacement. [sent-38, score-0.087]

25 Each algorithm has three simple steps: (D step) Divide input matrix into submatrices: DFC-PROJ randomly partitions PΩ(M) into t l-column submatrices, {PΩ(C1), . . . , PΩ(Ct)}. [sent-42, score-0.084]

26 (F step) Factor each submatrix in parallel using any base MF algorithm: DFC-PROJ performs t parallel submatrix factorizations, while DFC-NYS performs two such parallel factorizations. [sent-46, score-0.548]

27 Standard base MF algorithms output the low-rank approximations {Ĉ1, . . . , Ĉt} for DFC-PROJ and Ĉ, R̂ for DFC-NYS. [sent-47, score-0.132]

28 (C step) Combine submatrix estimates: DFC-PROJ generates a final low-rank estimate L̂proj by [sent-52, score-0.097]

29 projecting [Ĉ1, . . . , Ĉt] onto the column space of Ĉ1, while DFC-NYS forms the low-rank estimate L̂nys from Ĉ and R̂ via the generalized Nyström method. [sent-55, score-0.087]

30 These matrix approximation techniques are described in more detail in Section 2. [sent-56, score-0.084]

31 3 Randomized Matrix Approximations Our divide-and-conquer algorithms rely on two methods that generate randomized low-rank approximations to an arbitrary matrix M from submatrices of M. [sent-59, score-0.183]

32 Algorithm 1 DFC-PROJ. Input: PΩ(M), t. {PΩ(Ci)}1≤i≤t = SAMPCOL(PΩ(M), t); do in parallel: Ĉi = BASE-MF-ALG(PΩ(Ci)) for i = 1, . . . , t. [sent-62, score-0.074]
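
A hedged numpy sketch of the control flow that Algorithm 1 describes: the helper names (base_mf, dfc_proj) are illustrative, base_mf is a stand-in truncated SVD rather than the noisy MC/RMF solvers the paper plugs in, and the parallel factorization loop is written sequentially.

import numpy as np

def base_mf(X, k):
    # Stand-in base factorization: rank-k truncated SVD (the paper uses noisy MC/RMF solvers).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def dfc_proj(M_obs, t, k):
    # D step: random partition of the columns into t blocks.
    n = M_obs.shape[1]
    blocks = np.array_split(np.random.permutation(n), t)
    # F step: factor each column submatrix (run in parallel in the actual framework).
    C_hat = [base_mf(M_obs[:, idx], k) for idx in blocks]
    # C step: project every block estimate onto the column space of the first block's estimate.
    P = C_hat[0] @ np.linalg.pinv(C_hat[0])
    L_hat = np.empty_like(M_obs, dtype=float)
    for idx, C in zip(blocks, C_hat):
        L_hat[:, idx] = P @ C
    return L_hat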

33 We begin by sampling l < n columns uniformly without replacement and let C be the m × l matrix of sampled columns. [sent-71, score-0.18]

34 Then, column projection uses C to generate a “matrix projection” approximation [13] of M as follows: Lproj = CC+ M = UC UC^T M. [sent-72, score-0.076]
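
A short numpy check of this identity, under the assumption that C is an arbitrary column sample of a dense matrix; the sizes are illustrative.

import numpy as np

# Column projection: L_proj = C C^+ M = U_C U_C^T M.
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 80)) + 0.01 * rng.standard_normal((50, 80))
C = M[:, rng.choice(80, size=20, replace=False)]      # sample l = 20 columns

L_proj_pinv = C @ np.linalg.pinv(C) @ M
U_C = np.linalg.svd(C, full_matrices=False)[0]
L_proj_svd = U_C @ U_C.T @ M
assert np.allclose(L_proj_pinv, L_proj_svd)            # the two expressions coincide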

35 In particular, after sampling columns to obtain C, imagine that we independently sample d < m rows uniformly without replacement. [sent-77, score-0.115]

36 Let R be the d × n matrix of sampled rows and W be the d × l matrix formed from the intersection of the sampled rows and columns. [sent-78, score-0.31]
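
The generalized Nyström estimate is then built from these three pieces; the sketch below assumes the combination takes the standard form L_nys = C W^+ R, with illustrative sizes.

import numpy as np

# Generalized Nystrom approximation from sampled columns C, rows R, and their intersection W.
rng = np.random.default_rng(2)
m, n = 60, 90
M = rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))   # exactly rank-5 example

col_idx = rng.choice(n, size=15, replace=False)
row_idx = rng.choice(m, size=12, replace=False)
C = M[:, col_idx]                    # m x l column sample
R = M[row_idx, :]                    # d x n row sample
W = M[np.ix_(row_idx, col_idx)]      # d x l intersection

L_nys = C @ np.linalg.pinv(W) @ R
print(np.linalg.norm(M - L_nys) / np.linalg.norm(M))   # ~0: exact when rank(W) = rank(M)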

37 The cost of combining the submatrix estimates is even smaller, since the outputs of standard MF algorithms are returned in factored form. [sent-84, score-0.119]

38 Indeed, the column projection step of DFC-PROJ requires only O(mk² + lk²) time for k = maxi kĈi: O(mk² + lk²) time for the pseudoinversion of Ĉ1 and O(mk² + lk²) time for the remaining matrix multiplications with each Ĉi in parallel. [sent-85, score-0.184]

39 Hence, DFC divides the expensive task of matrix factorization into smaller subproblems that can be executed in parallel and efficiently combines the low-rank, factored results. [sent-87, score-0.348]

40 5 Ensemble Methods Ensemble methods have been shown to improve performance of matrix approximation algorithms, while straightforwardly leveraging the parallelism of modern many-core and distributed architectures [14]. [sent-89, score-0.128]

41 As such, we propose ensemble variants of the DFC algorithms that demonstrably reduce recovery error while introducing a negligible cost to the parallel running time. [sent-90, score-0.233]

42 DFC-PROJ-ENS projects [Ĉ1, . . . , Ĉt] onto the column space of each Ĉi in parallel and then averages the t resulting low-rank approximations. [sent-94, score-0.139]

43 For DFC-NYS-ENS, we choose a random d-row submatrix PΩ(R) as in DFC-NYS and independently partition the columns of PΩ(M) into {PΩ(C1), . . . , PΩ(Ct)}. [sent-95, score-0.163]

44 After running the base MF algorithm on each submatrix, we apply the generalized Nyström method to each (Ĉi, R̂) pair in parallel and average the t resulting low-rank approximations. [sent-99, score-0.206]
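
A sketch of the DFC-PROJ-ENS combination, reusing the block estimates C_hat and the column partition blocks from the dfc_proj sketch above; the per-block projections would run in parallel in the actual framework, and the function name is illustrative.

import numpy as np

def dfc_proj_ens(C_hat, blocks, shape):
    # Project the block estimates onto the column space of each C_hat[i], then average.
    L_avg = np.zeros(shape)
    for Ci in C_hat:
        P = Ci @ np.linalg.pinv(Ci)
        L_i = np.empty(shape)
        for idx, Cj in zip(blocks, C_hat):
            L_i[:, idx] = P @ Cj
        L_avg += L_i
    return L_avg / len(C_hat)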

45 We use state-of-the-art matrix factorization algorithms in our experiments: the Accelerated Proximal Gradient (APG) algorithm of [23] as our base noisy MC algorithm and the APG algorithm of [15] as our base noisy RMF algorithm. [sent-102, score-0.624]

46 In all experiments, we use the default parameter settings suggested by [23] and [15], measure recovery error via root mean square error (RMSE), and report parallel running times for DFC. [sent-103, score-0.21]

47 We moreover compare against two baseline methods: APG used on the full matrix M and PARTITION, which performs matrix factorization on t submatrices just like DFC-PROJ but omits the final column projection step. [sent-104, score-0.417]

48 We created L0 ∈ R^{m×m} as a random product, AB^T, where A and B are m × r matrices with independent N(0, 1/r) entries such that each entry of L0 has unit variance. [sent-109, score-0.085]

49 In the MC setting, s entries of L0 + Z0 were revealed uniformly at random. [sent-112, score-0.111]

50 In the RMF setting, the support of S0 was generated uniformly at random, and the s corrupted entries took values in [0, 1] with uniform probability. [sent-113, score-0.108]
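
A sketch of this simulated-data setup (numpy; the noise level and variance convention are illustrative assumptions), together with the RMSE used to measure recovery error:

import numpy as np

rng = np.random.default_rng(3)
m, r = 1000, 10
A = rng.normal(scale=np.sqrt(1.0 / r), size=(m, r))
B = rng.normal(scale=np.sqrt(1.0 / r), size=(m, r))
L0 = A @ B.T                                   # random low-rank product
Z0 = 0.1 * rng.standard_normal((m, m))         # dense noise (illustrative level)
s = int(0.04 * m * m)

# Noisy MC: reveal s entries of L0 + Z0 uniformly at random.
Omega = np.zeros(m * m, dtype=bool)
Omega[rng.choice(m * m, size=s, replace=False)] = True
Omega = Omega.reshape(m, m)
M_mc = np.where(Omega, L0 + Z0, 0.0)

# Noisy RMF: corrupt s uniformly located entries with Uniform[0, 1] outliers.
S0 = np.zeros(m * m)
S0[rng.choice(m * m, size=s, replace=False)] = rng.random(s)
M_rmf = L0 + S0.reshape(m, m) + Z0

def rmse(L_hat, L_ref):
    return np.sqrt(np.mean((L_hat - L_ref) ** 2))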

51 Figure 1: Recovery error of DFC relative to base algorithms (x-axes: % of revealed entries for MC and % of outliers for RMF). [sent-125, score-0.272]

52 We first explored the recovery error of DFC as a function of s, using (m = 10K, r = 10) with varying observation sparsity for MC and (m = 1K, r = 10) with a varying percentage of outliers for RMF. [sent-126, score-0.193]

53 In both MC and RMF, the gaps in recovery between APG and DFC are small when sampling only 10% of rows and columns. [sent-128, score-0.181]

54 We next explored the speed-up of DFC as a function of matrix size. [sent-130, score-0.084]

55 For MC, we revealed 4% of the matrix entries and set r = 0. [sent-131, score-0.167]

56 We sampled 10% of rows and columns and observed that recovery errors were comparable to the errors presented in Figure 1 for similar settings of s; in particular, at all values of n for both MC and RMF, the errors of APG and DFC-PROJ-ENS were nearly identical. [sent-134, score-0.249]

57 Our timing results, presented in Figure 2, illustrate a near-linear speed-up for MC and a superlinear speed-up for RMF across varying matrix sizes. [sent-135, score-0.108]

58 Note that the timing curves of the DFC algorithms and PARTITION all overlap, a fact that highlights the minimal computational cost of the final matrix approximation step. [sent-136, score-0.084]

59 Figure 2: Speed-up of DFC relative to base algorithms (x-axis: matrix dimension m). [sent-142, score-0.132]

60 2 Collaborative Filtering Collaborative filtering for recommender systems is one prevalent real-world application of noisy matrix completion. [sent-144, score-0.213]

61 A collaborative filtering dataset can be interpreted as the incomplete observation of a ratings matrix with columns corresponding to users and rows corresponding to items. [sent-145, score-0.225]

62 To generate test sets drawn from the training distribution, for each dataset, we aggregated all available rating data into a single training set and withheld test entries uniformly at random, while ensuring that at least one training observation remained in each row and column. [sent-148, score-0.087]

63 Since these matrices have many more columns than rows, MF on column submatrices is inherently easier than MF on row submatrices, and for DFC-NYS, we observe that Ĉ is an accurate estimate while R̂ is not. [sent-154, score-0.183]

64 This problem can be framed as an application of noisy RMF, where each video frame is a column of some matrix (M), the background model is low-rank (L0), and moving objects and other foreground variations are captured by the outlier and noise terms (S0, Z0). [sent-202, score-0.284]

65 We evaluate DFC on two videos: ‘Hall’ (200 frames of size 176 × 144) contains significant foreground variation and was studied by [2], while ‘Lobby’ (1546 frames of size 168×120) includes many changes in illumination (a smaller video with 250 frames was studied by [2]). [sent-209, score-0.096]

66 Figure 3: Sample ‘Hall’ recovery by APG, DFC-PROJ-ENS-5%, and DFC-PROJ-ENS at a second sampling rate. [sent-229, score-0.136]

67 4 Theoretical Analysis Having investigated the empirical advantages of DFC, we now show that DFC admits high-probability recovery guarantees comparable to those of its base algorithm. [sent-231, score-0.333]

68 1 Matrix Coherence Since not all matrices can be recovered from missing entries or gross outliers, recent theoretical advances have studied sufficient conditions for accurate noisy MC [3, 12, 20] and RMF [1, 25]. [sent-233, score-0.216]

69 Most prevalent among these are matrix coherence conditions, which limit the extent to which the singular vectors of a matrix are correlated with the standard basis. [sent-234, score-0.254]

70 Letting ei be the ith column of the standard basis, we define two standard notions of coherence [22]: Definition 1 (µ0 -Coherence). [sent-235, score-0.091]

71 For any µ > 0, we will call a matrix L (µ, r)-coherent if rank(L) = r, max(µ0(UL), µ0(VL)) ≤ µ, and µ1(L) ≤ √µ. [sent-241, score-0.084]
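
For reference, the standard coherence notions used in the matrix completion literature are stated below; the exact normalization in [22] and in the paper may differ slightly, so this is a reference sketch rather than the paper's own statement. For a rank-r matrix L ∈ R^{m×n} with compact SVD L = U_L Σ_L V_L^T:

\mu_0(U_L) = \frac{m}{r} \max_{1 \le i \le m} \| P_{U_L} e_i \|^2, \qquad \mu_0(V_L) = \frac{n}{r} \max_{1 \le j \le n} \| P_{V_L} e_j \|^2, \qquad \mu_1(L) = \sqrt{\frac{mn}{r}} \, \| U_L V_L^\top \|_\infty,

where P_{U_L} and P_{V_L} are the orthogonal projections onto the column spaces of U_L and V_L and \| \cdot \|_\infty denotes the largest entry magnitude.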

72 Our analysis will focus on base MC and RMF algorithms that express their recovery guarantees in terms of the (µ, r)-coherence of the target low-rank matrix L0 . [sent-242, score-0.391]

73 For such algorithms, lower values of µ correspond to better recovery properties. [sent-243, score-0.136]

74 2 DFC Master Theorem We now show that the same coherence conditions that allow for accurate MC and RMF also imply high-probability recovery for DFC. [sent-245, score-0.183]

75 We further fix any ε, δ ∈ (0, 1] and define A(X) as the event that a matrix X is (rµ²/(1 − ε/2), r)-coherent. [sent-247, score-0.084]

76 3 provides a generic recovery bound for DFC when used in combination with an arbitrary base algorithm. [sent-249, score-0.268]

77 The proof requires a novel, coherence-based analysis of column projection and random column sampling. [sent-250, score-0.12]

78 Choose t = n/l and l ≥ crµ log(n) log(2/δ)/ε², where c is a fixed positive constant, and fix any ce ≥ 0. [sent-253, score-0.076]

79 Under the notation of Algorithm 1, if a base MF algorithm yields P(||C0,i − Ĉi||_F > ce√(ml)∆ | A(C0,i)) ≤ δC for each i, where C0,i is the corresponding partition of L0, then, with probability at least (1 − δ)(1 − tδC), DFC-PROJ guarantees ||L0 − L̂proj||_F ≤ (2 + ε) ce√(mn)∆. [sent-254, score-0.271]

80 Under Algorithm 2, if a base MF algorithm yields P(||C0 − Ĉ||_F > ce√(ml)∆ | A(C)) ≤ δC and P(||R0 − R̂||_F > ce√(dn)∆ | A(R)) ≤ δR for d ≥ clµ0(C) log(m) log(1/δ)/ε², then, with probability at least (1 − δ)²(1 − δC − δR), DFC-NYS guarantees ||L0 − L̂nys||_F ≤ (2 + 3ε) ce√(ml + dn)∆. [sent-255, score-0.353]

81 3, consider a typical base algorithm which, when applied to PΩ(M), recovers an estimate L̂ satisfying ||L0 − L̂||_F ≤ ce√(mn)∆ with high probability. [sent-257, score-0.208]

82 3 asserts that, with appropriately reduced probability, DFC-PROJ exhibits the same recovery error scaled by an adjustable factor of 2 + ε, while DFC-NYS exhibits a somewhat smaller error scaled by 2 + 3ε. [sent-259, score-0.136]

83 Thus, DFC can quickly provide near-optimal recovery in the noisy setting and exact recovery in the noiseless setting (∆ = 0), even when entries are missing or grossly corrupted. [sent-261, score-0.418]

84 3 can be applied to derive specific DFC recovery guarantees for noisy MC and noisy RMF. [sent-263, score-0.349]

85 3 shows that DFC retains the high-probability recovery guarantees of a standard MC solver while operating on matrices of much smaller dimension. [sent-267, score-0.201]

86 Suppose that a base MC algorithm solves the following convex optimization problem, studied in [3]: minimize_L ||L||_* subject to ||PΩ(M − L)||_F ≤ ∆. [sent-268, score-0.154]
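
For intuition, here is a hedged numpy sketch of one simple approach to a penalized surrogate of this program (proximal gradient with singular value soft-thresholding), rather than the constrained form above or the APG solver of [23]; the helper names and default parameters are illustrative.

import numpy as np

def svt(X, tau):
    # Singular value soft-thresholding: the prox operator of tau * ||.||_*.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def noisy_mc(M_obs, Omega, lam=1.0, n_iter=200, step=1.0):
    # Minimizes 0.5 * ||P_Omega(M - L)||_F^2 + lam * ||L||_* by proximal gradient descent.
    L = np.zeros_like(M_obs, dtype=float)
    for _ in range(n_iter):
        grad = np.where(Omega, L - M_obs, 0.0)    # gradient of the observed-entry data fit
        L = svt(L - step * grad, step * lam)      # proximal step on the nuclear norm
    return L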

87 4 follows from a novel guarantee for noisy convex MC, proved in the appendix. [sent-270, score-0.087]

88 4 allows for the fraction of columns and rows sampled to decrease as the oversampling parameter βs increases with m and n. [sent-280, score-0.142]

89 4 requires only O( m log2 (m + n)) sampled columns and O( m log2 (m + n)) sampled rows. [sent-282, score-0.094]

90 4 requires the number of sampled columns and rows to grow linearly with the matrix dimensions. [sent-284, score-0.197]

91 In this setting, only O(√(m + n)) columns and rows are required by Cor. [sent-286, score-0.087]

92 4 Consequences for Noisy RMF Our next corollary shows that DFC retains the high-probability recovery guarantees of a standard RMF solver while operating on matrices of much smaller dimension. [sent-289, score-0.221]

93 Suppose that a base RMF algorithm solves the following convex optimization problem, studied in [25]: minimize_{L,S} ||L||_* + λ||S||_1 subject to ||M − L − S||_F ≤ ∆, with λ = 1/√n. [sent-290, score-0.154]
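
Analogously, a hedged sketch of a simple alternating scheme for a penalized surrogate of this robust program (not the APG solver of [15]); it reuses the svt helper from the MC sketch above, the function name noisy_rmf is illustrative, and λ = 1/√n as stated.

import numpy as np

def shrink(X, tau):
    # Elementwise soft-thresholding: the prox operator of tau * ||.||_1.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def noisy_rmf(M, mu=1.0, n_iter=100):
    # Alternately minimizes 0.5 * ||M - L - S||_F^2 + mu * (||L||_* + lam * ||S||_1).
    lam = 1.0 / np.sqrt(M.shape[1])
    L = np.zeros_like(M, dtype=float)
    S = np.zeros_like(M, dtype=float)
    for _ in range(n_iter):
        L = svt(M - S, mu)             # exact update of the low-rank block
        S = shrink(M - L, mu * lam)    # exact update of the sparse block
    return L, S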

94 5 places only very mild restrictions on the number of columns and rows to be sampled. [sent-303, score-0.087]

95 Indeed, l and d need only grow poly-logarithmically in the matrix dimensions to achieve high-probability noisy recovery. [sent-304, score-0.197]

96 5 Conclusions To improve the scalability of existing matrix factorization algorithms while leveraging the ubiquity of parallel computing architectures, we introduced, evaluated, and analyzed DFC, a divide-andconquer framework for noisy matrix factorization with missing entries or outliers. [sent-305, score-0.616]

97 Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. [sent-313, score-0.084]

98 Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. [sent-412, score-0.157]

99 Fixed point and bregman iterative methods for matrix rank minimization. [sent-418, score-0.106]

100 Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. [sent-443, score-0.084]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dfc', 0.575), ('roj', 0.442), ('rmf', 0.309), ('ys', 0.201), ('apg', 0.156), ('mc', 0.154), ('recovery', 0.136), ('proj', 0.133), ('base', 0.132), ('nystr', 0.118), ('nys', 0.104), ('lnys', 0.103), ('lproj', 0.103), ('factorization', 0.102), ('ns', 0.101), ('ens', 0.1), ('submatrix', 0.097), ('mf', 0.092), ('noisy', 0.087), ('matrix', 0.084), ('ce', 0.076), ('parallel', 0.074), ('submatrices', 0.071), ('entries', 0.059), ('rmse', 0.058), ('outliers', 0.057), ('collaborative', 0.054), ('completion', 0.054), ('lg', 0.052), ('ct', 0.05), ('coherence', 0.047), ('mn', 0.047), ('rows', 0.045), ('uc', 0.044), ('column', 0.044), ('columns', 0.042), ('guarantees', 0.039), ('subproblems', 0.038), ('lk', 0.036), ('background', 0.036), ('ol', 0.036), ('video', 0.033), ('ltering', 0.033), ('projection', 0.032), ('wright', 0.03), ('vm', 0.03), ('ml', 0.03), ('amp', 0.029), ('cvw', 0.029), ('dfcp', 0.029), ('lobby', 0.029), ('oversampling', 0.029), ('um', 0.029), ('randomized', 0.028), ('divides', 0.028), ('dn', 0.028), ('uniformly', 0.028), ('cr', 0.027), ('mohri', 0.026), ('matrices', 0.026), ('highprobability', 0.026), ('uw', 0.026), ('sampled', 0.026), ('ci', 0.025), ('cl', 0.024), ('subproblem', 0.024), ('revealed', 0.024), ('scalability', 0.024), ('ul', 0.024), ('frieze', 0.024), ('movielens', 0.024), ('superlinear', 0.024), ('partition', 0.024), ('log', 0.023), ('recovered', 0.023), ('ensemble', 0.023), ('architectures', 0.023), ('recommender', 0.022), ('surveillance', 0.022), ('rm', 0.022), ('factored', 0.022), ('solves', 0.022), ('rank', 0.022), ('parallelism', 0.021), ('gross', 0.021), ('cand', 0.021), ('corrupted', 0.021), ('onto', 0.021), ('mk', 0.021), ('frames', 0.021), ('accelerated', 0.02), ('prevalent', 0.02), ('corollary', 0.02), ('hall', 0.02), ('vl', 0.019), ('robust', 0.019), ('singular', 0.019), ('ganesh', 0.019), ('portions', 0.019), ('mij', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 73 nips-2011-Divide-and-Conquer Matrix Factorization

Author: Lester W. Mackey, Michael I. Jordan, Ameet Talwalkar

Abstract: This work introduces Divide-Factor-Combine (DFC), a parallel divide-and-conquer framework for noisy matrix factorization. DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the subproblem solutions using techniques from randomized matrix approximation. Our experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.

2 0.079698108 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements

Author: Andrew E. Waters, Aswin C. Sankaranarayanan, Richard Baraniuk

Abstract: We consider the problem of recovering a matrix M that is the sum of a low-rank matrix L and a sparse matrix S from a small set of linear measurements of the form y = A(M) = A(L + S). This model subsumes three important classes of signal recovery problems: compressive sensing, affine rank minimization, and robust principal component analysis. We propose a natural optimization problem for signal recovery under this model and develop a new greedy algorithm called SpaRCS to solve it. Empirically, SpaRCS inherits a number of desirable properties from the state-of-the-art CoSaMP and ADMiRA algorithms, including exponential convergence and efficient implementation. Simulation results with video compressive sensing, hyperspectral imaging, and robust matrix completion data sets demonstrate both the accuracy and efficacy of the algorithm. 1

3 0.075220361 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements

Author: Yi-kai Liu

Abstract: We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients. We show that almost all sets of O(rd log6 d) Pauli measurements satisfy the rankr restricted isometry property (RIP). This implies that M can be recovered from a fixed (“universal”) set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley’s inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality. 1

4 0.073646329 239 nips-2011-Robust Lasso with missing and grossly corrupted observations

Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen

Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information of both β and e . Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis is relied on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrix X with i.i.d rows N (0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover exact signed supports of both β and e from only Ω(k log p log n) observations, even the fraction of corruption is arbitrarily close to one. Our analysis also shows that this amount of observations required to achieve exact signed support is optimal. 1

5 0.067308858 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

Author: Po-ling Loh, Martin J. Wainwright

Abstract: Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings. 1

6 0.060840968 265 nips-2011-Sparse recovery by thresholded non-negative least squares

7 0.059887841 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure

8 0.058819924 211 nips-2011-Penalty Decomposition Methods for Rank Minimization

9 0.058438495 259 nips-2011-Sparse Estimation with Structured Dictionaries

10 0.056146424 165 nips-2011-Matrix Completion for Multi-label Image Classification

11 0.055556282 260 nips-2011-Sparse Features for PCA-Like Linear Regression

12 0.054557785 230 nips-2011-RTRMC: A Riemannian trust-region method for low-rank matrix completion

13 0.054009438 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

14 0.050842606 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo

15 0.048404772 29 nips-2011-Algorithms and hardness results for parallel large margin learning

16 0.046934318 264 nips-2011-Sparse Recovery with Brownian Sensing

17 0.045327246 186 nips-2011-Noise Thresholds for Spectral Clustering

18 0.04315418 102 nips-2011-Generalised Coupled Tensor Factorisation

19 0.038449589 5 nips-2011-A Denoising View of Matrix Completion

20 0.038371086 251 nips-2011-Shaping Level Sets with Submodular Functions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.11), (1, 0.002), (2, -0.034), (3, -0.079), (4, -0.05), (5, 0.034), (6, -0.007), (7, 0.036), (8, 0.039), (9, 0.054), (10, 0.059), (11, -0.004), (12, -0.038), (13, 0.004), (14, 0.074), (15, -0.064), (16, 0.004), (17, -0.023), (18, -0.047), (19, 0.011), (20, -0.011), (21, 0.018), (22, 0.009), (23, -0.113), (24, -0.009), (25, 0.03), (26, -0.054), (27, 0.013), (28, 0.033), (29, 0.017), (30, -0.047), (31, 0.041), (32, 0.007), (33, -0.035), (34, 0.071), (35, -0.021), (36, -0.058), (37, 0.038), (38, 0.053), (39, -0.2), (40, -0.084), (41, -0.02), (42, -0.027), (43, -0.079), (44, 0.023), (45, -0.01), (46, -0.046), (47, 0.021), (48, -0.005), (49, -0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91332936 73 nips-2011-Divide-and-Conquer Matrix Factorization

Author: Lester W. Mackey, Michael I. Jordan, Ameet Talwalkar

Abstract: This work introduces Divide-Factor-Combine (DFC), a parallel divide-andconquer framework for noisy matrix factorization. DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the subproblem solutions using techniques from randomized matrix approximation. Our experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.

2 0.77815968 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements

Author: Andrew E. Waters, Aswin C. Sankaranarayanan, Richard Baraniuk

Abstract: We consider the problem of recovering a matrix M that is the sum of a low-rank matrix L and a sparse matrix S from a small set of linear measurements of the form y = A(M) = A(L + S). This model subsumes three important classes of signal recovery problems: compressive sensing, affine rank minimization, and robust principal component analysis. We propose a natural optimization problem for signal recovery under this model and develop a new greedy algorithm called SpaRCS to solve it. Empirically, SpaRCS inherits a number of desirable properties from the state-of-the-art CoSaMP and ADMiRA algorithms, including exponential convergence and efficient implementation. Simulation results with video compressive sensing, hyperspectral imaging, and robust matrix completion data sets demonstrate both the accuracy and efficacy of the algorithm. 1

3 0.67964745 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure

Author: Fatma K. Karzan, Arkadi S. Nemirovski, Boris T. Polyak, Anatoli Juditsky

Abstract: We discuss new methods for the recovery of signals with block-sparse structure, based on 1 -minimization. Our emphasis is on the efficiently computable error bounds for the recovery routines. We optimize these bounds with respect to the method parameters to construct the estimators with improved statistical properties. We justify the proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. 1

4 0.67128325 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements

Author: Yi-kai Liu

Abstract: We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients. We show that almost all sets of O(rd log6 d) Pauli measurements satisfy the rankr restricted isometry property (RIP). This implies that M can be recovered from a fixed (“universal”) set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley’s inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality. 1

5 0.66248506 264 nips-2011-Sparse Recovery with Brownian Sensing

Author: Alexandra Carpentier, Odalric-ambrym Maillard, Rémi Munos

Abstract: We consider the problem of recovering the parameter α ∈ RK of a sparse function f (i.e. the number of non-zero entries of α is small compared to the number K of features) given noisy evaluations of f at a set of well-chosen sampling points. We introduce an additional randomization process, called Brownian sensing, based on the computation of stochastic integrals, which produces a Gaussian sensing matrix, for which good recovery properties are proven, independently on the number of sampling points N , even when the features are arbitrarily non-orthogonal. Under the assumption that f is H¨ lder continuous with exponent at least √ we proo 1/2, vide an estimate α of the parameter such that �α − α�2 = O(�η�2 / N ), where � � η is the observation noise. The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. We report numerical experiments illustrating our method. 1

6 0.62212265 209 nips-2011-Orthogonal Matching Pursuit with Replacement

7 0.54907274 230 nips-2011-RTRMC: A Riemannian trust-region method for low-rank matrix completion

8 0.52711052 260 nips-2011-Sparse Features for PCA-Like Linear Regression

9 0.5270474 161 nips-2011-Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation

10 0.5097394 5 nips-2011-A Denoising View of Matrix Completion

11 0.49606797 265 nips-2011-Sparse recovery by thresholded non-negative least squares

12 0.49093258 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices

13 0.46039402 211 nips-2011-Penalty Decomposition Methods for Rank Minimization

14 0.43914735 239 nips-2011-Robust Lasso with missing and grossly corrupted observations

15 0.4347679 228 nips-2011-Quasi-Newton Methods for Markov Chain Monte Carlo

16 0.42772397 164 nips-2011-Manifold Precis: An Annealing Technique for Diverse Sampling of Manifolds

17 0.42085123 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

18 0.4199869 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

19 0.4141143 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation

20 0.40816516 259 nips-2011-Sparse Estimation with Structured Dictionaries


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.058), (4, 0.031), (11, 0.29), (20, 0.044), (26, 0.04), (31, 0.055), (33, 0.022), (43, 0.07), (45, 0.095), (48, 0.014), (56, 0.011), (57, 0.041), (74, 0.039), (83, 0.025), (99, 0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75347459 73 nips-2011-Divide-and-Conquer Matrix Factorization

Author: Lester W. Mackey, Michael I. Jordan, Ameet Talwalkar

Abstract: This work introduces Divide-Factor-Combine (DFC), a parallel divide-andconquer framework for noisy matrix factorization. DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the subproblem solutions using techniques from randomized matrix approximation. Our experiments with collaborative filtering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.

2 0.72106385 18 nips-2011-Action-Gap Phenomenon in Reinforcement Learning

Author: Amir-massoud Farahmand

Abstract: Many practitioners of reinforcement learning problems have observed that oftentimes the performance of the agent reaches very close to the optimal performance even though the estimated (action-)value function is still far from the optimal one. The goal of this paper is to explain and formalize this phenomenon by introducing the concept of the action-gap regularity. As a typical result, we prove that for an ˆ agent following the greedy policy π with respect to an action-value function Q, the ˆ ˆ ˆ performance loss E V ∗ (X) − V π (X) is upper bounded by O( Q − Q∗ 1+ζ ), ∞ in which ζ ≥ 0 is the parameter quantifying the action-gap regularity. For ζ > 0, our results indicate smaller performance loss compared to what previous analyses had suggested. Finally, we show how this regularity affects the performance of the family of approximate value iteration algorithms. 1

3 0.68377995 36 nips-2011-Analysis and Improvement of Policy Gradient Estimation

Author: Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama

Abstract: Policy gradient is a useful model-free reinforcement learning approach, but it tends to suffer from instability of gradient estimates. In this paper, we analyze and improve the stability of policy gradient methods. We first prove that the variance of gradient estimates in the PGPE (policy gradients with parameter-based exploration) method is smaller than that of the classical REINFORCE method under a mild assumption. We then derive the optimal baseline for PGPE, which contributes to further reducing the variance. We also theoretically show that PGPE with the optimal baseline is more preferable than REINFORCE with the optimal baseline in terms of the variance of gradient estimates. Finally, we demonstrate the usefulness of the improved PGPE method through experiments. 1

4 0.66027939 66 nips-2011-Crowdclustering

Author: Ryan G. Gomes, Peter Welinder, Andreas Krause, Pietro Perona

Abstract: Is it possible to crowdsource categorization? Amongst the challenges: (a) each worker has only a partial view of the data, (b) different workers may have different clustering criteria and may produce different numbers of categories, (c) the underlying category structure may be hierarchical. We propose a Bayesian model of how workers may approach clustering and show how one may infer clusters / categories, as well as worker parameters, using this model. Our experiments, carried out on large collections of images, suggest that Bayesian crowdclustering works well and may be superior to single-expert annotations. 1

5 0.6381709 150 nips-2011-Learning a Distance Metric from a Network

Author: Blake Shaw, Bert Huang, Tony Jebara

Abstract: Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges. 1

6 0.50106752 198 nips-2011-On U-processes and clustering performance

7 0.49948964 186 nips-2011-Noise Thresholds for Spectral Clustering

8 0.49911058 22 nips-2011-Active Ranking using Pairwise Comparisons

9 0.49777463 291 nips-2011-Transfer from Multiple MDPs

10 0.49747443 29 nips-2011-Algorithms and hardness results for parallel large margin learning

11 0.49654961 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

12 0.49591458 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

13 0.49514821 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

14 0.49263632 265 nips-2011-Sparse recovery by thresholded non-negative least squares

15 0.49017265 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning

16 0.48755208 80 nips-2011-Efficient Online Learning via Randomized Rounding

17 0.48615077 106 nips-2011-Generalizing from Several Related Classification Tasks to a New Unlabeled Sample

18 0.48604569 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

19 0.48582178 199 nips-2011-On fast approximate submodular minimization

20 0.48546118 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound