nips nips2013 nips2013-285 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for removing small Gaussian noise present in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenging problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for removing small Gaussian noise present in the data. [sent-2, score-0.211]
2 However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. [sent-3, score-0.15]
3 In this paper, we tackle the challenging problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. [sent-4, score-0.868]
4 Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. [sent-5, score-0.523]
5 Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. [sent-6, score-0.765]
6 The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. [sent-7, score-0.308]
7 We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. [sent-8, score-0.273]
8 Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. [sent-9, score-0.699]
9 It has been used to discover important latent information about observed data matrices for visualization, feature recovery, embedding and data cleaning. [sent-11, score-0.106]
10 The fundamental assumption underlying dimensionality reduction is that the intrinsic structure of high-dimensional observation data lies on a low-dimensional linear subspace. [sent-12, score-0.221]
11 It seeks the best low-rank approximation of the given data matrix under a well-understood least-squares reconstruction loss, and projects the data onto an uncorrelated low-dimensional subspace. [sent-14, score-0.334]
12 These properties make PCA a well-suited reduction method when the observed data is mildly corrupted with small Gaussian noise [12]. [sent-16, score-0.276]
13 Even a small fraction of large errors can cause severe degradation in PCA’s estimate of the low rank structure. [sent-18, score-0.303]
14 Real-life data, however, is often corrupted with large errors or even missing observations. [sent-19, score-0.27]
15 To tackle dimensionality reduction with arbitrarily large errors and outliers, a number of approaches that robustify PCA have been developed in the literature, including ℓ1 -norm regularized robust PCA [14], influence function techniques [5, 13], and alternating ℓ1 -norm minimization [8]. [sent-20, score-0.374]
16 Nevertheless, the capacity of these approaches to recover the low-rank structure of a corrupted data matrix can still be degraded as the fraction of large errors increases. [sent-21, score-0.428]
17 In this paper, we propose a novel robust transfer principal component analysis method to recover the low rank representation of heavily corrupted data by leveraging related uncorrupted auxiliary data. [sent-22, score-1.269]
18 Seeking knowledge transfer from a related auxiliary data source for the target learning problem has been widely studied in supervised learning. [sent-23, score-0.553]
19 It is also known that modeling related data sources together provides rich information for discovering their shared subspace representations [4]. [sent-24, score-0.139]
20 This robust transfer PCA framework combines aspects of both robust PCA and transfer learning methodologies. [sent-26, score-0.712]
21 We expect that the critical low-rank structure shared between the two data matrices can be effectively transferred from the uncorrupted auxiliary data to recover the low-dimensional subspace representation of the heavily corrupted target data in a robust manner. [sent-27, score-1.317]
22 We formulate this robust transfer PCA as a joint minimization problem over a convex combination of least squares losses with non-convex matrix rank constraints. [sent-28, score-0.826]
23 Though a simple relaxation of the matrix rank constraints into convex nuclear norm constraints can lead to a convex optimization problem, it is very difficult to control the rank of the low-rank representation matrix we aim to recover through the nuclear norm. [sent-29, score-1.181]
24 We thus develop a proximal projected gradient descent optimization algorithm to solve the proposed optimization problem with rank constraints, which permits a convenient closed-form solution for each proximal step based on singular value decomposition and converges to a stationary point. [sent-30, score-0.767]
25 Our experiments over image denoising tasks show the proposed method can effectively recover images corrupted with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. [sent-31, score-0.892]
26 Notations: In this paper, we use $I_n$ to denote an n × n identity matrix, use $O_{n,m}$ to denote an n × m matrix with all 0 values, use $\|\cdot\|_F$ to denote the matrix Frobenius norm, and use $\|\cdot\|_*$ to denote the nuclear norm (trace norm). [sent-32, score-0.409]
27 2 Preliminaries Assume we are given an observed data matrix $X \in \mathbb{R}^{n \times d}$ consisting of n observations of d-dimensional feature vectors, which was generated by corrupting some entries of a latent low-rank matrix $M \in \mathbb{R}^{n \times d}$ with an error matrix $E \in \mathbb{R}^{n \times d}$ such that X = M + E. [sent-33, score-0.392]
28 We aim to recover the low-rank matrix M by projecting the high-dimensional observations X onto a low-dimensional manifold representation matrix $Z \in \mathbb{R}^{n \times k}$ over the low-dimensional subspace $B \in \mathbb{R}^{k \times d}$, such that $M = ZB$, $BB^\top = I_k$ for k < d. [sent-34, score-0.537]
29 2.1 PCA Given the above setup, standard PCA assumes the error matrix E contains small i.i.d. [sent-36, score-0.122]
30 Gaussian noise, and seeks an optimal low-dimensional encoding matrix Z and basis matrix B to reconstruct X by X = ZB + E, i.e., by solving $\min_{Z,B} \|X - ZB\|_F^2$ (1). [sent-39, score-0.346]
31 That is, standard PCA seeks the best rank-k estimate of the latent low-rank matrix M = ZB by solving $\min_M \|X - M\|_F^2$ s.t. $\mathrm{rank}(M) \le k$ (2). [sent-43, score-0.183]
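To make the rank-k estimate in (1)-(2) concrete, here is a minimal numpy sketch (not from the paper) of the closed-form PCA solution via the truncated singular value decomposition; the function name and interface are our own.

```python
import numpy as np

def pca_rank_k(X, k):
    """Best rank-k approximation of X under the least-squares loss,
    obtained from the truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :k] * s[:k]   # low-dimensional encoding, shape (n, k)
    B = Vt[:k, :]          # orthonormal basis rows, B @ B.T == I_k
    M = Z @ B              # rank-k estimate of the latent matrix
    return M, Z, B
```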
32 With this convenient solution, standard PCA has been widely used for modern data analysis and serves as an efficient and effective dimensionality reduction procedure when the error E is small and i.i.d. Gaussian. [sent-47, score-0.137]
33 2.2 Robust PCA The validity of standard PCA, however, breaks down when the errors corrupting the observed data matrix are large. [sent-52, score-0.418]
34 Note that even a single grossly corrupted entry in the observation matrix X can cause the recovered matrix $M^*$ to be shifted away from the true low-rank matrix M. [sent-53, score-0.559]
35 To recover the intrinsic low-rank matrix M from the observation matrix X corrupted with sparse large errors E, a polynomial-time robust PCA method has been developed in [14], which induces the following optimization problem: $\min_{M,E}\ \mathrm{rank}(M) + \gamma \|E\|_0$ s.t. $M + E = X$ (4). [sent-54, score-0.837]
36 By relaxing the non-convex rank function and the ℓ0-norm into their convex envelopes, the nuclear norm and the ℓ1-norm respectively, a convex relaxation of robust PCA can be obtained: $\min_{M,E}\ \|M\|_* + \lambda \|E\|_1$ s.t. $M + E = X$ (5). [sent-57, score-0.685]
37 With an appropriate choice of the λ parameter, one can exactly recover the matrices M, E that generated the observations X by solving this convex program. [sent-60, score-0.196]
38 To produce a scalable optimization for robust PCA, a more convenient relaxed formulation has been considered in [14]: $\min_{M,E}\ \|M\|_* + \lambda \|E\|_1 + \frac{\alpha}{2}\|M + E - X\|_F^2$ (6), where the original equality constraint is replaced with a reconstruction loss penalty term. [sent-61, score-0.298]
39 This formulation seeks the lowest-rank M that can best reconstruct the observation matrix X subject to sparse errors E. [sent-62, score-0.456]
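As an illustration of how the relaxed formulation (6) could be optimized, the following is a small sketch based on alternating exact block updates: singular value thresholding for M and soft-thresholding for E. This is our own illustrative solver, with made-up function names and default parameters, and it need not coincide with the algorithm used in [14].

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise soft-thresholding: proximal operator of tau * l1-norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def relaxed_robust_pca(X, lam=0.1, alpha=1.0, n_iter=200):
    """Alternating minimization of
    ||M||_* + lam*||E||_1 + (alpha/2)*||M + E - X||_F^2 over M and E."""
    M = np.zeros(X.shape)
    E = np.zeros(X.shape)
    for _ in range(n_iter):
        M = svt(X - E, 1.0 / alpha)             # exact minimizer in M for fixed E
        E = soft_threshold(X - M, lam / alpha)  # exact minimizer in E for fixed M
    return M, E
```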
40 Although robust PCA can effectively recover the low-rank matrix when the large errors in the observed data are very sparse, its performance can be degraded when the observation data is heavily corrupted with dense large errors. [sent-63, score-0.593]
41 In this work, we propose to tackle this problem by exploiting information from related uncorrupted auxiliary data. [sent-64, score-0.254]
42 3 Robust Transfer PCA Exploring labeled information in a related auxiliary data set to assist the learning problem on a target data set has been widely studied in supervised learning scenarios within the context of transfer learning, domain adaptation and multi-task learning [10]. [sent-65, score-0.44]
43 Moreover, it has also been shown that modeling related data sources together can provide useful information for discovering their shared subspace representations in an unsupervised manner [4]. [sent-66, score-0.139]
44 The principle behind these knowledge transfer learning approaches is that related data sets can complement each other in identifying the intrinsic latent structure shared between them. [sent-67, score-0.324]
45 Following this transfer learning scheme, we present a robust transfer PCA method for recovering a low-rank matrix from a heavily corrupted observation matrix. [sent-68, score-0.946]
46 Assume we are given a target data matrix $X_t \in \mathbb{R}^{n_t \times d}$ corrupted with errors of large magnitude, and a related source data matrix $X_s \in \mathbb{R}^{n_s \times d}$. [sent-69, score-0.75]
47 Given constant matrices $A_s = [I_{n_s}, O_{n_s,n_t}]$ and $A_t = [O_{n_t,n_s}, I_{n_t}]$, we can re-express $N_s$ and $N_t$ in terms of the unified matrix $Z_c$ such that $N_s = A_s Z_c$ and $N_t = A_t Z_c$. [sent-72, score-0.176]
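For concreteness, here is a small numpy sketch (variable names and toy sizes are ours) of the constant selector matrices $A_s$ and $A_t$ and how they pick out the source and target blocks of the stacked matrix $Z_c$.

```python
import numpy as np

ns, nt, k = 4, 3, 2  # toy sizes (hypothetical)
As = np.hstack([np.eye(ns), np.zeros((ns, nt))])   # A_s = [I_ns, O_{ns,nt}]
At = np.hstack([np.zeros((nt, ns)), np.eye(nt)])   # A_t = [O_{nt,ns}, I_nt]

Zc = np.random.randn(ns + nt, k)   # unified (stacked) representation
Ns = As @ Zc                       # source block: the first ns rows of Zc
Nt = At @ Zc                       # target block: the last nt rows of Zc
assert np.allclose(Ns, Zc[:ns]) and np.allclose(Nt, Zc[ns:])
```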
48 The learning problem of robust transfer PCA can then be formulated as the following joint minimization problem: [sent-73, score-0.401]
49 $\min_{Z_c,Z_s,Z_t,B_c,B_s,B_t,E_s,E_t}\ \frac{\alpha_s}{2}\|A_s Z_c B_c + Z_s B_s + E_s - X_s\|_F^2 + \frac{\alpha_t}{2}\|A_t Z_c B_c + Z_t B_t + E_t - X_t\|_F^2 + \beta_s \|E_s\|_1 + \beta_t \|E_t\|_1$ s.t. $B_c B_c^\top = I_{k_c}$, $B_s B_s^\top = I_{k_s}$, $B_t B_t^\top = I_{k_t}$ (9), which minimizes the least-squares reconstruction losses on both data matrices with ℓ1-norm regularizers over the additive error matrices. [sent-75, score-0.869]
50 The problem is then reformulated over low-rank matrices $M_c$, $M_s$, $M_t$ with constraints $\mathrm{rank}(M_c) \le k_c$, $\mathrm{rank}(M_s) \le k_s$, $\mathrm{rank}(M_t) \le k_t$ (11), which has an ℓ1-norm regularized convex objective function but is subject to non-convex inequality rank constraints. [sent-80, score-0.959]
51 A standard convexification of the rank constraints is to replace rank functions with their convex envelopes, nuclear norms [3, 14, 1, 6, 15]. [sent-81, score-0.561]
52 However, though the nuclear norm is a convex envelope of the rank function, it is not always a high-quality approximation of the rank function [11]. [sent-83, score-0.61]
53 Moreover, it is very difficult to select the appropriate trade-off parameters λs , λt for the nuclear norm regularizers in (12) to recover the low-rank matrix solutions in the original optimization in (11). [sent-84, score-0.447]
54 In principal component analysis problems it is much more convenient to have explicit control on the rank of the low-rank solution matrices. [sent-85, score-0.38]
55 Therefore instead of solving the nuclear norm based convex optimization problem (12), we develop a scalable and efficient proximal gradient algorithm to solve the rank constraint based minimization problem (11) directly, which is shown to converge to a stationary point. [sent-86, score-0.661]
56 After solving the optimization problem (11), the low-rank approximation of the corrupted matrix $X_t$ can be obtained as $\hat{X}_t = A_t M_c + M_t$. [sent-87, score-0.373]
57 4 Proximal Projected Gradient Descent Algorithm Proximal gradient methods have been widely used for unconstrained convex optimization problems with continuous but non-smooth regularizers [2]. [sent-88, score-0.236]
58 In this work, we develop a proximal projected gradient algorithm to solve the non-convex optimization problem with matrix rank constraints in (11). [sent-89, score-0.583]
59 Here f(Θ) is a convex and continuously differentiable function, while g(Θ) is a convex but non-smooth function. [sent-96, score-0.126]
60 Let C = {Θ : rank(Mc ) ≤ kc , rank(Ms ) ≤ ks , rank(Mt ) ≤ kt }. [sent-98, score-0.655]
61 Our proximal projected gradient algorithm is an iterative procedure. [sent-100, score-0.212]
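As a rough illustration of one such iteration, the sketch below takes a gradient step on the smooth least-squares part of the objective, projects each low-rank block onto its rank constraint via a truncated SVD, and applies a soft-thresholding proximal step to the ℓ1-regularized error matrices. The step size, the simultaneous update order, and the assumption that the smooth part collects the two reconstruction losses of (9) with the products replaced by $M_c$, $M_s$, $M_t$ are our own simplifications and need not match the paper's algorithm or its convergence analysis.

```python
import numpy as np

def rank_project(A, k):
    """Closed-form projection onto {rank <= k} via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def soft_threshold(A, tau):
    """Entrywise proximal step for the l1-norm regularizer."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def rtpca_iteration(Xs, Xt, Mc, Ms, Mt, Es, Et, As, At, kc, ks, kt,
                    alpha_s=1.0, alpha_t=1.0, beta_s=0.1, beta_t=0.1, eta=0.1):
    """One illustrative proximal projected gradient iteration (our sketch)."""
    # Residuals of the two least-squares reconstruction losses.
    Rs = As @ Mc + Ms + Es - Xs
    Rt = At @ Mc + Mt + Et - Xt
    # Gradient step on the smooth part, then projection onto the rank constraints.
    Mc = rank_project(Mc - eta * (alpha_s * As.T @ Rs + alpha_t * At.T @ Rt), kc)
    Ms = rank_project(Ms - eta * alpha_s * Rs, ks)
    Mt = rank_project(Mt - eta * alpha_t * Rt, kt)
    # Proximal (soft-thresholding) step for the l1-regularized error matrices.
    Es = soft_threshold(Es - eta * alpha_s * Rs, eta * beta_s)
    Et = soft_threshold(Et - eta * alpha_t * Rt, eta * beta_t)
    return Mc, Ms, Mt, Es, Et
```

After iterating such steps to convergence, the denoised target would be read off as `At @ Mc + Mt`, matching the recovery formula $\hat{X}_t = A_t M_c + M_t$ quoted above.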
62 5 Experiments We evaluate the proposed approach using image denoising tasks constructed on the Yale Face Database, which contains 165 grayscale images of 15 individuals. [sent-114, score-0.256]
63 Our goal is to investigate the performance of the proposed approach on recovering data corrupted with large and dense errors. [sent-116, score-0.255]
64 Let $X_t^0$ denote a target image matrix from one subject, which has values between 0 and 255. [sent-118, score-0.277]
65 We randomly select a fraction of its pixels and corrupt them with large errors that push their values to 255, where the fraction of noisy pixels is controlled using a noise level parameter σ. [sent-119, score-0.176]
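A sketch of how such corruption could be generated is given below; the helper name and exact sampling scheme are our own and may differ from the procedure used in the paper.

```python
import numpy as np

def corrupt_image(X0, sigma, seed=None):
    """Set a random fraction sigma of the pixels to the maximum value 255."""
    rng = np.random.default_rng(seed)
    Xt = X0.astype(float)                 # work on a float copy of the clean image
    mask = rng.random(Xt.shape) < sigma   # noisy-pixel indicator, fraction ~ sigma
    Xt[mask] = 255.0
    return Xt
```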
66 We then use an uncorrupted image matrix $X_s^0$ from the same or a different subject as the source matrix to help the image denoising of $X_t$ by recovering its low-rank approximation matrix $\hat{X}_t$. [sent-121, score-0.922]
67 In the experiments, we compared the performance of the following methods on image denoising with large errors: • R-T-PCA: This is the proposed robust transfer PCA method. [sent-122, score-0.536]
68 • R-S-PCA: This is a robust shared PCA method that applies a rank-constrained version of the robust PCA in [14] on the concatenated matrix $[X_s^0; X_t]$ to recover a low-rank approximation matrix $\hat{X}_t$ with rank $k_c + k_t$. [sent-125, score-1.279]
69 • R-PCA: This is a robust PCA method that applies a rank-constrained version of the robust PCA in [14] on $X_t$ to recover a low-rank approximation matrix $\hat{X}_t$ with rank $k_c + k_t$. [sent-126, score-1.321]
70 • S-PCA: This is a shared PCA method that applies PCA on the concatenated matrix $[X_s^0; X_t]$ to recover a low-rank approximation matrix $\hat{X}_t$ with rank $k_c + k_t$. [sent-127, score-1.115]
71 • PCA: This method applies PCA on the noisy target matrix $X_t$ to recover a low-rank approximation matrix $\hat{X}_t$ with rank $k_c + k_t$. [sent-128, score-1.109]
72 • R-2Step-PCA: This method exploits the auxiliary source matrix by first performing robust PCA over the concatenated matrix $[X_s^0; X_t]$ to produce a shared matrix $M_c$ with rank $k_c$, and then performing robust PCA over the residue matrix $(X_t - A_t M_c)$ to produce a matrix $M_t$ with rank $k_t$. [sent-129, score-2.062]
73 All the methods are evaluated using the root mean square error (RMSE) between the true target image matrix $X_t^0$ and the low-rank approximation matrix $\hat{X}_t$ recovered from the noisy image matrix. [sent-131, score-0.518]
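For completeness, the RMSE criterion can be computed with a small helper of our own:

```python
import numpy as np

def rmse(X_true, X_hat):
    """Root mean square error between the clean target and its recovery."""
    diff = np.asarray(X_true, dtype=float) - np.asarray(X_hat, dtype=float)
    return np.sqrt(np.mean(diff ** 2))
```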
74 Unless specified otherwise, we used kc = 8, ks = 3, kt = 3 in all experiments. [sent-132, score-0.655]
75 5.1 Intra-Subject Experiments We first conducted experiments by constructing 15 transfer tasks for the 15 subjects. [sent-134, score-0.274]
76 Specifically, for each subject, we used the first image matrix as the target matrix and used each of the remaining 10 image matrices as the source matrix each time. [sent-135, score-0.738]
77 For each source matrix, we repeated the experiments 5 times by randomly generating a noisy target matrix using the procedure described above. [sent-136, score-0.336]
78 The average denoising results in terms of root mean square error (RMSE) with noise level σ = 5% are reported in Table 1. [sent-138, score-0.138]
79 We can see that the proposed method R-T-PCA outperforms all other methods. Table 1: The average denoising results in terms of RMSE at noise level σ = 5%. [sent-143, score-0.138]
80 The comparison between the two groups of methods, {R-T-PCA, R-S-PCA, S-PCA} and {R-PCA, PCA}, shows that a related source matrix is indeed useful for denoising the target matrix. [sent-236, score-0.419]
81 The superior performance of R-T-PCA over R-S-PCA and S-PCA demonstrates the efficacy of our transfer PCA framework in exploiting the auxiliary source matrix over methods that simply concatenate the auxiliary source matrix and target matrix. [sent-238, score-0.922]
82 5.2 Cross-Subject Experiments Next, we conducted transfer experiments using source and target matrices from different subjects. [sent-240, score-0.668]
83 We randomly constructed 5 transfer tasks, Task-6-1, Task-8-2, Task-9-4, Task-12-8 and Task-14-11, where the first number in the task name denotes the source subject index and the second number denotes the target subject index. [sent-241, score-0.444]
84 For example, to construct Task-6-1, we used the first image matrix from subject-6 as the source matrix and used the first image matrix from subject-1 as the target matrix. [sent-242, score-0.684]
85 We repeated each experiment 10 times using randomly generated noisy target matrices. [sent-244, score-0.118]
86 These results also suggest that even a remotely related source image can be useful. [sent-249, score-0.163]
87 All these experiments demonstrate the efficacy of the proposed method in exploiting an uncorrupted auxiliary data matrix for denoising target images corrupted with large errors. [sent-250, score-0.808]
88 Table 2: The average denoising results in terms of RMSE. [sent-251, score-0.113]
89 5.3 Parameter Analysis The optimization problem (11) for the proposed R-T-PCA method has a number of parameters to be set: $\alpha_s$, $\alpha_t$, $\beta_s$, $\beta_t$, $k_c$, $k_s$ and $k_t$. [sent-326, score-0.691]
90 Given that the source and target matrices are similar in size, in these experiments we set αs = αt = 1, βs = βt and ks = kt . [sent-328, score-0.594]
91 In the first experiment, we set $(k_c, k_s, k_t) = (8, 3, 3)$ and study the performance of R-T-PCA with different values of $\beta_s = \beta_t = \beta$. [sent-329, score-0.356]
92 We can see that R-T-PCA is quite robust to β across the range of values considered. [sent-337, score-0.154]
93 We then set β = 0.1 and compared R-T-PCA with other methods across a few different settings of $(k_c, k_s, k_t)$, with $(k_c, k_s, k_t) \in \{(6, 3, 3), (8, 3, 3), (8, 5, 5), (10, 3, 3), (10, 5, 5)\}$. [sent-343, score-0.712]
94 6 Conclusion In this paper, we developed a novel robust transfer principal component analysis method to recover the low-rank representation of corrupted data by leveraging related uncorrupted auxiliary data. [sent-346, score-1.028]
95 This robust transfer PCA framework combines aspects of both robust PCA and transfer learning methodologies. [sent-347, score-0.712]
96 Our experiments over image denoising tasks demonstrated that the proposed method can effectively exploit an auxiliary uncorrupted image to recover images corrupted with random large errors and significantly outperform a number of comparison methods. [sent-349, score-0.926]
97 Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming. [sent-388, score-0.149]
98 Fixed point and bregman iterative methods for matrix rank minimization. [sent-395, score-0.325]
99 Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. [sent-406, score-0.467]
100 Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization. [sent-424, score-0.49]
wordName wordTfidf (topN-words)
[('pca', 0.458), ('mc', 0.337), ('kc', 0.299), ('transfer', 0.202), ('corrupted', 0.193), ('kt', 0.189), ('rank', 0.18), ('ks', 0.167), ('mt', 0.161), ('robust', 0.154), ('ms', 0.146), ('xt', 0.133), ('uncorrupted', 0.126), ('zc', 0.124), ('matrix', 0.122), ('denoising', 0.113), ('proximal', 0.112), ('principal', 0.106), ('nuclear', 0.105), ('rmse', 0.103), ('auxiliary', 0.098), ('xs', 0.097), ('source', 0.096), ('target', 0.088), ('rns', 0.087), ('rnt', 0.087), ('recover', 0.079), ('errors', 0.077), ('image', 0.067), ('shared', 0.064), ('convex', 0.063), ('zb', 0.06), ('ns', 0.06), ('norm', 0.06), ('bc', 0.058), ('singular', 0.056), ('matrices', 0.054), ('projected', 0.051), ('convenient', 0.05), ('subspace', 0.049), ('gradient', 0.049), ('et', 0.046), ('regularizers', 0.045), ('nt', 0.045), ('component', 0.044), ('pes', 0.043), ('pmc', 0.043), ('pms', 0.043), ('pmt', 0.043), ('popularly', 0.043), ('svd', 0.043), ('bs', 0.042), ('images', 0.042), ('dimensional', 0.039), ('seeks', 0.039), ('yuhong', 0.038), ('envelopes', 0.038), ('pet', 0.038), ('subjecting', 0.038), ('conducted', 0.038), ('concatenated', 0.038), ('es', 0.037), ('heavily', 0.037), ('optimization', 0.036), ('reconstruction', 0.036), ('recovering', 0.036), ('tasks', 0.034), ('permits', 0.033), ('constraints', 0.033), ('reduction', 0.032), ('intrinsic', 0.032), ('noises', 0.03), ('noisy', 0.03), ('effectively', 0.03), ('recovery', 0.03), ('vk', 0.03), ('tackle', 0.03), ('dimensionality', 0.029), ('subject', 0.029), ('minimization', 0.029), ('losses', 0.029), ('degraded', 0.029), ('zs', 0.027), ('lipschitz', 0.027), ('stationary', 0.027), ('data', 0.026), ('bt', 0.026), ('noise', 0.025), ('descent', 0.025), ('low', 0.024), ('squares', 0.024), ('cacy', 0.023), ('joint', 0.023), ('regularized', 0.023), ('trace', 0.023), ('bb', 0.023), ('bregman', 0.023), ('fraction', 0.022), ('approximation', 0.022), ('min', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1
2 0.23166892 233 nips-2013-Online Robust PCA via Stochastic Optimization
Author: Jiashi Feng, Huan Xu, Shuicheng Yan
Abstract: Robust PCA methods are typically based on batch optimization and have to load all the samples into memory during optimization. This prevents them from efficiently processing big data. In this paper, we develop an Online Robust PCA (OR-PCA) that processes one sample per time instance and hence its memory cost is independent of the number of samples, significantly enhancing the computation and storage efficiency. The proposed OR-PCA is based on stochastic optimization of an equivalent reformulation of the batch RPCA. Indeed, we show that OR-PCA provides a sequence of subspace estimations converging to the optimum of its batch counterpart and hence is provably robust to sparse corruption. Moreover, OR-PCA can naturally be applied for tracking dynamic subspace. Comprehensive simulations on subspace recovering and tracking demonstrate the robustness and efficiency advantages of the OR-PCA over online PCA and batch RPCA methods. 1
3 0.21550415 232 nips-2013-Online PCA for Contaminated Data
Author: Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan
Abstract: We consider the online Principal Component Analysis (PCA) where contaminated samples (containing outliers) are revealed sequentially to the Principal Components (PCs) estimator. Due to their sensitiveness to outliers, previous online PCA algorithms fail in this case and their results can be arbitrarily skewed by the outliers. Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers. We show that the final result of the proposed online RPCA has an acceptable degradation from the optimum. Actually, under mild conditions, online RPCA achieves the maximal robustness with a 50% breakdown point. Moreover, online RPCA is shown to be efficient for both storage and computation, since it need not re-explore the previous samples as in traditional robust PCA algorithms. This endows online RPCA with scalability for large scale data. 1
4 0.20442697 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
5 0.1792589 188 nips-2013-Memory Limited, Streaming PCA
Author: Ioannis Mitliagkas, Constantine Caramanis, Prateek Jain
Abstract: We consider streaming, one-pass principal component analysis (PCA), in the highdimensional regime, with limited memory. Here, p-dimensional samples are presented sequentially, and the goal is to produce the k-dimensional subspace that best approximates these points. Standard algorithms require O(p2 ) memory; meanwhile no algorithm can do better than O(kp) memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where p-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can be recovered when the number of samples, n, scales proportionally with the dimension, p. Yet, all algorithms that provably achieve this, have memory complexity O(p2 ). Meanwhile, algorithms with memory-complexity O(kp) do not have provable bounds on sample complexity comparable to p. We present an algorithm that achieves both: it uses O(kp) memory (meaning storage of any kind) and is able to compute the k-dimensional spike with O(p log p) samplecomplexity – the first algorithm of its kind. While our theoretical analysis focuses on the spiked covariance model, our simulations show that our algorithm is successful on much more general models for the data. 1
6 0.1553628 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
7 0.14574605 314 nips-2013-Stochastic Optimization of PCA with Capped MSG
8 0.13359459 120 nips-2013-Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
9 0.12652503 224 nips-2013-On the Sample Complexity of Subspace Learning
10 0.11881668 91 nips-2013-Dirty Statistical Models
11 0.11192288 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
12 0.10773309 66 nips-2013-Computing the Stationary Distribution Locally
13 0.10381906 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
14 0.10298202 185 nips-2013-Matrix Completion From any Given Set of Observations
15 0.10205952 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
16 0.10000037 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
17 0.098955214 269 nips-2013-Regression-tree Tuning in a Streaming Setting
18 0.09851186 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
19 0.098197147 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors
20 0.097287036 27 nips-2013-Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising
topicId topicWeight
[(0, 0.235), (1, 0.092), (2, 0.133), (3, 0.099), (4, -0.05), (5, -0.038), (6, -0.094), (7, 0.169), (8, -0.137), (9, -0.15), (10, -0.167), (11, 0.08), (12, 0.083), (13, 0.183), (14, -0.037), (15, 0.019), (16, -0.008), (17, 0.052), (18, 0.083), (19, 0.024), (20, 0.084), (21, 0.004), (22, 0.068), (23, -0.058), (24, 0.078), (25, 0.025), (26, 0.091), (27, 0.032), (28, 0.027), (29, 0.073), (30, -0.014), (31, -0.013), (32, -0.078), (33, 0.012), (34, 0.027), (35, -0.164), (36, 0.017), (37, -0.004), (38, -0.16), (39, 0.036), (40, -0.052), (41, -0.035), (42, 0.05), (43, -0.047), (44, -0.028), (45, -0.018), (46, -0.012), (47, 0.039), (48, -0.016), (49, 0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.95692551 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1
2 0.80965638 233 nips-2013-Online Robust PCA via Stochastic Optimization
Author: Jiashi Feng, Huan Xu, Shuicheng Yan
Abstract: Robust PCA methods are typically based on batch optimization and have to load all the samples into memory during optimization. This prevents them from efficiently processing big data. In this paper, we develop an Online Robust PCA (OR-PCA) that processes one sample per time instance and hence its memory cost is independent of the number of samples, significantly enhancing the computation and storage efficiency. The proposed OR-PCA is based on stochastic optimization of an equivalent reformulation of the batch RPCA. Indeed, we show that OR-PCA provides a sequence of subspace estimations converging to the optimum of its batch counterpart and hence is provably robust to sparse corruption. Moreover, OR-PCA can naturally be applied for tracking dynamic subspace. Comprehensive simulations on subspace recovering and tracking demonstrate the robustness and efficiency advantages of the OR-PCA over online PCA and batch RPCA methods. 1
3 0.79411674 314 nips-2013-Stochastic Optimization of PCA with Capped MSG
Author: Raman Arora, Andy Cotter, Nati Srebro
Abstract: We study PCA as a stochastic optimization problem and propose a novel stochastic approximation algorithm which we refer to as “Matrix Stochastic Gradient” (MSG), as well as a practical variant, Capped MSG. We study the method both theoretically and empirically. 1
4 0.78603858 232 nips-2013-Online PCA for Contaminated Data
Author: Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan
Abstract: We consider the online Principal Component Analysis (PCA) where contaminated samples (containing outliers) are revealed sequentially to the Principal Components (PCs) estimator. Due to their sensitiveness to outliers, previous online PCA algorithms fail in this case and their results can be arbitrarily skewed by the outliers. Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers. We show that the final result of the proposed online RPCA has an acceptable degradation from the optimum. Actually, under mild conditions, online RPCA achieves the maximal robustness with a 50% breakdown point. Moreover, online RPCA is shown to be efficient for both storage and computation, since it need not re-explore the previous samples as in traditional robust PCA algorithms. This endows online RPCA with scalability for large scale data. 1
5 0.69526964 188 nips-2013-Memory Limited, Streaming PCA
Author: Ioannis Mitliagkas, Constantine Caramanis, Prateek Jain
Abstract: We consider streaming, one-pass principal component analysis (PCA), in the highdimensional regime, with limited memory. Here, p-dimensional samples are presented sequentially, and the goal is to produce the k-dimensional subspace that best approximates these points. Standard algorithms require O(p2 ) memory; meanwhile no algorithm can do better than O(kp) memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where p-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can be recovered when the number of samples, n, scales proportionally with the dimension, p. Yet, all algorithms that provably achieve this, have memory complexity O(p2 ). Meanwhile, algorithms with memory-complexity O(kp) do not have provable bounds on sample complexity comparable to p. We present an algorithm that achieves both: it uses O(kp) memory (meaning storage of any kind) and is able to compute the k-dimensional spike with O(p log p) samplecomplexity – the first algorithm of its kind. While our theoretical analysis focuses on the spiked covariance model, our simulations show that our algorithm is successful on much more general models for the data. 1
6 0.65739483 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
7 0.62012631 120 nips-2013-Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
8 0.5615924 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
9 0.52513915 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
10 0.49282491 256 nips-2013-Probabilistic Principal Geodesic Analysis
11 0.49255884 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
12 0.47632828 284 nips-2013-Robust Spatial Filtering with Beta Divergence
13 0.45953161 224 nips-2013-On the Sample Complexity of Subspace Learning
14 0.45007452 186 nips-2013-Matrix factorization with binary components
15 0.44300917 247 nips-2013-Phase Retrieval using Alternating Minimization
16 0.4426522 91 nips-2013-Dirty Statistical Models
17 0.43658558 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
18 0.4322817 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers
19 0.42904788 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
20 0.42603478 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
topicId topicWeight
[(16, 0.021), (33, 0.169), (34, 0.116), (40, 0.149), (41, 0.028), (49, 0.029), (56, 0.099), (70, 0.032), (85, 0.135), (89, 0.043), (93, 0.063), (95, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.8944543 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1
2 0.86578125 168 nips-2013-Learning to Pass Expectation Propagation Messages
Author: Nicolas Heess, Daniel Tarlow, John Winn
Abstract: Expectation Propagation (EP) is a popular approximate posterior inference algorithm that often provides a fast and accurate alternative to sampling-based methods. However, while the EP framework in theory allows for complex nonGaussian factors, there is still a significant practical barrier to using them within EP, because doing so requires the implementation of message update operators, which can be difficult and require hand-crafted approximations. In this work, we study the question of whether it is possible to automatically derive fast and accurate EP updates by learning a discriminative model (e.g., a neural network or random forest) to map EP message inputs to EP message outputs. We address the practical concerns that arise in the process, and we provide empirical analysis on several challenging and diverse factors, indicating that there is a space of factors where this approach appears promising. 1
3 0.84470105 7 nips-2013-A Gang of Bandits
Author: Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella
Abstract: Multi-armed bandit problems formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithm could lead to a dramatic performance increase. For instance, content may be served to a group of users by taking advantage of an underlying network of social relationships among them. In this paper, we introduce novel algorithmic approaches to the solution of such networked bandit problems. More specifically, we design and analyze a global recommendation strategy which allocates a bandit algorithm to each network node (user) and allows it to “share” signals (contexts and payoffs) with the neghboring nodes. We then derive two more scalable variants of this strategy based on different ways of clustering the graph nodes. We experimentally compare the algorithm and its variants to state-of-the-art methods for contextual bandits that do not use the relational information. Our experiments, carried out on synthetic and real-world datasets, show a consistent increase in prediction performance obtained by exploiting the network structure. 1
4 0.84309167 332 nips-2013-Tracking Time-varying Graphical Structure
Author: Erich Kummerfeld, David Danks
Abstract: Structure learning algorithms for graphical models have focused almost exclusively on stable environments in which the underlying generative process does not change; that is, they assume that the generating model is globally stationary. In real-world environments, however, such changes often occur without warning or signal. Real-world data often come from generating models that are only locally stationary. In this paper, we present LoSST, a novel, heuristic structure learning algorithm that tracks changes in graphical model structure or parameters in a dynamic, real-time manner. We show by simulation that the algorithm performs comparably to batch-mode learning when the generating graphical structure is globally stationary, and significantly better when it is only locally stationary. 1
5 0.83562654 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
Author: Tai Qin, Karl Rohe
Abstract: Spectral clustering is a fast and popular algorithm for finding clusters in networks. Recently, Chaudhuri et al. [1] and Amini et al. [2] proposed inspired variations on the algorithm that artificially inflate the node degrees for improved statistical performance. The current paper extends the previous statistical estimation results to the more canonical spectral clustering algorithm in a way that removes any assumption on the minimum degree and provides guidance on the choice of the tuning parameter. Moreover, our results show how the “star shape” in the eigenvectors–a common feature of empirical networks–can be explained by the Degree-Corrected Stochastic Blockmodel and the Extended Planted Partition model, two statistical models that allow for highly heterogeneous degrees. Throughout, the paper characterizes and justifies several of the variations of the spectral clustering algorithm in terms of these models. 1
6 0.83465195 196 nips-2013-Modeling Overlapping Communities with Node Popularities
7 0.8289066 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models
8 0.8266291 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
9 0.82183707 232 nips-2013-Online PCA for Contaminated Data
10 0.81677032 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
11 0.81641138 140 nips-2013-Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
12 0.81448293 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
13 0.81262499 99 nips-2013-Dropout Training as Adaptive Regularization
14 0.80994356 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
15 0.80940276 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
16 0.80860645 350 nips-2013-Wavelets on Graphs via Deep Learning
17 0.80824536 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks
18 0.80707294 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
19 0.80701774 316 nips-2013-Stochastic blockmodel approximation of a graphon: Theory and consistent estimation
20 0.80664772 300 nips-2013-Solving the multi-way matching problem by permutation synchronization