nips nips2013 nips2013-116 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA. Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe. [sent-1, score-0.202]
2 Abstract. We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). [sent-9, score-0.994]
3 The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). [sent-10, score-0.149]
4 We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. [sent-11, score-1.031]
5 We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. [sent-14, score-0.238]
6 We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. [sent-15, score-0.471]
7 PCA uses the eigenvectors of the sample covariance matrix to compute the linear combinations of variables with the largest variance. [sent-17, score-0.381]
8 These principal directions of variation explain the covariation of the variables and can be exploited for dimension reduction. [sent-18, score-0.381]
9 The estimated principal directions can be noisy and unreliable [see 2, and the references therein]. [sent-21, score-0.392]
10 Over the past decade, there has been a flurry of activity to address the drawbacks of PCA with a class of techniques, called sparse PCA, that combine the essence of PCA with the assumption that the phenomena of interest depend mostly on a few variables. [sent-22, score-0.103]
11 However, much of this work has focused on the first principal component. [sent-28, score-0.315]
12 One rationale behind this focus is the analogy with ordinary PCA: additional components can be found by iteratively deflating the input matrix to account for variation uncovered by previous components. [sent-29, score-0.181]
13 However, the use of deflation with sparse PCA entails complications of non-orthogonality, sub-optimality, and multiple tuning parameters [15]. [sent-30, score-0.103]
14 The principal directions of variation correspond to eigenvectors of some population matrix Σ. [sent-32, score-0.603]
15 There is no reason to assume a priori that the d largest eigenvalues of Σ are distinct. [sent-33, score-0.211]
16 Even if the eigenvalues are distinct, estimates of individual eigenvectors can be unreliable if the gap between their eigenvalues is small. [sent-34, score-0.462]
17 So it seems reasonable, if not necessary, to de-emphasize eigenvectors and to instead focus on their span, i.e., the principal subspace. [sent-35, score-0.11]
18 There has been relatively little work on the problem of estimating the principal subspace or even multiple eigenvectors simultaneously. [sent-38, score-0.69]
19 The sole exceptions are the diagonal thresholding method [2], which is just ordinary PCA applied to the subset of variables with the largest marginal sample variances, and refinements such as iterative thresholding [16], which use diagonal thresholding as an initial estimate. [sent-40, score-0.471]
20 In this paper, we propose a novel convex optimization problem to estimate the d-dimensional principal subspace of a population matrix Σ based on a noisy input matrix S. [sent-46, score-0.853]
21 This rate turns out to be nearly minimax optimal (Corollary 3.3). [sent-48, score-0.106]
22 Under additional assumptions on signal strength, it also allows us to recover the support of the principal subspace (see the support recovery theorem in Section 3). [sent-49, score-0.58]
23 We also give general results that yield near-optimal statistical guarantees for other choices of input matrix, such as Pearson's correlation and Kendall's tau correlation matrices (Corollary 3.4). [sent-52, score-0.505]
24 It is based on a convex body, called the Fantope, that provides a tight relaxation for simultaneous rank and orthogonality constraints on the positive semidefinite cone. [sent-55, score-0.144]
25 We find that an alternating direction method of multipliers (ADMM) algorithm [e. [sent-57, score-0.099]
26 We formulate the sparse principal subspace problem as a novel semidefinite program with a Fantope constraint (Section 2). [sent-62, score-0.683]
27 We show that the proposed estimator achieves a near optimal rate of convergence in subspace estimation without assumptions on the rank of the solution or restrictive spiked covariance models. [sent-64, score-0.603]
28 We provide a general theoretical framework that accommodates other matrices, in addition to sample covariance, such as Pearson’s correlation and Kendall’s tau. [sent-67, score-0.109]
29 Related work. Existing work most closely related to ours is the DSPCA approach for single-component sparse PCA, first proposed in [1]. [sent-71, score-0.161]
30 Subsequently, there has been theoretical analysis under a spiked covariance model and restrictions on the entries of the eigenvectors [11], and algorithmic developments including block coordinate ascent [9] and ADMM [20]. [sent-72, score-0.38]
31 The d > 1 case requires invention and novel techniques to deal with a convex body, the Fantope, that has never before been used in sparse PCA. [sent-74, score-0.153]
32 ∥x∥q is the usual ℓq norm with ∥x∥0 defined as the number of nonzero entries of x. [sent-76, score-0.108]
33 For a symmetric matrix A, we define λ1 (A) ≥ λ2 (A) ≥ · · · to be the eigenvalues of A with multiplicity. [sent-80, score-0.286]
34 For two subspaces M1 and M2 , sin Θ(M1 , M2 ) is defined to be the matrix whose diagonals are the sines of the canonical angles between the two subspaces [see 21, §VII]. [sent-82, score-0.213]
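As a concrete aid (our sketch, not from the paper), the sines of the canonical angles between two subspaces can be computed from the singular values of U1ᵀU2, where U1 and U2 hold orthonormal bases of the two subspaces:

```python
import numpy as np

def sin_theta(U1, U2):
    """Sines of the canonical angles between span(U1) and span(U2),
    where U1 and U2 have orthonormal columns (names are ours)."""
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return np.sqrt(np.clip(1.0 - cosines ** 2, 0.0, None))

# Norms of sin Theta(M1, M2) are norms of this vector of sines, e.g.
# np.linalg.norm(sin_theta(U1, U2)) for the Frobenius-type norm and
# sin_theta(U1, U2).max() for the spectral norm.
```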
35 The first insight is a variational characterization of the principal subspace of a symmetric matrix. [sent-88, score-0.637]
36 The sum of the d largest eigenvalues of a symmetric matrix A can be expressed as ∑_{i=1}^{d} λi(A) = max_{V^T V = Id} ⟨A, V V^T⟩ = max_{X ∈ F^d} ⟨A, X⟩ (2), where (a) labels the first equality and (b) the second. [sent-89, score-0.342]
37 V T V =Id X∈F d (2) Identity (a) is an extremal property known as Ky Fan’s maximum principle [23]; (b) is based on the less well known observation that F d = conv({V V T : V T V = Id }) , i. [sent-90, score-0.076]
38 the extremal points of F d are the rank-d projection matrices. [sent-92, score-0.179]
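A quick numerical sanity check of identity (2) at its maximizer (our sketch): the value ⟨A, V Vᵀ⟩ attained by the top-d eigenvectors equals the sum of the d largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A = (A + A.T) / 2                      # symmetric test matrix
d = 2
evals, evecs = np.linalg.eigh(A)       # eigenvalues in ascending order
V = evecs[:, -d:]                      # top-d eigenvectors
assert np.isclose(evals[-d:].sum(), np.trace(V.T @ A @ V))
```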
39 The second insight is a connection between the (1, 1)-norm and a notion of subspace sparsity introduced by [18]. [sent-94, score-0.319]
40 Thus, ∥X∥1,1 is a convex relaxation of what [18] call row sparsity for subspaces. [sent-100, score-0.153]
41 [18] proposed solving an equivalent form of the above optimization problem and showed that the estimator corresponding to its global solution is minimax rate optimal under a general statistical model for S. [sent-102, score-0.203]
42 Subspace estimation. The constraint X ∈ F^d guarantees that its rank is ≥ d. [sent-105, score-0.076]
43 However, X need not be an extremal point of F^d, i.e., a rank-d projection matrix. [sent-106, score-0.076]
44 In order to obtain a proper d-dimensional subspace estimate, we can extract the d leading eigenvectors of X, say V , and form the projection matrix Π = V V T . [sent-109, score-0.552]
45 The projection is unique, but the choice of basis is arbitrary. [sent-110, score-0.103]
46 We can follow the convention of standard PCA by choosing an orthogonal matrix O so that (V O)T S(V O) is diagonal, and take V O as the orthonormal basis for the subspace estimate. [sent-111, score-0.372]
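A minimal numpy sketch of this extraction step (function and variable names are ours, not the paper's):

```python
import numpy as np

def subspace_estimate(X, S, d):
    """Extract a d-dimensional subspace estimate from a Fantope solution X."""
    evals, evecs = np.linalg.eigh(X)
    V = evecs[:, -d:]                       # d leading eigenvectors of X
    Pi_hat = V @ V.T                        # unique projection onto the estimate
    # Rotate the basis so that (V O)^T S (V O) is diagonal (standard PCA convention).
    w, O = np.linalg.eigh(V.T @ S @ V)
    VO = V @ O[:, ::-1]                     # order columns by decreasing variance
    return Pi_hat, VO
```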
47 3 Theory. In this section we describe our theoretical framework for studying the statistical properties of X given by (1) with arbitrary input matrices that satisfy the following assumptions. [sent-112, score-0.136]
48 The projection Π onto the subspace spanned by the eigenvectors of Σ corresponding to its d largest eigenvalues satisfies ∥Π∥2,0 ≤ s, or equivalently, ∥diag(Π)∥0 ≤ s. [sent-118, score-0.689]
49 The curvature lemma (Lemma 3.1 below) implies that the statistical properties of the error of the estimator, ∆ := X − Π, can be derived, in many cases, by routine analysis of the entrywise errors of the input matrix W := S − Σ. [sent-120, score-0.208]
50 It is different from the “restricted strong convexity” in [25], a notion of curvature tailored for regularization in the form of penalizing an unconstrained convex objective. [sent-126, score-0.15]
51 Let A be a symmetric matrix and E be the projection onto the subspace spanned by the eigenvectors of A corresponding to its d largest eigenvalues λ1 ≥ λ2 ≥ · · · . [sent-131, score-0.82]
52 Lemma 3.1 first appeared in [18] with the additional restriction that F is a projection matrix. [sent-134, score-0.103]
53 The following is an immediate corollary of Lemma 3.1. [sent-136, score-0.089]
54 Let A,B be symmetric matrices and MA , MB be their respective d-dimensional principal subspaces. [sent-140, score-0.431]
55 A key advantage of this corollary is that it does not require a bound on the differences between the eigenvalues of A and the eigenvalues of B. [sent-146, score-0.31]
56 Our primary use of this result is to show that even if rank(X) ̸= d, its principal subspace will be close to that of Π if ∆ is small. [sent-148, score-0.58]
57 If M is the principal d-dimensional subspace of Σ and M̂ is the principal d-dimensional subspace of X, then |||sin Θ(M̂, M)|||2 ≤ √2 |||∆|||2. [sent-151, score-1.16]
58 The next theorem gives a sufficient condition for support recovery by diagonal thresholding X. [sent-161, score-0.165]
59 This is not the most general result possible, but it allows us to illustrate the statistical properties of X for two different types of input matrices: sample covariance and Kendall’s tau correlation. [sent-169, score-0.4]
60 With λ = σ√(log p/n) as in (4), the resulting bound is |||X − Π|||2 ≤ 4σ s √(log p/n) / δ. Sample covariance. Consider the setting where the input matrix is the sample covariance matrix of a random sample of size n > 1 from a sub-Gaussian distribution. [sent-174, score-0.467]
61 Under this condition we have the following corollary of Theorem 3. [sent-176, score-0.089]
62 Let S be the sample covariance matrix of an i.i.d. sample of size n > 1 from a sub-Gaussian distribution (5) with population covariance matrix Σ. [sent-183, score-0.253]
63 Comparing with the minimax lower bounds derived in [17, 18], we see that the rate in Corollary 3.3 [sent-185, score-0.106]
64 is roughly larger than the optimal minimax rate by a factor of λ1/λd+1 · s/d. The first term only becomes important in the near-degenerate case where λd+1 ≪ λ1. [sent-186, score-0.106]
65 The second term is likely to be unimprovable without additional conditions on S and Σ such as a spiked covariance model. [sent-188, score-0.236]
66 Kendall's tau. Kendall's tau correlation provides a robust and nonparametric alternative to ordinary (Pearson) correlation. [sent-190, score-0.481]
67 For i.i.d. p-variate random vectors Y1, ..., Yn, the theoretical and empirical versions of Kendall's tau correlation matrix are τij := Cor(sign(Y1i − Y2i), sign(Y1j − Y2j)) and τ̂ij := 2/(n(n−1)) ∑_{s<t} sign(Ysi − Yti) sign(Ysj − Ytj). [sent-194, score-0.334]
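For concreteness, a small numpy sketch of the empirical Kendall's tau matrix via the sign formula above; the transform fed into the estimator is hedged in the comment, since equation (6) is not reproduced here:

```python
import numpy as np

def kendall_tau_matrix(Y):
    """Empirical Kendall's tau matrix of an (n, p) data array via
    tau_hat[i, j] = 2/(n(n-1)) * sum_{s<t} sign(Y[s,i]-Y[t,i]) * sign(Y[s,j]-Y[t,j]).
    A direct loop over pairs; not the paper's code."""
    n, p = Y.shape
    tau = np.zeros((p, p))
    for s in range(n):
        for t in range(s + 1, n):
            d = np.sign(Y[s] - Y[t])        # length-p vector of pairwise signs
            tau += np.outer(d, d)
    return 2.0 * tau / (n * (n - 1))

# Assuming the usual sine transform (presumably equation (6), not shown here),
# the input matrix would be T_hat = np.sin(np.pi * kendall_tau_matrix(Y) / 2).
```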
68 However, under additional conditions, the following lemma gives meaning to T by extending (6) to a wide class of distributions, called transelliptical by [29], that includes the nonparanormal. [sent-204, score-0.181]
69 If (f1(Y11), ..., fp(Y1p)) has an elliptical distribution with scatter matrix Σ̃, then Tij = Σ̃ij / √(Σ̃ii Σ̃jj). [sent-216, score-0.154]
70 Corollary 3.4 shows that Kendall's tau can be used in place of the sample correlation matrix for a wide class of distributions without much loss of efficiency. [sent-222, score-0.403]
71 Here Sλ/ρ denotes the elementwise soft thresholding operator [e. [sent-234, score-0.129]
72 PF d is the Euclidean projection onto F d and is given in closed form in the following lemma. [sent-239, score-0.103]
73 If X = ∑i γi ui ui^T is a spectral decomposition of X, then PF^d(X) = ∑i γi^+(θ) ui ui^T, where γi^+(θ) = min(max(γi − θ, 0), 1) and θ satisfies the equation ∑i γi^+(θ) = d. [sent-242, score-0.138]
74 Thus, evaluating PF^d involves computing an eigendecomposition of its argument, and then modifying the eigenvalues by solving a monotone, piecewise linear equation. [sent-243, score-0.155]
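A minimal sketch of these two building blocks and of the ADMM loop they suggest. The splitting below is our reading of the updates implied above (X-step: Fantope projection, Y-step: soft thresholding); the paper's actual algorithm may differ in details such as step size and stopping rules.

```python
import numpy as np

def soft_threshold(A, t):
    """Elementwise soft thresholding S_t(A) = sign(A) * max(|A| - t, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def fantope_projection(Y, d):
    """Euclidean projection onto F^d: eigendecompose, shift eigenvalues by theta,
    clip to [0, 1], with theta chosen so the clipped values sum to d.
    Bisection is our choice for solving the monotone piecewise-linear equation."""
    gamma, U = np.linalg.eigh((Y + Y.T) / 2)
    clipped = lambda theta: np.clip(gamma - theta, 0.0, 1.0)
    lo, hi = gamma.min() - 1.0, gamma.max()      # clipped sum is p at lo and 0 at hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if clipped(mid).sum() > d else (lo, mid)
    return (U * clipped(0.5 * (lo + hi))) @ U.T

def fps_admm(S, d, lam, rho=1.0, n_iter=500):
    """One natural ADMM splitting for max <S, X> - lam*||X||_{1,1} over X in F^d."""
    p = S.shape[0]
    X = np.zeros((p, p))
    Y = np.zeros_like(X)
    U = np.zeros_like(X)
    for _ in range(n_iter):
        X = fantope_projection(Y - U + S / rho, d)   # Fantope (X) update
        Y = soft_threshold(X + U, lam / rho)         # sparse (Y) update
        U = U + X - Y                                # scaled dual update
    return X   # Fantope-feasible iterate; the sparse iterate Y is also of interest
```

A practical implementation would add a stopping rule based on the primal residual X − Y and the change in Y between iterations.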
75 These methods obtain multiple component estimates by taking the kth component estimate v̂k from input matrix Sk, and then re-running the method with the deflated input matrix Sk+1 = (I − v̂k v̂k^T) Sk (I − v̂k v̂k^T). [sent-249, score-0.619]
76 The resulting d-dimensional principal subspace estimate is the span of v̂1, ..., v̂d. [sent-250, score-0.089]
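For contrast with the single semidefinite program above, a sketch of this deflation scheme (the single-component routine is a placeholder argument, not the paper's code):

```python
import numpy as np

def deflation_subspace(S, d, single_pc):
    """Multi-component estimation by deflation: `single_pc` stands in for any
    single-component sparse PCA routine mapping a matrix to a unit vector."""
    Sk = S.copy()
    V = []
    for _ in range(d):
        v = single_pc(Sk)
        v = v / np.linalg.norm(v)
        V.append(v)
        P = np.eye(S.shape[0]) - np.outer(v, v)   # remove the found direction
        Sk = P @ Sk @ P                            # deflated input matrix S_{k+1}
    return np.column_stack(V)                      # columns span the subspace estimate
```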
77 We generated input matrices by sampling n = 100 i.i.d. [sent-256, score-0.096]
78 observations from a Np (0, Σ), p = 200 distribution and taking S to be the usual sample covariance matrix. [sent-259, score-0.141]
79 We generated V by sampling its nonzero entries from a standard Gaussian distribution and then orthonormalizing V while retaining the desired sparsity pattern. [sent-262, score-0.131]
80 We then embedded Π inside the population covariance matrix Σ = αΠ + (I − Π)Σ0(I − Π), where Σ0 is a Wishart matrix with p degrees of freedom and α > 0 is chosen so that the effective noise level (in the optimal minimax rate [18]) σ² = λ1λd+1/(λd − λd+1) is in {1, 10}. [sent-264, score-0.402]
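A sketch of this covariance construction; the Wishart scaling and the tuning of α to hit a target effective noise level σ² are treated as assumptions/inputs here, not as the paper's exact recipe:

```python
import numpy as np

def make_population_covariance(Pi, alpha, rng):
    """Assemble Sigma = alpha * Pi + (I - Pi) Sigma0 (I - Pi), with Sigma0 a
    Wishart matrix with p degrees of freedom."""
    p = Pi.shape[0]
    G = rng.standard_normal((p, p))
    Sigma0 = G @ G.T                               # Wishart(I_p, p)
    I_minus_Pi = np.eye(p) - Pi
    return alpha * Pi + I_minus_Pi @ Sigma0 @ I_minus_Pi
```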
81 Figure 1 summarizes the resulting mean squared error |||Π̂ − Π|||₂² across 100 replicates for each of the different combinations of simulation parameters. [sent-265, score-0.089]
82 6 Discussion. Estimating sparse principal subspaces in high dimensions poses both computational and statistical challenges. [sent-271, score-0.499]
83 The contribution of this paper—a novel SDP based estimator, an efficient algorithm, and strong statistical guarantees for a wide array of input matrices—is a significant leap forward on both fronts. [sent-272, score-0.179]
84 Techniques like cross-validation need to be carefully formulated and studied in the context of principal subspace estimation. [sent-281, score-0.58]
85 “A direct formulation of sparse PCA using semidefinite programming ”. [sent-285, score-0.103]
86 “On consistency and sparsity for principal components analysis in high dimensions ”. [sent-293, score-0.369]
87 “A modified principal component technique based on the Lasso ”. [sent-303, score-0.373]
88 “Sparse principal component analysis via regularized low rank matrix approximation ”. [sent-318, score-0.492]
89 “A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis ”. [sent-326, score-0.57]
90 “Generalized power method for sparse principal component analysis ”. [sent-331, score-0.476]
91 “A majorization-minimization approach to the sparse generalized eigenvalue problem ”. [sent-342, score-0.137]
92 “Large-scale sparse principal component analysis with application to text data ”. [sent-350, score-0.476]
93 “High-dimensional analysis of semidefinite relaxations for sparse principal components ”. [sent-368, score-0.418]
94 “Minimax bounds for sparse pca with noisy high-dimensional data ”. [sent-376, score-0.319]
95 “Minimax rates of estimation for sparse PCA in high dimensions ”. [sent-386, score-0.103]
96 “Minimax sparse principal subspace estimation in high dimensions ”. [sent-436, score-0.683]
97 “Distributed optimization and statistical learning via the alternating direction method of multipliers ”. [sent-445, score-0.139]
98 “Alternating direction method of multipliers for sparse principal component analysis ”. [sent-451, score-0.535]
99 “On a theorem of Weyl concerning eigenvalues of linear transformations I ”. [sent-465, score-0.193]
100 “On the sum of the largest eigenvalues of a symmetric matrix ”. [sent-472, score-0.342]
wordName wordTfidf (topN-words)
[('fantope', 0.376), ('principal', 0.315), ('subspace', 0.265), ('pca', 0.216), ('admm', 0.203), ('kendall', 0.189), ('dspca', 0.188), ('tau', 0.182), ('fps', 0.166), ('eigenvalues', 0.155), ('ation', 0.136), ('spiked', 0.126), ('covariance', 0.11), ('eigenvectors', 0.11), ('semide', 0.107), ('minimax', 0.106), ('sparse', 0.103), ('projection', 0.103), ('pf', 0.095), ('transelliptical', 0.094), ('xjj', 0.094), ('jj', 0.093), ('thresholding', 0.091), ('corollary', 0.089), ('correlation', 0.078), ('extremal', 0.076), ('sdp', 0.076), ('matrix', 0.074), ('vk', 0.071), ('gpower', 0.063), ('probablity', 0.063), ('spc', 0.063), ('weyl', 0.063), ('curvature', 0.061), ('multipliers', 0.059), ('vu', 0.059), ('matrices', 0.059), ('component', 0.058), ('symmetric', 0.057), ('ij', 0.057), ('estimator', 0.057), ('sin', 0.057), ('largest', 0.056), ('pearson', 0.056), ('jcgs', 0.055), ('replicates', 0.055), ('sparsity', 0.054), ('sign', 0.054), ('cor', 0.051), ('convex', 0.05), ('relaxation', 0.049), ('lemma', 0.049), ('ui', 0.046), ('ky', 0.045), ('rank', 0.045), ('fp', 0.044), ('tij', 0.044), ('nonzero', 0.043), ('mb', 0.042), ('aspremont', 0.042), ('unreliable', 0.042), ('subspaces', 0.041), ('jasa', 0.04), ('wisconsin', 0.04), ('jmlr', 0.04), ('alternating', 0.04), ('statistical', 0.04), ('semiparametric', 0.039), ('regularization', 0.039), ('ordinary', 0.039), ('population', 0.038), ('theorem', 0.038), ('elementwise', 0.038), ('sk', 0.038), ('body', 0.038), ('tr', 0.038), ('wide', 0.038), ('id', 0.038), ('frobenius', 0.038), ('input', 0.037), ('dual', 0.037), ('diagonal', 0.036), ('elliptical', 0.036), ('directions', 0.035), ('fj', 0.035), ('madison', 0.035), ('entries', 0.034), ('simulation', 0.034), ('penalty', 0.034), ('eigenvalue', 0.034), ('fan', 0.034), ('convention', 0.033), ('array', 0.033), ('et', 0.033), ('disjoint', 0.033), ('monotone', 0.032), ('variation', 0.031), ('guarantees', 0.031), ('sample', 0.031), ('norm', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
2 0.23696919 91 nips-2013-Dirty Statistical Models
Author: Eunho Yang, Pradeep Ravikumar
Abstract: We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M -estimators that minimize the sum of any loss function, and an instance of what we call a “hybrid” regularization, that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures. 1
3 0.21984677 188 nips-2013-Memory Limited, Streaming PCA
Author: Ioannis Mitliagkas, Constantine Caramanis, Prateek Jain
Abstract: We consider streaming, one-pass principal component analysis (PCA), in the highdimensional regime, with limited memory. Here, p-dimensional samples are presented sequentially, and the goal is to produce the k-dimensional subspace that best approximates these points. Standard algorithms require O(p2 ) memory; meanwhile no algorithm can do better than O(kp) memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the spiked covariance model, where p-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can be recovered when the number of samples, n, scales proportionally with the dimension, p. Yet, all algorithms that provably achieve this, have memory complexity O(p2 ). Meanwhile, algorithms with memory-complexity O(kp) do not have provable bounds on sample complexity comparable to p. We present an algorithm that achieves both: it uses O(kp) memory (meaning storage of any kind) and is able to compute the k-dimensional spike with O(p log p) samplecomplexity – the first algorithm of its kind. While our theoretical analysis focuses on the spiked covariance model, our simulations show that our algorithm is successful on much more general models for the data. 1
4 0.20442697 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
Author: Yuhong Guo
Abstract: Principal component analysis (PCA), a well-established technique for data analysis and processing, provides a convenient form of dimensionality reduction that is effective for cleaning small Gaussian noises presented in the data. However, the applicability of standard principal component analysis in real scenarios is limited by its sensitivity to large errors. In this paper, we tackle the challenge problem of recovering data corrupted with errors of high magnitude by developing a novel robust transfer principal component analysis method. Our method is based on the assumption that useful information for the recovery of a corrupted data matrix can be gained from an uncorrupted related data matrix. Specifically, we formulate the data recovery problem as a joint robust principal component analysis problem on the two data matrices, with common principal components shared across matrices and individual principal components specific to each data matrix. The formulated optimization problem is a minimization problem over a convex objective function but with non-convex rank constraints. We develop an efficient proximal projected gradient descent algorithm to solve the proposed optimization problem with convergence guarantees. Our empirical results over image denoising tasks show the proposed method can effectively recover images with random large errors, and significantly outperform both standard PCA and robust PCA with rank constraints. 1
5 0.20176381 224 nips-2013-On the Sample Complexity of Subspace Learning
Author: Alessandro Rudi, Guillermo D. Canas, Lorenzo Rosasco
Abstract: A large number of algorithms in machine learning, from principal component analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples. In this paper we introduce a general formulation of this problem and derive novel learning error estimates. Our results rely on natural assumptions on the spectral properties of the covariance operator associated to the data distribution, and hold for a wide class of metrics between subspaces. As special cases, we discuss sharp error estimates for the reconstruction properties of PCA and spectral support estimation. Key to our analysis is an operator theoretic approach that has broad applicability to spectral learning methods. 1
6 0.18705294 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
7 0.17898272 233 nips-2013-Online Robust PCA via Stochastic Optimization
8 0.1609176 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
9 0.14870434 232 nips-2013-Online PCA for Contaminated Data
10 0.14420085 137 nips-2013-High-Dimensional Gaussian Process Bandits
11 0.13598032 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
12 0.12096794 256 nips-2013-Probabilistic Principal Geodesic Analysis
13 0.12091494 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
14 0.11592276 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
15 0.11455495 314 nips-2013-Stochastic Optimization of PCA with Capped MSG
16 0.10976581 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
17 0.109702 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
18 0.1037762 215 nips-2013-On Decomposing the Proximal Map
19 0.10156353 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators
20 0.096859679 185 nips-2013-Matrix Completion From any Given Set of Observations
topicId topicWeight
[(0, 0.247), (1, 0.119), (2, 0.178), (3, 0.139), (4, -0.072), (5, 0.05), (6, -0.139), (7, 0.083), (8, -0.221), (9, -0.077), (10, -0.036), (11, 0.173), (12, 0.064), (13, 0.164), (14, 0.016), (15, 0.023), (16, 0.031), (17, -0.005), (18, 0.019), (19, -0.042), (20, 0.026), (21, -0.035), (22, -0.014), (23, -0.025), (24, 0.156), (25, -0.026), (26, 0.063), (27, 0.078), (28, 0.007), (29, 0.037), (30, -0.013), (31, -0.019), (32, -0.028), (33, -0.061), (34, -0.003), (35, 0.047), (36, -0.061), (37, 0.032), (38, 0.058), (39, 0.044), (40, 0.036), (41, -0.003), (42, -0.004), (43, 0.072), (44, -0.054), (45, 0.105), (46, -0.06), (47, 0.053), (48, -0.037), (49, 0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.95675313 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
2 0.76939738 224 nips-2013-On the Sample Complexity of Subspace Learning
Author: Alessandro Rudi, Guillermo D. Canas, Lorenzo Rosasco
Abstract: A large number of algorithms in machine learning, from principal component analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples. In this paper we introduce a general formulation of this problem and derive novel learning error estimates. Our results rely on natural assumptions on the spectral properties of the covariance operator associated to the data distribution, and hold for a wide class of metrics between subspaces. As special cases, we discuss sharp error estimates for the reconstruction properties of PCA and spectral support estimation. Key to our analysis is an operator theoretic approach that has broad applicability to spectral learning methods. 1
3 0.76148039 91 nips-2013-Dirty Statistical Models
Author: Eunho Yang, Pradeep Ravikumar
Abstract: We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M -estimators that minimize the sum of any loss function, and an instance of what we call a “hybrid” regularization, that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures. 1
4 0.74486917 233 nips-2013-Online Robust PCA via Stochastic Optimization
Author: Jiashi Feng, Huan Xu, Shuicheng Yan
Abstract: Robust PCA methods are typically based on batch optimization and have to load all the samples into memory during optimization. This prevents them from efficiently processing big data. In this paper, we develop an Online Robust PCA (OR-PCA) that processes one sample per time instance and hence its memory cost is independent of the number of samples, significantly enhancing the computation and storage efficiency. The proposed OR-PCA is based on stochastic optimization of an equivalent reformulation of the batch RPCA. Indeed, we show that OR-PCA provides a sequence of subspace estimations converging to the optimum of its batch counterpart and hence is provably robust to sparse corruption. Moreover, OR-PCA can naturally be applied for tracking dynamic subspace. Comprehensive simulations on subspace recovering and tracking demonstrate the robustness and efficiency advantages of the OR-PCA over online PCA and batch RPCA methods. 1
5 0.73607343 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
Author: Yu-Xiang Wang, Huan Xu, Chenlei Leng
Abstract: Sparse Subspace Clustering (SSC) and Low-Rank Representation (LRR) are both considered as the state-of-the-art methods for subspace clustering. The two methods are fundamentally similar in that both are convex optimizations exploiting the intuition of “Self-Expressiveness”. The main difference is that SSC minimizes the vector 1 norm of the representation matrix to induce sparsity while LRR minimizes nuclear norm (aka trace norm) to promote a low-rank structure. Because the representation matrix is often simultaneously sparse and low-rank, we propose a new algorithm, termed Low-Rank Sparse Subspace Clustering (LRSSC), by combining SSC and LRR, and develops theoretical guarantees of when the algorithm succeeds. The results reveal interesting insights into the strength and weakness of SSC and LRR and demonstrate how LRSSC can take the advantages of both methods in preserving the “Self-Expressiveness Property” and “Graph Connectivity” at the same time. 1
6 0.72713137 188 nips-2013-Memory Limited, Streaming PCA
7 0.72012502 314 nips-2013-Stochastic Optimization of PCA with Capped MSG
8 0.7149331 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
9 0.64284623 120 nips-2013-Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
10 0.63882601 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
11 0.62936497 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
12 0.60957766 232 nips-2013-Online PCA for Contaminated Data
13 0.60608828 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
14 0.60281891 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
15 0.59574348 73 nips-2013-Convex Relaxations for Permutation Problems
16 0.59185618 256 nips-2013-Probabilistic Principal Geodesic Analysis
17 0.58653039 284 nips-2013-Robust Spatial Filtering with Beta Divergence
18 0.56591421 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
19 0.56323725 186 nips-2013-Matrix factorization with binary components
20 0.55485713 137 nips-2013-High-Dimensional Gaussian Process Bandits
topicId topicWeight
[(2, 0.018), (16, 0.038), (33, 0.146), (34, 0.141), (41, 0.035), (47, 0.173), (49, 0.038), (56, 0.114), (70, 0.021), (85, 0.054), (89, 0.068), (93, 0.053), (95, 0.033)]
simIndex simValue paperId paperTitle
1 0.94868243 203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series
Author: Mark Rogers, Lei Li, Stuart Russell
Abstract: Data in the sciences frequently occur as sequences of multidimensional arrays called tensors. How can hidden, evolving trends in such data be extracted while preserving the tensor structure? The model that is traditionally used is the linear dynamical system (LDS) with Gaussian noise, which treats the latent state and observation at each time slice as a vector. We present the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation–maximization (EM) algorithm to estimate the parameters. The MLDS models each tensor observation in the time series as the multilinear projection of the corresponding member of a sequence of latent tensors. The latent tensors are again evolving with respect to a multilinear projection. Compared to the LDS with an equal number of parameters, the MLDS achieves higher prediction accuracy and marginal likelihood for both artificial and real datasets. 1
2 0.87172037 232 nips-2013-Online PCA for Contaminated Data
Author: Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan
Abstract: We consider the online Principal Component Analysis (PCA) where contaminated samples (containing outliers) are revealed sequentially to the Principal Components (PCs) estimator. Due to their sensitiveness to outliers, previous online PCA algorithms fail in this case and their results can be arbitrarily skewed by the outliers. Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers. We show that the final result of the proposed online RPCA has an acceptable degradation from the optimum. Actually, under mild conditions, online RPCA achieves the maximal robustness with a 50% breakdown point. Moreover, online RPCA is shown to be efficient for both storage and computation, since it need not re-explore the previous samples as in traditional robust PCA algorithms. This endows online RPCA with scalability for large scale data. 1
same-paper 3 0.86556 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
4 0.80760694 173 nips-2013-Least Informative Dimensions
Author: Fabian Sinz, Anna Stockl, January Grewe, January Benda
Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1
5 0.8036375 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
Author: Cho-Jui Hsieh, Matyas A. Sustik, Inderjit Dhillon, Pradeep Ravikumar, Russell Poldrack
Abstract: The 1 -regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix even under high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program with number of parameters scaling quadratically with the number of Gaussian variables. State-of-the-art methods thus do not scale to problems with more than 20, 000 variables. In this paper, we develop an algorithm B IG QUIC, which can solve 1 million dimensional 1 regularized Gaussian MLE problems (which would thus have 1000 billion parameters) using a single machine, with bounded memory. In order to do so, we carefully exploit the underlying structure of the problem. Our innovations include a novel block-coordinate descent method with the blocks chosen via a clustering scheme to minimize repeated computations; and allowing for inexact computation of specific components. In spite of these modifications, we are able to theoretically analyze our procedure and show that B IG QUIC can achieve super-linear or even quadratic convergence rates. 1
6 0.80181378 99 nips-2013-Dropout Training as Adaptive Regularization
7 0.80165958 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
8 0.80109137 350 nips-2013-Wavelets on Graphs via Deep Learning
9 0.80005372 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
10 0.79978359 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
11 0.79927385 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
12 0.79925025 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
13 0.79912591 201 nips-2013-Multi-Task Bayesian Optimization
14 0.79868436 280 nips-2013-Robust Data-Driven Dynamic Programming
15 0.79866803 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
16 0.79848027 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction
17 0.79726273 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
18 0.79711944 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC
19 0.79701799 5 nips-2013-A Deep Architecture for Matching Short Texts